Abstract

Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released deepfake face datasets labeled to indicate the forgery method used. However, the naming of these labels is often arbitrary and inconsistent, so most researchers now choose to work with only one dataset. Yet in practical applications and in traceability research, these datasets must be used together. In this study, we employ several models to extract forgery features from various deepfake datasets and apply K-means clustering to identify datasets with similar feature values, analyzing the resulting groupings with the Calinski-Harabasz Index. Our findings reveal that subsets carrying the same or similar labels in different deepfake datasets can exhibit different forgery features. To address this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity rather than label names. On four groups of test datasets containing unknown data types, the model trained on KCE-combined data achieved a Calinski-Harabasz score 42.3% higher than the model trained on data combined by forgery names, and 2.5% higher than the model trained on all available data, even though the latter had more training data. These results show that the proposed method improves the generalization ability of the model. This paper thus offers a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and for conducting deepfake traceability research.
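The pipeline summarized above (extract forgery features, cluster them with K-means, and score alternative dataset groupings with the Calinski-Harabasz Index) can be illustrated with a minimal sketch. This is not the authors' code: the feature array, feature dimensionality, and cluster count below are illustrative assumptions, using the standard scikit-learn APIs for K-means and the Calinski-Harabasz score.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# group samples from several deepfake datasets by the similarity of their
# extracted forgery features, then score the grouping with the
# Calinski-Harabasz index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Placeholder forgery-feature vectors (e.g., produced by a detector backbone);
# shape = (n_samples, n_feature_dims). Both numbers are arbitrary here.
features = rng.normal(size=(600, 128))

# Cluster the samples so that datasets whose samples land in the same clusters
# can be merged by feature similarity instead of by (inconsistent) label names.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)

# A higher Calinski-Harabasz score indicates tighter, better-separated clusters,
# which is how alternative grouping strategies can be compared.
score = calinski_harabasz_score(features, kmeans.labels_)
print(f"Calinski-Harabasz score: {score:.2f}")
```

In practice, the same scoring call can be applied to any candidate grouping (e.g., grouping by forgery-method name versus grouping by feature similarity) to compare how well each grouping separates the underlying forgery features.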

Details

Title
The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets
Author
Sun, Yi 1; Zheng, Jun 2; Lyu, Lingjuan 3; Zhao, Hanyu 2; Li, Jiaxing 2; Tan, Yunteng 2; Liu, Xinyu 2; Li, Yuanzhang 2

1 Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing 100811, China; [email protected] (Y.S.); [email protected] (J.Z.); [email protected] (H.Z.); [email protected] (J.L.); [email protected] (Y.T.); [email protected] (X.L.); Department of Information Systems Technology and Design, Singapore University of Technology and Design, 8 Somapah Road, Singapore 487372, Singapore
2 Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing 100811, China; [email protected] (Y.S.); [email protected] (J.Z.); [email protected] (H.Z.); [email protected] (J.L.); [email protected] (Y.T.); [email protected] (X.L.)
3 Sony AI Inc., 1-7-1 Konan Minato-ku, Tokyo 108-0075, Japan; [email protected]
First page
2353
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2824008170
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).