Deepfake Detection A Systematic Literature Review
Deepfake Detection A Systematic Literature Review
Deepfake Detection A Systematic Literature Review
ABSTRACT Over the last few decades, rapid progress in AI, machine learning, and deep learning has
resulted in new techniques and various tools for manipulating multimedia. Though the technology has been
mostly used in legitimate applications such as for entertainment and education, etc., malicious users have
also exploited them for unlawful or nefarious purposes. For example, high-quality and realistic fake videos,
images, or audios have been created to spread misinformation and propaganda, foment political discord
and hate, or even harass and blackmail people. The manipulated, high-quality and realistic videos have
become known recently as Deepfake. Various approaches have since been described in the literature to deal
with the problems raised by Deepfake. To provide an updated overview of the research works in Deepfake
detection, we conduct a systematic literature review (SLR) in this paper, summarizing 112 relevant articles
from 2018 to 2020 that presented a variety of methodologies. We analyze them by grouping them into four
different categories: deep learning-based techniques, classical machine learning-based methods, statistical
techniques, and blockchain-based techniques. We also evaluate the performance of the detection capability
of the various methods with respect to different datasets and conclude that the deep learning-based methods
outperform other methods in Deepfake detection.
INDEX TERMS Deepfake detection, video or image manipulation, digital media forensics, systematic
literature review.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
25494 VOLUME 10, 2022
M. S. Rana et al.: Deepfake Detection: Systematic Literature Review
FIGURE 4. The information of the extracted data. FIGURE 5. Distribution of studies (half-yearly).
mainly manipulates images or video using deep learning (DL) applied to detect specific artifacts generated by their gener-
based technique, other methods along with DL obtain Deep- ation pipeline. Zhang et al. [33] introduced a GAN simula-
fake. We categorize different researches according to the tor that replicates collective GAN-image artifacts and feeds
applied techniques and describe them in the following them as input to a classifier to identify them as Deepfake.
sections. Zhou et al. [34] proposed a network for extracting the stan-
dard features from RGB data, while [35] proposed a simi-
1) MACHINE LEARNING BASED METHODS lar but generic resolution. Besides, in [36]–[38], researchers
Traditional machine learning (ML) algorithms are instrumen- proposed a new detection framework based on physiological
tal in comprehending the logic for any decision that could be measurement, for example, Heartbeat.
expressed in human terms. Such methods are suitable for the At first, the deep learning-based method was proposed
Deepfake domain as there is a better grasp of the data and in [40] for Deepfake video detection. Two inception modules,
processes. In addition, tuning hyper-parameters and changing (i) Meso-4 and (ii) MesoInception-4, were used to build
model designs are much more manageable. The tree-based their proposed network. In this technique, the mean squared
ML approaches, for example, Decision Tree, Random Forest, error (MSE) between the actual and expected labels is used
Extremely Randomized Trees, etc., show the decision process as the loss function for training. An enhancement of Meso-4
in the form of a tree. Therefore, a tree-based method does not has been proposed in [41].
have any explainability issues. In a supervised scenario, the authors in [42] shows that
GANs are used to automatically train a generative model the deep CNNs [43]–[45] outperform shallow CNNs. Some
by treating the unsupervised issue as supervised and cre- methods apply techniques for extracting the handcrafted fea-
ating photo-realistic fake faces in images or videos. Some tures [46]–[47], spatiotemporal features [48]–[51], common
ML-based methods aspire to show certain irregularities found textures [52], [53], 68 face landmarks [54]–[56] with visual
in such GANs generated fake videos or images. artifacts (i.e., eye, teeth, lip movement, etc.) from the video
A very fundamental approach of Deepfake is to manipulate frames. Such features were used as input to the these networks
the human face to confuse its audiences. There are different for detecting Deepfake manipulations. Besides data augmen-
approaches to do that. However, to fool the users, most tation [57], super-resolution reconstruction [58], localization
techniques modify certain regions of the face, such as shade strategies in pixel levels [11] are formulated on the entire
of the eyes, ear with a ring, etc. Such methods using a single frame, and maximum mean discrepancy (MMD) loss [59] is
part (a.k.a. feature) are limited to identifying or detecting applied to discover a more general feature.
the manipulated area. To overcome these, the authors in [21] Further innovations are achieved by introducing an atten-
proposed a Deepfake technique by combining a set of such tion mechanism [61] while promising outcomes are shown
features. in [62]–[63] by using an architecture named capsule-network
In [22], the consistency of the biological signs are mea- (CN). The CN needs a smaller number of parameters to
sured along with the spatial and temporal [23]–[25] directions train than very deep networks. An ensemble learning tech-
to use various landmark [26] points of the face (e.g., eyes, nique [64]–[65] is applied to increase such structures’ perfor-
nose, mouth, etc.) as unique features for authenticating the mance, which achieves more than 99% accuracy.
legitimacy of GANs generated videos or images. Similar We observe that many approaches were proposed to apply
characteristics are also visible in Deepfake videos, which frame-by-frame analysis in videos or images to manipulate
can be discovered by approximating the 3D head pose [27]. face and track facial movement to obtain better performance.
In most cases, facial expressions are associated initially with For example, in [66]–[71], RNN based networks are proposed
the head’s movements. Habeeba et al. [88] applied MLP to to extract the features at various micro and macroscopic
detect Deepfake video with very little computing power by levels for detecting Deepfake. Regardless of these exciting
exploiting visual artifacts in the face region. results in detection, it is seen that most of the methods lean
As far as the performance concern in machine learning- towards overfitting. The optical fow based technique [72] and
based Deepfake methods, it is observed that these approaches autoencoder-based architectures [73]–[76] are introduced to
can achieve up to 98% accuracy in detecting Deepfakes. How- resolve such problems. A pixel-wise mask [77] is imposed
ever, the performance entirely relies on the type of dataset, the on various models to get the essential depiction of the face’s
selected features, and the alignment between the train and test affected area. Fernando et al. [78] applied adversarial train-
sets. The study can obtain a higher result when the experiment ing approaches followed by attention-based mechanisms for
uses a similar dataset by splitting it into a certain level of ratio, concealed facial manipulations. In [93], researchers pro-
for example, 80% for a train set and 20% for a test set. The posed a clustering technique by integrating a margin-based
unrelated dataset drops the performance close to 50%, which triplet embedding regularization term in their classification
is an arbitrary assumption. loss function. Finally, they converted the three-class classi-
fication problem to a two-class classification problem. The
2) DEEP LEARNING BASED METHODS authors in [94]–[95] proposed a data pre-processing tech-
In the case of Deepfake detection in images, there are nique for detecting Deepfakes by applying CNN methods.
plenty of works where deep learning-based methods are The researchers in [96] proposed patch and pair convolutional
neural networks (PPCNN). In [97], authors performed distance increases iff the GAN provides a lesser amount of
an analysis in the frequency domain by exploiting the correctness. Besides, an extremely precise GAN is mandatory
image latent patterns’ richness. A modern approach called to create high-resolution manipulated images that are harder
ID-revelation [98] was proposed to learn temporal facial fea- to detect.
tures based on a person’s movement during talking. A novel
feature extraction method [99] had been proposed for effec- 4) BLOCKCHAIN BASED METHODS
tively classifying Deepfake images. In [100], a multimodal Blockchain technology provides various features that can
approach was proposed for detecting real and Deepfake verify the legitimacy and provenance of digital content in a
videos. This method extracts and analyzes the similarities highly trusted, secured, and decentralized manner. In public
between the audio and visual modalities within the same Blockchain technology, anyone has direct access to every
video. In [101], a Deepfake detection method is applied to transaction, log, and tamper-proof record. For Deepfake
find the discrepancies between faces and their context by detection, public Blockchain is considered one of the most
combining multiple XeptionNet models. appropriate technological solutions for verifying video’s or
In [101], a separable convolutional network is used for image’s genuineness in a decentralized way. Users usually
detecting such manipulations. [103] resorts to the feature need to explore the origin of videos or images when they are
extraction process’s triplet loss function to better classify marked as suspected.
fake faces. A patch-based classifier was introduced in [104] Hasan and Salah [113] proposed a Blockchain-based
to focus on local patches rather than the global structure. generic framework to track suspected video’s origin to their
In [105]–[106], the authors extracted features using improved sources. The proposed solution can trace its transaction
VGG networks. A hypothesis test was performed in [107]. records, even though the material is copied several times.
The basic principle says that digital content is considered
3) STATISTICAL MEASUREMENTS BASED METHODS authentic when convincingly traced to a reliable source. For
Determining different statistical measures such as average Deepfakes, public Blockchain verifies video content’s legit-
normalized cross-correlation scores between original and imacy in a decentralized way, as the technology can provide
suspected data helps to understand the originality of the some critical features to prove its authenticity. The following
data. Koopman et al. [108] examined the photo response non- are the main contributions of [113].
uniformity (PRNU) for detecting Deepfakes in video frames. • Presents a generic framework based on Blockchain tech-
PRNU is a unique noise pattern in the digital images that nology by setting up a proof of digital content’s authen-
occurred due to the defects in the camera’s light-sensitive ticity to its trusted source.
sensors. Because of its distinctiveness, it is also considered • Presents the proposed solution’s architecture and design
the fingerprint of digital photos. The research generates a details to control and administrate the interactions and
sequence of frames from input videos and stores them in transactions among participants.
chronologically categorized directories. Each video frame is • Integrates the critical features of IPFS [114]-based
clipped with the same pixel range to preserve and clarify decentralized storage ability to Blockchain-based
the portion of the PRNU sequence. These frames are then Ethereum Name service.
divided into eight equal groups. It then makes the standard
Chan et al. [115] proposed a decentralized approach based
PRNU pattern for each frame using the second-order FSTV
on Blockchain to trace and track digital content’s histori-
method [147]. After that, it correlates them by measuring the
cal provenance (i.e., image, videos, etc.). In this proposed
normalized cross-correlation scores and calculating the dif-
approach, multiple LSTM networks are being used as a deep
ferences between the correlation scores and the mean correla-
encoder for creating discriminating features, which are then
tion score for each frame. To evaluate statistical significance
compressed and used to hash the transaction. The main con-
between Deepfakes and original videos, the authors conduct
tributions of this paper are as follows.
a t-test [109] on the results.
To model a basic generating convolutional structure, the • Using multiple LSTM CNN models, image/video con-
authors in [110] extracted a collection of regional features tents are hashed and encoded.
using the Expectation-Maximization (EM) algorithm. After • High dimensional features are preserved as a binary
the extraction, they apply ad-hoc validation to those archi- coded structure.
tectures, such as GDWCT, STARGAN, ATTGAN, STYLE- • The information is stored in a permission-based
GAN, and STYLEGAN2, using preliminary experiments Blockchain, which gives the owner control over its
naive classifiers. Agarwal et al. [111] performed a hypothesis contents.
test by proposing a statistical framework [112] for detecting Based on the studies, taking together all these methods,
the Deepfakes. Firstly, this method defines the shortest path Table 3 lists the categories of Deepfake detection strategies
between distributions of original and GAN-created images. and displays the quantity (No.) and percentage (PCT) of
Based on the results of this hypothesis, this distance measures related categories of studies. This table includes 91 stud-
the detection capability. For example, Deepfakes can easily ies, except 21 different reviews ([60], [116]–[135]) which
be detected when this distance is increased. Usually, the merge various methods. Also, this table reveals that the
TABLE 3. Classification of Deepfake detection methods. TABLE 4. The list of Deepfake datasets.
FIGURE 7. List of datasets used in Deepfake related studies. Convolutional Neural Network (RCNN) model, Faster
RCNN model, Hierarchical Memory Network (HMN)
model, Multi-task Cascaded CNNs MTCNN) model and
special artifacts-based features generated by various edit- Deep Ensemble Learning (DEL).
ing processes. Among them, 20 studies use texture and • Machine learning model: This technique creates a fea-
Spatio-temporal consistent features, 14 studies involve facial ture vector by defining the right features using var-
landmarks-based features. Also, 13 research papers perform ious state-of-art feature selection algorithms. It then
experiments using visual artifacts-based elements, for exam- feeds this vector as input to train a classifier to classify
ple, eye blinking, head posing, lip movement, etc. Eight whether the videos or images are manipulated by Deep-
pieces of work apply biological characteristics, whereas fake or not. Support Vector Machine (SVM), Logistic
seven studies concern intra-frame inconsistencies with fre- Regression (LR), Multilayer Perceptron (MLP), Adap-
quency domain analysis. In addition, six studies use GAN- tive Boosting (AdaBoost), eXtreme Gradient Boosting
based features, and four studies cover latent space-based (XGBoost), and K-Means clustering (k-MN), Random
features. Ten studies use custom features utilizing various Forest (RF), Decision Tree (DT), Discriminant Analysis
analyses that include error level analysis, mesoscopic anal- (DA), Naive Bayes (NB) and Multiple Instance Learn-
ysis, steganalysis, super-resolution, augmentation, maximum ing (MIL) are used as machine learning-based models.
mean discrepancy, PRNU pattern analysis, etc. The details are • Statistical Model; The statistical models are based on
described in the Result section using RQ-1. The study shows the use of the information-theoretic study for valida-
that special artifacts-based features, face landmarks, and tion. In these models, the shortest paths are calculated
Spatio-temporal features are used widely to detect Deepfakes. between original and Deepfake videos/images distribu-
tions. For example, in [108], a significance is measured
3) SRQ-2.3: WHAT MODELS ARE USED TO DETECT for mean normalized cross-correlation scores between
DEEPFAKE MANIPULATION? the original and the Deepfake videos, classifying them
This segment describes various models that are used as fake or real. The often-applied statistical models are
for detecting Deepfake. Based on this study, we divide Expectation-Maximization (EM), Total Variational (TV)
these models into three groups: (i) deep learning model, distance, Kullback-Leibler (KL) divergence, Jensen-
(ii) machine learning model, and (iii) statistical model. Shannon (JS) divergence, etc.
• Deep Learning models: In computer vision, deep Based on these studies, we conduct a categorization in the
learning models have been used widely due to deep learning models, machine learning models, and statis-
their feature extraction and selection mechanism, tical methods, as shown in Table 5. The table outlines the
as they can directly extract or learn features from number and the percentage of models used in the studies,
the data. In Deepfake detection studies, we found except for 21 different reviews. Also, we observe that the
the following deep learning-based models have been DL-based studies hold the highest proportion of SLR.
used: convolutional neural network (CNN) model Figure 8 displays the full versions of detector groups that
(e.g., XceptionNet, GoogleNet, VGG, ResNet, Efficent- are found from these primary studies, where CNN has the
Net, HRNet, InceptionResNetV2, MobileNet, Incep- most divisions. Based on this Table 5, we further apply a
tionV3, DenseNet, SuppressNet, StatsNet), Recurrent subcategorization on CNN models and found that the fol-
Neural Network (RNN) model (e.g., LSTM, FaceNet), lowing 3 CNN models: (i) XeptionNet and ResNet take 17%
Bidirectional RNN model, Long-term Recurrent and (iii) VGG with 12%, respectively. Besides, LSTM models
take 13% of RNN. In addition to this, the most popular Convolutional Neural Network, MTCNN: Multi-task Cas-
machine learning model is SVM with 12% and k-MN with caded CNN, MSCNN: multi-scale Temporal CNN), ML:
4%. The detail distribution in various models is presented in (SVM: Support Vector Machine, RF: Random Forest, MLP:
Figure 9 that shows the proportion of used models (e.g., DL, Multilayer Perceptron Neural Network, LR: Logistic Regres-
ML, Statistical) in various studies for detecting Deepfake. sion, k-MN: K means clustering, XGB: XGBoost, ADB:
Besides, it provides the answer for SRQ-2.3. The reviewed AdaBoost, DT: Decision Tree, NB: Naive Bayes, KNN:
papers show that the deep neural network (DNN) models K-Nearest Neighbour, DA: Discriminant Analysis), STAT:
are successful in Deepfake detection, where CNN-based (EM: Expectation Maximization, CRA: Co-relation Analy-
models demonstrate more efficiency among all the sis), BC: (ETH: Ethereum Blockchain)), Features
DNN models. (SA: Special Artifacts, VA: Visual Artifacts, BA: Biolog-
At a glance. Focus indicates the clue for the detection ical Artifacts, FL: Face Landmarks, STC: Spatio-temporal
(DMF: Digital Media Forensics, FM: Face Manipulation, Consistency, TEX: Texture, FDA: Frequency Domain Anal-
Both: DMF and FM), Methods indicates method cate- ysis, LS: Latent Feature, GAN: Generative Adversarial
gory (ML: Machine Learning, DL: Deep Learning, STAT: Network based feature, MES: Mesoscopic features, IFIC:
Statistical method, BC: Blockchain), Models represents Intra-frame inconsistency, CPRNU: Constrastive and Pho-
types of model (DL: (CNN: Convolutional Neural Net- toresponsive PRNU pattern, IMG: Image Metadata, Aug-
work, RNN: Recurrent Neural Network, RCNN: Regional mentation & Steganalysis, Other: Different feature not in
TABLE 7. Confusion matrix. approaches concerning different elements such as input data,
features, method categories, and type of techniques. A path
between two elements denotes the related components used
in the companion paper for any method. As presented in
the Figure, most papers apply image or video as the input
data, whereas many papers use both image and video as the
input. Special Artifacts and Texture and Spatio-temporal Con-
sistency are the commonly used features in various papers.
the common list), Datasets (FF: FaceForensics, FF++:
About 75% of the methods used the DL-based techniques
FaceForensics++, DFD: Deepfake Detection, CELEB-A:
as the detection method category. Only a few papers used
DeepFake Forensics V1, CELEB-DF: DeepFake Forensics
Blockchain and Statistical approaches for detecting such
V2, DFDC: Deepfake Detection Challange, DF-TIMIT:
Deepfake.
Deepfake-TIMIT, DF-1.0: DeeperForensics-1.0, WDF: Wild
In detecting Deepfake, various underlying techniques are
Deepfake, SMFW: SwapMe and FaceSwap, DFS: Deep
available, such as Biological Signals, Phoneme-Viseme Mis-
Fakes, FFD: Fake Faces in the Wild, FE: FakeET, FS:
matches, facial expression and movements (i.e., 2D and
Face Shifter, DF: Deepfake, SFD: Swapped Face Detection,
3D facial landmark positions, head pose, and facial action
UADFV: Inconsistent Head Poses, MANFA: Tampered Face,
units), etc. We combine them under two central umbrel-
Other: Authors’ Custom datasets).
las of the methods that include Facial Manipulation and
Finally, we summarize all at a glance using Table 6 that
Digital Media Forensics. As shown in Figure 10, most
specifies the features, methods and models, datasets used
of the DL-based methods exploit Facial Manipulation for
throughout the studies and also focuses on specific manip-
the Deepfake detection. However, Machine Learning based
ulation detection techniques with having a reference to each
methods almost equally utilize both techniques. Common
of the primary studies.
to both Blockchain and Statistical approaches, they apply
only Digital Media Forensics as part of the detection
4) SRQ-2.4: WHAT MEASUREMENT METRICS ARE USED technique.
FOR COMPUTING THE PERFORMANCE OF DEEPFAKE
DETECTION METHODS?
This section briefly describes various measurement metrics E. RQ-4: WHAT IS THE GENERAL EFFICIENCY OF
applied for assessing the models’ performance in detect- A VARIETY OF DEEPFAKE DETECTION STRATEGIES
ing such Deepfakes. A confusion matrix holds info about BASED ON EXPERIMENTAL PROOF?
actual and predicted classification results. The accounts of This segment attempts to decide the efficacy of Deepfake
the detection capabilities of the used methods are measured detection methods. The output assessment values are first
and confirmed using this matrix data. Table 7 describes the obtained and stored in an Excel document based on the
confusion matrix. studies. After that, we count the number of studies that use the
Using Table 7, we can define the term, TP, which pro- same method and the same measurement metrics (precision,
vides the number of Deepfakes that are correctly predicted as accuracy, and recall). And finally, we apply four metrics: the
Deepfake, and TN offers the number of actual images/videos minimum, maximum, mean, and standard deviation on these
correctly predicted as real. Besides, FP stands for the num- values (see Table 9).
ber of real images/videos incorrectly predicted as Deepfake, In Table 9, based on the mean values of accuracy and
where FN is the number of Deepfakes incorrectly predicted AUC, deep learning-based methods outperform other meth-
as the real. Similarly, using Table 8, we can define various ods and achieve 89.73% and 0.917, respectively. Besides,
measurement metrics and show how many studies are related we also compare the recall and precision values for
to these metrics. both techniques. Based on the overall results, we found
Based on the Table 8, it is seen that the often-applied deep learning-based techniques are efficient for detecting
measurement metrics are Accuracy (AC), receiver operating Deepfake.
characteristic (ROC) curve, and area under the ROC curve
(AUC). Recall, error rate (ER), precision (P), f1-score, and F. RQ-5: IS THE EFFICIENCY OF DEEP LEARNING MODELS
log loss occupy a similar proportion. The least used perfor- BETTER THAN NON-DEEP LEARNING MODELS
mance measure is frechet-inception-distance (FID). Based on IN DEEPFAKE DETECTION BASED
the study, accuracy and AUC are widely used measurement ON EXPERIMENTAL RESULTS?
metrics in detecting Deepfake. We split the models into two groups: (i) deep learning-based
models and (ii) non-deep learning-based models. We deter-
D. RQ-3: WHAT IS THE CLASSIFICATION FRAMEWORK mine the mean accuracy, AUC, recall, and precision.
FOR DEEPFAKE DETECTION APPROACHES? Next, we apply a comparative analysis of these two
For better insights, we summarize our key findings models’ performance and obtain an average result. Based
in Figure 10. As demonstrated in Figure, we classify overall on the evaluation of these models using performance
measures (accuracy, AUC, recall, and precision), we observe, especially the CNN models, to learn how to mechanically or
in general, deep learning-based models outperformed non- directly learn perceptible and selective features to identify
deep learning-based models. As the results are reported in such Deepfake. For example, Ding et al. [82] introduced a
Figure 11, the accuracy and precision performance in deep two-phase CNN method for Deepfake detection. The first
learning models are significantly better than non-deep learn- stage extracts particular features among counterfeit and actual
ing models. However, in the case of AUC and recall, the images by incorporating various dense units, where each of
performance is pretty similar. The overall results demonstrate them includes a list of dense blocks that are forged images.
the superiority of deep learning-based models over non-deep The second phase uses these features to train the proposed
learning-based models. CNN to classify the input images, whether fake or real.
Due to the typical use of lossy compression in video com-
IV. OBSERVATIONS pression, most detection techniques used in an image are
A. COMBINING DIFFERENT DEEP LEARNING METHODS not suitable for videos, as these methods degrade the frame
IS CRITICAL FOR THE ACCURATE dEEPFAKE DETECTION data. Because videos have temporal features and vary the
Based on the review, we see that multiple strategies are frames’ size, it is challenging for techniques to distinguish
applied using numerous features. In general, primary meth- just counterfeit images. In [68], a recurrent convolutional
ods used handcrafted features collected from face artifacts. model (RCN) was proposed to use these spatiotemporal fea-
Recent research applied deep learning-based approaches, tures [48]–[51] of videos for detecting Deepfakes. Likewise,
FIGURE 9. The allocation of subcategories of detection models. ML: Machine Learning; DL: Deep Learning.
Guera and Delp [66] discovered intra-frame and temporal videos is always lower than in real videos. It can easily extract
inconsistencies among the Deepfake videos’ frames. They from the eye areas based on six eye landmarks and use them
proposed a network composed of CNN and LSTM to detect as features.
such discrepancies in Deepfakes. In this architecture, CNN On the other hand, Rana and Sung [65] proposed a deep
handles extracting the frame-level features and LSTM to use ensemble learning strategy, namely, DeepfakeStack, to detect
these features as input to generate a descriptor accountable Deepfake by analyzing multiple deep learning models. The
for analyzing the temporal sequence. Besides, to use physical concept behind DeepfakeStack is to train a meta-learner to
indications [35]–[36], for example, eye blinking as features top base-learners with pre-trained experience. It provides
in detecting Deepfake, Li, et al. [46] proposed a long-term an interface for fitting the meta-learners on the base learn-
recurrent convolutional network (LRCN). Their method high- ers’ prediction and demonstrates how an ensemble method
lighted that the total eye blinking of an individual in Deepfake executes the role of classification. The DeepfakeStack
FIGURE 10. Taxonomy of Deepfake detection techniques. This taxonomy classifies the detection algorithms according to the media (image, video,
or image and video), the features used (among the 12 features), the detection method (DL, ML, Blockchain, or statistical), and the clue for the detection
(facial manipulation of digital media forensics, or other indications). The size of the connection line reflects the relative count of papers.
architecture includes multiple base-learners, the level-0 TABLE 9. Performance of various detection methods.
model, and a meta-learner, a level-1 model. The experi-
ment reveals the DeepfakeStack achieves 99.65% accuracy
and 1.0 of AUROC.
To some extent, these deep learning methods are com-
plementary. In practice, combining multiple deep learning
methods could obtain improved results compared to a sin-
gle process. For example, the DeepfakeStack [65] integrates
multiple state-of-the-art classification algorithms focused on
deep learning and produces a sophisticated composite clas-
sifier that achieves 99.65%. Based on the RQ-1, it is seen
that a maximum number of studies have applied deep learn-
ing techniques for detecting Deepfake. Therefore, it may Deepfake from SRQ-2.3 has become a hot subject. We also
be appropriate to explore the compatibility of deep learning find that most studies follow a traditional CNN approach
methods and integrate some of them for further progress in to classify Deepfake in the deep-learning environment. Still,
Deepfake detection. researchers have not yet figured out how to determine Deep-
fake authorship.
Based on the outcomes of RQ-4, it is observed that the deep
B. DEEP LEARNING-BASED METHODS ARE learning-based models achieve better performance than the
RECOMMENDED IN DEEPFAKE DETECTION non-deep learning models in Deepfake detection. Therefore,
Compared with the traditional machine learning approaches, deep learning-based approaches are advised when detecting
we note that applying deep learning algorithms to detect Deepfake.
C. EXTERNAL VALIDITY
It is about the summary of the results obtained from various
studies that we considered. To improve the quality of the
findings in RQ-3 and RQ-4 in future studies, we recommend
setting up a unique framework to reduce the inconsistencies
FIGURE 11. The comparison of the results among deep learning and in the results reported. Besides, more Deepfake detection
non-deep learning based models. experiments might be required to be obtained to produce
definitive and systematic outcomes.
C. A UNIQUE FRAMEWORK IS REQUIRED FOR THE FAIR
EVALUATION OF DIFFERENT HETEROGENEOUS VI. CONCLUSION
DEEPFAKE DETECTION METHODS This SLR presents various state-of-the-art methods for detect-
After reviewing the studies listed above, we note that several ing Deepfake published in 112 studies from the beginning
studies have used other datasets. Secondly, there are also dif- of 2018 to the end of 2020. We present basic techniques
ferences with specific experiments that use the same dataset. and discuss different detection models’ efficacy in this work.
(1) The measurement metrics used in the studies in question We summarize the overall study as follows:
are not standard. For example, some experiments evaluate the • The deep learning-based methods are widely used in
performance of detection tasks using Accuracy and AUROC. detecting Deepfake.
Some studies use precision and recall only; (2) In these • In the experiments, the FF++ dataset occupies the largest
studies, it is also seen that the dataset’s size is not consistent. proportion.
For example, the FF++ dataset has 1000 Deepfake videos, • The deep learning (mainly CNN) models hold a signifi-
but a few studies use the entire dataset while others use half. cant percentage of all the models.
Some studies use only 100 videos; (3) The initial videos in • The most widely used performance metric is detection
these experiments are hardly available in public. The above accuracy.
conditions may lessen the trustworthiness of these RQ-3 and • The experimental results demonstrate that deep learning
RQ-4 findings. techniques are effective in detecting Deepfake. Further,
Based on the above circumstances, this section concludes it can be stated that, in general, the deep learning models
that creating a unique framework for the fair assessment of outperform the non-deep learning models.
the performance is essential.
With the rapid progress in underlying multimedia tech-
nology and the proliferation of tools and applications,
V. LIMITATIONS AND CHALLENGES
Deepfake detection still faces many challenges. We hope
This section will discuss some limitations and challenges that
this SLR provides a valuable resource for the research
we observed during the preparation of this SLR.
community in developing effective detection methods and
countermeasures.
A. CONSTRUCT VALIDITY
It is related to the collection of studies. We compile the asso- REFERENCES
ciated articles from journals, seminars, conferences, work- [1] FaceApp. Accessed: Jan. 4, 2021. [Online]. Available: https://www.
shops, and archives of many electronic libraries in this faceapp.com/
SLR. It is still possible that some of the related papers [2] FakeApp. Accessed: Jan. 4, 2021. [Online]. Available: https://www.
fakeapp.org/
might still be missing from our collection of studies. Fur- [3] G. Oberoi. Exploring DeepFakes. Accessed: Jan. 4, 2021. [Online].
ther, we might have a few mistakes sorting these exper- Available: https://goberoi.com/exploring-deepfakes-20c9947c22d9
iments through the selection or rejection parameters we [4] J. Hui. How Deep Learning Fakes Videos (Deepfake) and How to
Detect it. Accessed: Jan. 4, 2021. [Online]. Available: https://medium.
used in the process. We evaluated our catalog of stud- com/how-deep-learning-fakes-videos-deepfakes-and-how-to-detect-it-
ies using a double-checking approach to address such c0b50fbf7cb9
errors. [5] I. Goodfellow, J. P. Abadie, M. Mirza, B. Xu, D. W. Farley, S. Ozair,
A. Courville, and Y. Bengio, ‘‘Generative adversarial nets,’’ in Proc. 27th
Int. Conf. Neural Inf. Process. Syst. (NIPS), vol. 2. Cambridge, MA, USA:
B. INTERNAL VALIDITY MIT Press, 2014, pp. 2672–2680.
The internal validity is related to data extraction and anal- [6] G. Patrini, F. Cavalli, and H. Ajder, ‘‘The state of deepfakes:
Reality under attack,’’ Deeptrace B.V., Amsterdam, The Nether-
ysis. The present work involved an intense workload of lands, Annu. Rep. v.2.3., 2018. [Online]. Available: https://s3.eu-west-
data extraction and data processing. The cross-checking pro- 2.amazonaws.com/rep2018/2018-the-state-of-deepfakes.pdf
[7] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Niessner, [29] R. Durall, M. Keuper, F.-J. Pfreundt, and J. Keuper, ‘‘Unmasking Deep-
‘‘Face2Face: Real-time face capture and reenactment of RGB videos,’’ Fakes with simple features,’’ 2019, arXiv:1911.00686.
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, [30] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis, ‘‘Two-stream neural
NV, USA, Jun. 2016, pp. 2387–2395, doi: 10.1109/CVPR.2016.262. networks for tampered face detection,’’ in Proc. IEEE Conf. Comput. Vis.
[8] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, ‘‘Unpaired image-to-image Pattern Recognit. Workshops (CVPRW), Honolulu, HI, USA, Jul. 2017,
translation using cycle-consistent adversarial networks,’’ in Proc. IEEE pp. 1831–1839, doi: 10.1109/CVPRW.2017.229.
Int. Conf. Comput. Vis. (ICCV), Venice, Oct. 2017, pp. 2242–2251, doi: [31] K. Songsri-in and S. Zafeiriou, ‘‘Complement face forensic detection and
10.1109/ICCV.2017.244. localization with faciallandmarks,’’ 2019, arXiv:1910.05455.
[9] S. Suwajanakorn, S. M. Seitz, and I. K. Shlizerman, ‘‘Synthesizing [32] A. Kumar and A. Bhavsar, ‘‘Detecting deepfakes with metric learning,’’
Obama: Learning lip sync from audio,’’ ACM Trans. Graph., vol. 36, 2020, arXiv:2003.08645.
no. 4, p. 95, 2017. [33] X. Zhang, S. Karaman, and S.-F. Chang, ‘‘Detecting and simulating
[10] L. Matsakis. Artificial Intelligence is Now Fighting Fake vid. Accessed: artifacts in GAN fake images,’’ in Proc. IEEE Int. Workshop Inf. Forensics
Jan. 4, 2021. [Online]. Available: https://www.wired.com/story/gfycat- Secur. (WIFS), Dec. 2019, pp. 1–6.
artificial-intelligence-deepfakes/ [34] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis, ‘‘Learning rich fea-
[11] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, tures for image manipulation detection,’’ in Proc. IEEE/CVF Conf.
‘‘FaceForensics: A large-scale video dataset for forgery detection in Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, Jun. 2018,
human faces,’’ 2018, arXiv:1803.09179. pp. 1053–1061, doi: 10.1109/CVPR.2018.00116.
[12] H. Kim, P. Garrido, A. Tewari, W. Xu, J. Thies, M. Niessner, [35] K. Chugh, P. Gupta, A. Dhall, and R. Subramanian, ‘‘Not made for each
P. Pérez, C. Richardt, M. Zollhöfer, and C. Theobalt, ‘‘Deep video por- other- audio-visual dissonance-based deepfake detection and localiza-
traits,’’ ACM Trans. Graph., vol. 37, no. 4, pp. 1–14, Aug. 2018, doi: tion,’’ 2020, arXiv:2005.14405.
10.1145/3197517.3201283. [36] H. Qi, Q. Guo, F. Juefei-Xu, X. Xie, L. Ma, W. Feng, Y. Liu, and
[13] C. Chan, S. Ginosar, T. Zhou, and A. A. Efros, ‘‘Everybody dance now,’’ J. Zhao, ‘‘DeepRhythm: Exposing deepfakes with attentional visual heart-
2018, arXiv:1808.07371. beat rhythms,’’ 2020, arXiv:2006.07634.
[14] T. Karras, S. Laine, and T. Aila, ‘‘A style-based generator architecture [37] S. Fernandes, S. Raj, E. Ortiz, I. Vintila, M. Salter, G. Urosevic, and
for generative adversarial networks,’’ in Proc. IEEE/CVF Conf. Com- S. Jha, ‘‘Predicting heart rate variations of deepfake videos using neural
put. Vis. Pattern Recognit. (CVPR), Long Beach, CA, USA, Jun. 2019, ODE,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW),
pp. 4396–4405, doi: 10.1109/CVPR.2019.00453. Oct. 2019, pp. 1721–1729.
[15] D. Budgen and P. Brereton, ‘‘Performing systematic literature reviews in [38] J. Hernandez-Ortega, R. Tolosana, J. Fierrez, and A. Morales,
software engineering,’’ in Proc. 28th Int. Conf. Softw. Eng., New York, ‘‘DeepFakesON-phys: DeepFakes detection based on heart rate estima-
NY, USA, May 2006, pp. 1051–1052, doi: 10.1145/1134285.1134500. tion,’’ 2020, arXiv:2010.00400.
[16] Z. Stapic, E. G. Lopez, A. G. Cabot, L. M. Ortega, and V. Strahonja, ‘‘Per- [39] J. Bappy, C. Simons, L. Nataraj, B. Manjunath, and A. R. Chowdhury,
forming systematic literature review in software engineering,’’ in Proc. ‘‘Hybrid LSTM and encoder–decoder architecture for detection of image
23rd Central Eur. Conf. Inf. Intell. Syst. (CECIIS), Varazdin, Croatia, forgeries,’’ IEEE Trans. Image Process., vol. 28, no. 7, pp. 3286–3300,
Sep. 2012, pp. 441–447. Jul. 2019.
[17] B. Kitchenham, ‘‘Procedures for performing systematic reviews,’’ [40] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, ‘‘MesoNet: A com-
Softw. Eng. Group; Nat. ICT Aust., Keele; Eversleigh, Keele Univ., pact facial video forgery detection network,’’ in Proc. IEEE Int. Workshop
Keele, U.K., Tech. Rep. TR/SE-0401; NICTA Tech. Rep. 0400011T.1, Inf. Forensics Secur. (WIFS), Dec. 2018, pp. 1–7.
2004. [41] P. Kawa and P. Syga, ‘‘A note on deepfake detection with low-resources,’’
[18] B. Kitchenham and S. Charters, ‘‘Guidelines for performing systematic 2020, arXiv:2006.05183.
literature reviews in software engineering,’’ Softw. Eng. Group; Keele [42] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and
Univ., Durham University Joint, Durham, U.K., Tech. Rep. EBSE-2007- M. Niessner, ‘‘FaceForensics++: Learning to detect manipulated facial
01, 2007. images,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019,
[19] M. A. Babar and H. Zhang, ‘‘Systematic literature reviews in software pp. 1–11.
engineering: Preliminary results from interviews with researchers,’’ in [43] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ‘‘Densely
Proc. 3rd Int. Symp. Empirical Softw. Eng. Meas., Lake Buena Vista, FL, connected convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis.
USA, Oct. 2009, pp. 346–355, doi: 10.1109/ESEM.2009.5314235. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 2261–2269,
[20] H. Do, S. Elbaum, and G. Rothermel, ‘‘Supporting controlled experimen- doi: 10.1109/CVPR.2017.243.
tation with testing techniques: An infrastructure and its potential impact,’’ [44] F. Chollet, ‘‘Xception: Deep learning with depthwise separable convo-
Empirical Softw. Eng., vol. 10, no. 4, pp. 405–435, 2005. lutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
[21] F. Matern, C. Riess, and M. Stamminger, ‘‘Exploiting visual artifacts Jul. 2017, pp. 1800–1807.
to expose deepfakes and face manipulations,’’ in Proc. IEEE Winter [45] A. Khodabakhsh and C. Busch, ‘‘A generalizable deepfake detector based
Appl. Comput. Vis. Workshops (WACVW), Waikoloa Village, HI, USA, on neural conditional distribution modelling,’’ in Proc. Int. Conf. Biomet-
Jan. 2019, pp. 83–92, doi: 10.1109/WACVW.2019.00020. rics Special Interest Group (BIOSIG), Darmstadt, Germany, Sep. 2020,
[22] U. A. Ciftci, I. Demir, and L. Yin, ‘‘FakeCatcher: Detection of pp. 1–5.
synthetic portrait videos using biological signals,’’ IEEE Trans. [46] Y. Li, M.-C. Chang, and S. Lyu, ‘‘In ictu oculi: Exposing AI created
Pattern Anal. Mach. Intell., early access, Jul. 15, 2020, doi: fake videos by detecting eye blinking,’’ in Proc. IEEE Int. Workshop Inf.
10.1109/TPAMI.2020.3009287. Forensics Secur. (WIFS), Dec. 2018, pp. 1–7.
[23] X. Li, Y. Lang, Y. Chen, X. Mao, Y. He, S. Wang, H. Xue, and Q. Lu, [47] Y. Li and S. Lyu, ‘‘Exposing deepfake videos by detecting face
‘‘Sharp multiple instance learning for deepfake video detection,’’ 2020, warping artifacts,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
arXiv:2008.04585. Recognit. (CVPR) Workshops, 2019, pp. 46–52. [Online]. Available:
[24] L. Guarnera, O. Giudice, and S. Battiato, ‘‘Fighting deepfake by exposing https://openaccess.thecvf.com/content_CVPRW_2019/html/Media_
the convolutional traces on images,’’ 2020, arXiv:2008.04095. Forensics/Li_Exposing_DeepFake_Videos_By_Detecting_Face_
[25] M. Bonomi, C. Pasquini, and G. Boato, ‘‘Dynamic texture analysis for Warping_Artifacts_CVPRW_2019_paper.html
detecting fake faces in video sequences,’’ 2020, arXiv:2007.15271. [48] I. Ganiyusufoglu, L. Minh Ngô, N. Savov, S. Karaoglu, and T. Gevers,
[26] L. Guarnera, O. Giudice, and S. Battiato, ‘‘Fighting deepfake by exposing ‘‘Spatio-temporal features for generalized detection of deepfake videos,’’
the convolutional traces on images,’’ 2020, arXiv:2008.04095. 2020, arXiv:2010.11844.
[27] X. Yang, Y. Li, and S. Lyu, ‘‘Exposing deep fakes using inconsis- [49] A. Singh, A. S. Saimbhi, N. Singh, and M. Mittal, ‘‘Deepfake video
tent head poses,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal detection: A time-distributed approach,’’ SN Comput. Sci., vol. 1, p. 212,
Process. (ICASSP), Brighton, U.K., May 2019, pp. 8261–8265, doi: Jun. 2020, doi: 10.1007/s42979-020-00225-9.
10.1109/ICASSP.2019.8683164. [50] I. Kukanov, J. Karttunen, H. Sillanpää, and V. Hautamäki, ‘‘Cost sensitive
[28] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li, ‘‘Protecting optimization of deepfake detector,’’ 2020, arXiv:2012.04199.
world leaders against deep fakes,’’ in Proc. IEEE Conf. Comput. Vis. [51] A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, ‘‘Lips don’t lie:
Pattern Recognit. (CVPR) Workshops, Long Beach, CA, USA, Jun. 2019, A generalisable and robust approach to face forgery detection,’’ 2020,
pp. 1–8. arXiv:2012.07657.
[52] X. Zhu, H. Wang, H. Fei, Z. Lei, and S. Z. Li, ‘‘Face forgery detection by [73] D. Cozzolino, J. Thies, A. Rössler, C. Riess, M. Nießner, and L. Verdoliva,
3D decomposition,’’ 2020, arXiv:2011.09737. ‘‘ForensicTransfer: Weakly-supervised domain adaptation for forgery
[53] X. Wang, T. Yao, S. Ding, and L. Ma, ‘‘Face manipulation detection detection,’’ 2018, arXiv:1812.02510.
via auxiliary supervision,’’ in Neural Information Processing (ICONIP) [74] H. H. Nguyen, F. Fang, J. Yamagishi, and I. Echizen, ‘‘Multi-task learning
(Lecture Notes in Computer Science), vol. 12532, H. Yang, K. Pasupa, for detecting and segmenting manipulated facial images and videos,’’
A. C. Leung, J. T. Kwok, J. H. Chan, I. King, Eds. Cham, Switzerland: in Proc. IEEE 10th Int. Conf. Biometrics Theory, Appl. Syst. (BTAS),
Springer, 2020, pp. 313–324, doi: 10.1007/978-3-030-63830-6_27. Sep. 2019, pp. 1–8.
[54] M. T. Jafar, M. Ababneh, M. Al-Zoube, and A. Elhassan, ‘‘Foren- [75] M. Du, S. Pentyala, Y. Li, and X. Hu, ‘‘Towards generalizable deepfake
sics and analysis of deepfake videos,’’ in Proc. 11th Int. Conf. Inf. detection with locality-aware autoencoder,’’ 2019, arXiv:1909.05999.
Commun. Syst. (ICICS), Irbid, Jordan, Apr. 2020, pp. 053–058, doi: [76] L. Trinh, M. Tsang, S. Rambhatla, and Y. Liu, ‘‘Interpretable
10.1109/ICICS49469.2020.239493. and trustworthy deepfake detection via dynamic prototypes,’’ 2020,
[55] X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, D. Chen, F. Wen, and arXiv:2006.15473.
B. Guo, ‘‘Identity-driven deepfake detection,’’ 2020, arXiv:2012.03930. [77] M. Du, S. Pentyala, Y. Li, and X. Hu, ‘‘Towards generalizable deep-
[56] T. Zhao, X. Xu, M. Xu, H. Ding, Y. Xiong, and W. Xia, ‘‘Learning self- fake detection with locality-aware autoencoder,’’ in Proc. 29th ACM
consistency for deepfake detection,’’ 2020, arXiv:2012.09311. Int. Conf. Inf. Knowl. Manage., New York, NY, USA, Oct. 2020, doi:
[57] L. Bondi, E. Daniele Cannas, P. Bestagini, and S. Tubaro, ‘‘Training 10.1145/3340531.3411892.
strategies and data augmentations in CNN-based deepfake video detec- [78] T. Fernando, C. Fookes, S. Denman, and S. Sridharan, ‘‘Exploiting human
tion,’’ 2020, arXiv:2011.07792. social cognition for the detection of fake and fraudulent faces via memory
[58] Z. Hongmeng, Z. Zhiqiang, S. Lei, M. Xiuqing, and W. Yuehan, networks,’’ 2019, arXiv:1911.07844.
‘‘A detection method for deepfake hard compressed videos based on [79] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, ‘‘Face
super-resolution reconstruction using CNN,’’ in Proc. 4th High Perform. X-ray for more general face forgery detection,’’ 2019, arXiv:1912.13458.
Comput. Cluster Technol. Conf. 3rd Int. Conf. Big Data Artif. Intell., [80] Z. Chen and H. Yang, ‘‘Attentive semantic exploring for manipulated face
New York, NY, USA, Jul. 2020, pp. 98–103, doi: 10.1145/3409501. detection,’’ 2020, arXiv:2005.02958.
3409542. [81] T. D. Nhu, I. S. Na, H. J. Yang, G. S. Lee, and S. H. Kim, ‘‘Forensics face
[59] J. Han and T. Gevers, ‘‘MMD based discriminative learning for face detection from GANs using convolutional neural network,’’ in Proc. Int.
forgery detection,’’ in Proc. Asian Conf. Comput. Vis. (ACCV), 2020, Symp. Inf. Technol. Converg. (ISITC), 2018, pp. 1–5.
pp. 121–136. [82] X. Ding, Z. Raziei, E. C. Larson, E. V. Olinick, P. Krueger, and
[60] L. Verdoliva, ‘‘Media forensics and DeepFakes: An overview,’’ 2020, M. Hahsler, ‘‘Swapped face detection using deep learning and subjec-
arXiv:2001.06564. tive assessment,’’ EURASIP J. Inf. Secur., vol. 2020, no. 1, pp. 1–12,
Dec. 2020, doi: 10.1186/s13635-020-00109-8.
[61] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, ‘‘On the detection
[83] Z. Guo, G. Yang, J. Chen, and X. Sun, ‘‘Fake face detection via adaptive
of digital face manipulation,’’ in Proc. IEEE/CVF Conf. Comput. Vis.
manipulation traces extraction network,’’ 2020, arXiv:2005.04945.
Pattern Recognit. (CVPR), Seattle, WA, USA, Jun. 2020, pp. 5780–5789,
doi: 10.1109/CVPR42600.2020.00582. [84] D. Mas Montserrat, H. Hao, S. K. Yarlagadda, S. Baireddy, R. Shao,
J. Horváth, E. Bartusiak, J. Yang, D. Güera, F. Zhu, and E. J. Delp, ‘‘Deep-
[62] H. H. Nguyen, J. Yamagishi, and I. Echizen, ‘‘Capsule-forensics: Using
fakes detection with automatic face weighting,’’ 2020, arXiv:2004.12027.
capsule networks to detect forged images and videos,’’ in Proc. IEEE
[85] L. M. Dang, S. I. Hassan, S. Im, and H. Moon, ‘‘Face image manipulation
Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton, U.K.,
detection based on a convolutional neural network,’’ Expert Syst. Appl.,
May 2019, pp. 2307–2311, doi: 10.1109/ICASSP.2019.8682602.
vol. 129, pp. 156–168, Sep. 2019.
[63] H. H. Nguyen, J. Yamagishi, and I. Echizen, ‘‘Use of a capsule network
[86] Z. Liu, X. Qi, J. Jia, and P. H. S. Torr, ‘‘Real or fake: An empirical study
to detect fake images and videos,’’ 2019, arXiv:1910.12467.
and improved model for fake face detection,’’ in Proc. 8th Int. Conf.
[64] N. Bonettini, E. Daniele Cannas, S. Mandelli, L. Bondi, P. Bestagini, Learn. Represent. (ICLR), Apr. 2020, pp. 1–12.
and S. Tubaro, ‘‘Video face manipulation detection through ensemble of
[87] R. Durall, M. Keuper, and J. Keuper, ‘‘Watch your up-convolution:
CNNs,’’ 2020, arXiv:2004.07676.
CNN based generative deep neural networks are failing to reproduce
[65] M. S. Rana and A. H. Sung, ‘‘DeepfakeStack: A deep ensemble-based spectral distributions,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
learning technique for deepfake detection,’’ in Proc. 7th IEEE Int. Conf. Recognit. (CVPR), Seattle, WA, USA, Jun. 2020, pp. 7887–7896, doi:
Cyber Secur. Cloud Comput. (CSCloud)/6th IEEE Int. Conf. Edge Com- 10.1109/CVPR42600.2020.00791.
put. Scalable Cloud (EdgeCom), New York, NY, USA, Aug. 2020, [88] M. A. S. Habeeba, A. Lijiya, and A. M. Chacko, ‘‘Detection of deepfakes
pp. 70–75, doi: 10.1109/CSCloud-EdgeCom49738.2020.00021. using visual artifacts and neural network classifier,’’ in Innovations in
[66] D. Guera and E. J. Delp, ‘‘Deepfake video detection using recurrent Electrical and Electronic Engineering (Lecture Notes in Electrical Engi-
neural networks,’’ in Proc. 15th IEEE Int. Conf. Adv. Video Signal Based neering), vol. 661, M. Favorskaya, S. Mekhilef, R. Pandey, and N. Singh,
Surveill. (AVSS), Nov. 2018, pp. 1–6. Eds. Singapore: Springer, 2020, pp. 411–422, doi: 10.1007/978-981-15-
[67] S. Sohrawardi, A. Chintha, B. Thai, S. Seng, A. Hickerson, R. Ptucha, 4692-1_31.
and M. Wright, ‘‘Poster: Towards robust open-world detection of deep- [89] C.-C. Hsu, Y.-X. Zhuang, and C.-Y. Lee, ‘‘Deep fake image detection
fakes,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2019, based on pairwise learning,’’ Appl. Sci., vol. 10, no. 1, p. 370, Jan. 2020,
pp. 2613–2615. doi: 10.3390/app10010370.
[68] E. Sabir, J. Cheng, A. Jaiswal, W. Abd-Almageed, I. Masi, and [90] S. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, ‘‘CNN-
P. Natarajan, ‘‘Recurrent convolutional strategies for face manip- generated images are surprisingly easy to spot. . . for now,’’ in Proc.
ulation detection in videos,’’ in Proc. CVPR Workshops, 2019, IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA,
pp. 80–87. USA, Jun. 2020, pp. 8692–8701, doi: 10.1109/CVPR42600.2020.00872.
[69] S. Tariq, S. Lee, and S. S. Woo, ‘‘A convolutional LSTM based residual [91] P. Korshunov and S. Marcel, ‘‘DeepFakes: A new threat to face recogni-
network for deepfake video detection,’’ 2020, arXiv:2009.07480. tion? Assessment and detection,’’ 2018, arXiv:1812.08685.
[70] I. Masi, A. Killekar, R. M. Mascarenhas, S. P. Gurudatt, and [92] A. Gandhi and S. Jain, ‘‘Adversarial perturbations fool deepfake detec-
W. Abd-Almageed, ‘‘Two-branch recurrent network for isolating deep- tors,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
fakes in videos,’’ in Proc. 16th Eur. Conf. Comput. Vis., Aug. 2020, [93] K. Zhu, B. Wu, and B. Wang, ‘‘Deepfake detection with clustering-
pp. 667–684. based embedding regularization,’’ in Proc. IEEE 5th Int. Conf. Data
[71] A. Chintha, B. Thai, S. J. Sohrawardi, K. Bhatt, A. Hickerson, Sci. Cyberspace (DSC), Hong Kong, Jul. 2020, pp. 257–264, doi:
M. Wright, and R. Ptucha, ‘‘Recurrent convolutional structures for audio 10.1109/DSC50466.2020.00046.
spoof and video deepfake detection,’’ IEEE J. Sel. Topics Signal Process., [94] P. Charitidis, G. Kordopatis-Zilos, S. Papadopoulos, and I. Kompatsiaris,
vol. 14, no. 5, pp. 1024–1037, Aug. 2020, doi: 10.1109/JSTSP.2020. ‘‘Investigating the impact of pre-processing and prediction aggregation
2999185. on the DeepFake detection task,’’ 2020, arXiv:2006.07084.
[72] I. Amerini, L. Galteri, R. Caldelli, and A. Del Bimbo, ‘‘Deepfake video [95] P. Charitidis, G. Kordopatis-Zilos, S. Papadopoulos, and I. Kompatsiaris,
detection through optical flow based CNN,’’ in Proc. IEEE/CVF Int. Conf. ‘‘Investigating the impact of pre-processing and prediction aggregation
Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1205–1207. on the deepfake detection task,’’ 2020, arXiv:2006.07084.
[96] X. Li, K. Yu, S. Ji, Y. Wang, C. Wu, and H. Xue, ‘‘Fighting against [119] R. Tolosana, S. Romero-Tapiador, J. Fierrez, and R. Vera-Rodriguez,
deepfake: Patch&pair convolutional neural networks (PPCNN),’’ in Proc. ‘‘DeepFakes evolution: Analysis of facial regions and fake detection
Companion Web Conf., New York, NY, USA, 2020, pp. 88–89, doi: performance,’’ 2020, arXiv:2004.07532.
10.1145/3366424.3382711. [120] S. Lyu, ‘‘Deepfake detection: Current challenges and next steps,’’ in Proc.
[97] C. X. T. Du, L. H. Duong, H. T. Trung, P. M. Tam, N. Q. V. Hung, IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), London, U.K.,
and J. Jo, ‘‘Efficient-frequency: A hybrid visual forensic framework for Jul. 2020, pp. 1–6, doi: 10.1109/ICMEW46912.2020.9105991.
facial forgery detection,’’ in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), [121] Y. Mirsky and W. Lee, ‘‘The creation and detection of deepfakes: A
Canberra, ACT, Australia, Dec. 2020, pp. 707–712. survey,’’ 2020, arXiv:2004.11138.
[98] D. Cozzolino, A. Rössler, J. Thies, M. Nießner, and [122] L. Guarnera, O. Giudice, C. Nastasi, and S. Battiato, ‘‘Preliminary foren-
L. Verdoliva, ‘‘ID-reveal: Identity-aware DeepFake video detection,’’ sics analysis of deepfake images,’’ 2020, arXiv:2004.12626.
2020, arXiv:2012.02512. [123] A. O. J. Kwok and S. G. M. Koh, ‘‘Deepfake: A social construction
[99] W. Zhang, C. Zhao, and Y. Li, ‘‘A novel counterfeit feature extrac- of technology perspective,’’ Current Issues Tourism, vol. 24, no. 13,
tion technique for exposing face-swap images based on deep learning pp. 1798–1802, 2020, doi: 10.1080/13683500.2020.1738357.
and error level analysis,’’ Entropy, vol. 22, no. 2, p. 249, 2020, doi: [124] J. Kietzmann, L. W. Lee, I. P. McCarthy, and T. C. Kietzmann, ‘‘Deep-
10.3390/e22020249. fakes: Trick or treat?’’ Bus. Horizons, vol. 63, no. 2, pp. 135–146,
[100] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, Mar. 2020, doi: 10.1016/j.bushor.2019.11.006.
‘‘Emotions don’t lie: An audio-visual deepfake detection method using [125] J. Frank, T. Eisenhofer, L. Schonherr, A. Fischer, D. Kolossa, and
affective cues,’’ in Proc. 28th ACM Int. Conf. Multimedia, Seattle, WA, T. Holz, ‘‘Leveraging frequency analysis for deep fake image recog-
USA, Oct. 2020, doi: 10.1145/3394171.3413570. nition,’’ in Proc. 37th Int. Conf. Mach. Learn. (ICML), Jul. 2020,
[101] Y. Nirkin, L. Wolf, Y. Keller, and T. Hassner, ‘‘DeepFake detec- pp. 3247–3258.
tion based on discrepancies between faces and their context,’’ 2020, [126] M.-H. Maras and A. Alexandrou, ‘‘Determining authenticity of video
arXiv:2008.12262. evidence in the age of artificial intelligence and in the wake of deepfake
[102] C. M. Yu, C. T. Chang, and Y. W. Ti, ‘‘Detecting deepfake-forged contents videos,’’ Int. J. Evidence Proof, vol. 23, no. 3, pp. 255–262, Jul. 2019,
with separable convolutional neural network and image segmentation,’’ doi: 10.1177/1365712718807226.
2019, arXiv:1912.12184. [127] M. Westerlund, ‘‘The emergence of deepfake technology: A review,’’
[103] D. Feng, X. Lu, and X. Lin, ‘‘Deep detection for face manipulation,’’ Technol. Innov. Manage. Rev., vol. 9, no. 11, pp. 40–53, 2019, doi:
2020, arXiv:2009.05934. 10.22215/timreview/1282.
[104] L. Chai, D. Bau, S. Lim, and P. Isola, ‘‘What makes fake images [128] C. Öhman, ‘‘Introducing the pervert’s dilemma: A contribution to the
detectable? Understanding properties that generalize,’’ in Proc. 16th Eur. critique of deepfake,’’ Ethics and Inf. Technol., vol. 22,pp. 133–140,
Conf. Comput. Vis., Aug. 2020, pp. 103–120. Nov. 2020, doi: 10.1007/s10676-019-09522-1.
[105] X. Chang, J. Wu, T. Yang, and G. Feng, ‘‘DeepFake face image detection [129] R. Thakur and R. Rohilla, ‘‘Recent advances in digital image manipula-
based on improved VGG convolutional neural network,’’ in Proc. 39th tion detection techniques: A brief review,’’ Forensic Sci. Int., vol. 312,
Chin. Control Conf. (CCC), Shenyang, China, Jul. 2020, pp. 7252–7256, Jul. 2020, Art. no. 110311, doi: 10.1016/j.forsciint.2020.110311.
[130] N. Carlini and H. Farid, ‘‘Evading deepfake-image detectors with white-
doi: 10.23919/CCC50068.2020.9189596.
and black-box attacks,’’ 2020, arXiv:2004.00622.
[106] U. Aybars Ciftci, I. Demir, and L. Yin, ‘‘How do the hearts of deep fakes
[131] A. Pishori, B. Rollins, N. van Houten, N. Chatwani, and O. Uraimov,
beat? Deep fake source detection via interpreting residuals with biological
‘‘Detecting deepfake videos: An analysis of three techniques,’’ 2020,
signals,’’ 2020, arXiv:2008.11363.
arXiv:2007.08517.
[107] H. M. Nguyen and R. Derakhshani, ‘‘Eyebrow recognition for identifying
[132] O. de Lima, S. Franklin, S. Basu, B. Karwoski, and A. George, ‘‘Deep-
deepfake videos,’’ in Proc. Int. Conf. Biometrics Special Interest Group
fake detection using spatiotemporal convolutional networks,’’ 2020,
(BIOSIG), Darmstadt, Germany, Sep. 2020, pp. 1–5.
arXiv:2006.14749.
[108] M. Koopman, A. M. Rodriguez, and Z. Geradts, ‘‘Detection of deepfake [133] S. Hussain, P. Neekhara, M. Jere, F. Koushanfar, and J. McAuley, ‘‘Adver-
video manipulation,’’ in Proc. 20th Irish Mach. Vis. Image Process. Conf. sarial deepfakes: Evaluating vulnerability of deepfake detectors to adver-
(IMVIP), London, U.K., 2018, pp. 1–4. sarial examples,’’ 2020, arXiv:2002.12749.
[109] B. L. Welch, ‘‘The generalization of students’ problem when sev- [134] P. Korshunov and S. Marcel, ‘‘Deepfake detection: Humans vs.
eral different population variances are involved,’’ Biometrika, vol. 34, machines,’’ 2020, arXiv:2009.03155.
nos. 1–2, pp. 28–35, 1947. [135] H. U. U. Chi Maduakor and R. E. Alo Williams, ‘‘Integrating deepfake
[110] L. Guarnera, O. Giudice, and S. Battiato, ‘‘DeepFake detection by ana- detection into cybersecurity curriculum,’’ in Proc. Future Technol. Conf.
lyzing convolutional traces,’’ in Proc. IEEE/CVF Conf. Comput. Vis. (FTC) (Advances in Intelligent Systems and Computing), vol. 1288,
Pattern Recognit. Workshops (CVPRW), Seattle, WA, USA, Jun. 2020, K. Arai, S. Kapoor, and R. Bhatia, Eds. Cham, Switzerland: Springer,
pp. 2841–2850, doi: 10.1109/CVPRW50498.2020.00341. 2020, pp. 588–598, doi: 10.1007/978-3-030-63128-4_45.
[111] S. Agarwal and L. R. Varshney, ‘‘Limits of deepfake detection: A robust [136] Contributing Data to Deepfake Detection Research. Accessed:
estimation viewpoint,’’ in Proc. 36th Int. Conf. Mach. Learn. (ICML), Jan. 4, 2021. [Online]. Available: https://ai.googleblog.com/2019/09/
Long Beach, CA, USA, 2019. contributing-data-to-deepfake-detection.html
[112] U. M. Maurer, ‘‘Authentication theory and hypothesis testing,’’ IEEE [137] Z. Liu, P. Luo, X. Wang, and X. Tang, ‘‘Deep learning face attributes
Trans. Inf. Theory, vol. 46, no. 4, pp. 1350–1356, Jul. 2000, doi: in the wild,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015,
10.1109/18.850674. pp. 3730–3738, doi: 10.1109/ICCV.2015.425.
[113] H. Hasan and K. Salah, ‘‘Combating deepfake videos using blockchain [138] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, ‘‘Celeb-DF: A large-scale
and smart contracts,’’ IEEE Access, vol. 7, pp. 41596–41606, 2019, doi: challenging dataset for deepfake forensics,’’ 2019, arXiv:1909.12962.
10.1109/ACCESS.2019.2905689. [139] B. Dolhansky, R. Howes, B. Pflaum, N. Baram, and C. Canton Ferrer,
[114] IPFS Powers the Distributed Web. Accessed: Jun. 5, 2020. [Online]. ‘‘The deepfake detection challenge (DFDC) preview dataset,’’ 2019,
Available: https://ipfs.io/ arXiv:1910.08854.
[115] C. C. Ki Chan, V. Kumar, S. Delaney, and M. Gochoo, ‘‘Combating [140] L. Jiang, R. Li, W. Wu, C. Qian, and C. Change Loy, ‘‘DeeperForensics-
deepfakes: Multi-LSTM and blockchain as proof of authenticity for 1.0: A large-scale dataset for real-world face forgery detection,’’ 2020,
digital media,’’ in Proc. IEEE/ITU Int. Conf. Artif. Intell. Good (AI4G), arXiv:2001.03024.
Sep. 2020, pp. 55–62. [141] B. Zi, M. Chang, J. Chen, X. Ma, and Y.-G. Jiang, ‘‘WildDeepfake:
[116] J. Li, T. Shen, W. Zhang, H. Ren, D. Zeng, and T. Mei, ‘‘Zooming into A challenging real-world dataset for deepfake detection,’’ in Proc.
face forensics: A pixel-level analysis,’’ 2019, arXiv:1912.05790. 28th ACM Int. Conf. Multimedia, New York, NY, USA, Oct. 2020,
[117] T. Thi Nguyen, Q. Viet Hung Nguyen, D. Tien Nguyen, D. Thanh pp. 2382–2390, doi: 10.1145/3394171.3413769.
Nguyen, T. Huynh-The, S. Nahavandi, T. Tam Nguyen, Q.-V. Pham, and [142] A. Khodabakhsh, R. Ramachandra, K. Raja, P. Wasnik, and C. Busch,
C. M. Nguyen, ‘‘Deep learning for deepfakes creation and detection: A ‘‘Fake face detection methods: Can they be generalized?’’ in Proc. Int.
survey,’’ 2019, arXiv:1909.11573. Conf. Biometrics Special Interest Group (BIOSIG), Darmstadt, Germany,
[118] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and Sep. 2018, pp. 1–6, doi: 10.23919/BIOSIG.2018.8553251.
J. Ortega-Garcia, ‘‘Deepfakes and beyond: A survey of face manipulation [143] P. Gupta, K. Chugh, A. Dhall, and R. Subramanian, ‘‘The eyes know it:
and fake detection,’’ Inf. Fusion, vol. 64, pp. 131–148, Dec. 2020, doi: FakeET—An eye-tracking database to understand deepfake perception,’’
10.1016/j.inffus.2020.06.014. 2020, arXiv:2006.06961.
[144] L. Li, J. Bao, H. Yang, D. Chen, and F. Wen, ‘‘Advancing high fidelity BEDDHU MURALI received the Ph.D. degree
identity swapping for forgery detection,’’ in Proc. IEEE/CVF Conf. Com- in aerospace engineering from Mississippi State
put. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5074–5083. University, in 1992. He is currently an Asso-
[145] Y. Pan, X. Ge, C. Fang, and Y. Fan, ‘‘A systematic literature review of ciate Professor of computing sciences and com-
Android malware detection using static analysis,’’ IEEE Access, vol. 8, puter engineering (CSCE) at The University of
pp. 116363–116379, 2020, doi: 10.1109/ACCESS.2020.3002842. Southern Mississippi, USA. His research interests
[146] L. Li, T. F. Bissyandé, M. Papadakis, S. Rasthofer, A. Bartel, D. Octeau,
include scientific computational algorithms, high-
J. Klein, and L. Traon, ‘‘Static analysis of Android apps: A systematic
literature review,’’ Inf. Softw. Technol., vol. 88, pp. 67–95, Aug. 2017, doi: performance computing, image and video process-
10.1016/j.infsof.2017.04.001. ing, robotics, and machine learning.
[147] T. Baar, W. van Houten, and Z. Geradts, ‘‘Camera identification by
grouping images from database, based on shared noise patterns,’’ 2012,
arXiv:1207.2641.