A Survey On Adversarial Examples in Deep Learning
DOI: 10.32604/jbd.2020.012294
Abstract: Adversarial examples are a hot topic in the field of deep learning
security. Their characteristics, generation methods, and the associated attack
and defense techniques are the focus of current research. This article explains
the key technologies and theories of adversarial examples, covering the concept
of adversarial examples, the reasons they arise, and the attack methods built on
them. The possible causes of adversarial examples are listed, and several typical
generation methods are analyzed in detail: Limited-memory BFGS (L-BFGS), the
Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the
Iterative Least-Likely Class method (LLC), and others. Furthermore, from the
perspective of the attack methods and the causes of adversarial examples, the
main defense techniques are surveyed, including preprocessing, regularization
and adversarial training, and distillation, and the application scenarios and
deficiencies of the different defense measures are pointed out. The article also
discusses the applications of adversarial examples, which are currently used
mainly for adversarial evaluation and adversarial training. Finally, the overall
research direction of adversarial examples is discussed. Completely solving the
adversarial attack problem still requires solving many practical and theoretical
problems. Characterizing adversarial examples, giving a mathematical
description of their practical application prospects, and exploring both a
universal generation method and the underlying generation mechanism of
adversarial examples are the main research directions for the future.
1 Introduction
Deep learning began to emerge with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
in 2012 [1]. In recent years, deep learning has developed rapidly: its application scope has expanded
[1–2], network structures have become more complicated [2–3], training methods have been improved,
and several important techniques have further improved classification performance and reduced training
time [2–5]. For example, in the field of image recognition, experimental results on some standard test sets
indicate that the recognition ability of deep learning models can already reach the level of human
performance. However, while deep learning brings great convenience, it also raises security problems,
and these hidden security issues have gradually attracted the attention of security experts. Therefore,
many scholars have begun to study the anti-interference ability of deep learning models, which is the
subject of research on adversarial examples in deep learning.
In early security applications of deep learning algorithms, such as spam detection systems and
intrusion detection systems, the problem of evading detection by exploiting the characteristics of the system
model was discovered, which brought great challenges to security detection based on deep learning. Up to
now, more and more threats to the security of deep learning have been discovered. There are illegal
authentication threats in which attackers mimic the identity of victims by exploiting defects of face
recognition systems (FRS); there are privacy theft threats involving medical data and personal image
data; and there are malicious control threats against autonomous vehicles and voice control systems [6].
Therefore, the problem of adversarial examples in deep learning deserves increasing attention. The
causes of adversarial examples and the methods for generating them are the key issues in their study, and
using adversarial examples for adversarial training to improve system robustness and the security of deep
learning is an urgent task. Although it was initially speculated that adversarial examples result from the
highly non-linear behavior of deep neural networks together with insufficient model averaging and
insufficient regularization in supervised learning, Goodfellow pointed out that linearity in high-dimensional
spaces, rather than non-linearity, is the real cause of adversarial examples [7]. At present, the main
generation methods for adversarial examples are L-BFGS, FGSM, BIM, LLC and others. Identifying the
attack characteristics and attack methods of adversarial examples is the core of solving the problem. In
terms of application scenarios, attack methods are mainly divided into two types: black-box attacks and
white-box attacks [2]. The ability of adversarial examples to mount black-box attacks comes from their
transferability [5]. Goodfellow proposed that the generalization of adversarial examples across different
models is caused by the high degree of alignment between adversarial perturbations and model weights;
as a result, different models trained on the same task learn similar functions and are fooled by the same
perturbations [8]. Exploring defense algorithms against adversarial example attacks is the main goal of
studying adversarial examples. Currently, defense techniques can be divided into four categories:
regularization, adversarial training, distillation, and the rejection option. Although these methods can
resist adversarial attacks to a certain extent, none of them applies to all models, so developing more
powerful defense algorithms is the main research direction for the future.
Although adversarial perturbations are much smaller than typical noise perturbations, the probability
that they are misclassified by the classifier is much higher. In addition, models with different structures
trained on different subsets of the training set will all misclassify the same adversarial example.
Adversarial examples have thus become a blind spot for training algorithms. Biggio elaborated the related
concepts of adversarial examples [9] and proposed an adversary model defined by the adversary's goal,
the adversary's knowledge, and the adversary's capability.
To simulate the optimal attack strategy for adversarial examples, we need to know the knowledge
required for the attack and the adversary's ability to manipulate the data, and finally use the adversary
model over the underlying data distribution, together with the adversary's attack capability, to derive the
optimal attack strategy. We use this general adversary model to clarify some concepts of adversarial
examples.
The probability of encountering adversarial examples in the data distribution is extremely low, so it is
difficult to observe adversarial examples in the test set [5]. Therefore, it is necessary to further explore
how to improve the probability that the neural network correctly classifies adversarial examples.
3 Generation Methods
Adversarial examples are key to evaluating and improving the robustness of machine learning models.
Therefore, studying how to generate adversarial examples is a necessary step in studying them. There are
many ways to generate adversarial examples; at present, the main generation methods include L-BFGS,
FGSM, BIM, and others, summarized in Tab. 1.
Table 1: Typical adversarial example construction methods

Method      Derived Features       Attack Target             Iteration   Attack Mode          Scope of Application
L-BFGS      Optimized search       Targeted                  Multiple    White box            Specific
Deep Fool   Optimized search       Non-targeted              Multiple    White box            Specific
UAP         Optimized search       Non-targeted              Multiple    White box            Universal
FGSM        Feature construction   Non-targeted              Single      White box            Specific
BIM         Feature construction   Non-targeted              Multiple    White box            Specific
LLC         Feature construction   Non-targeted              Multiple    White box            Specific
JSMA        Feature construction   Targeted                  Multiple    White box            Specific
PBA         Feature construction   Targeted & Non-targeted   Multiple    Black box            Specific
ATN         Generative model       Targeted & Non-targeted   Multiple    Black & White box    Specific
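As a concrete illustration of the single-step FGSM entry in Tab. 1, the following is a minimal sketch in PyTorch. It is not taken from the surveyed papers: the function name, the epsilon default, and the assumption that `model` returns logits for inputs `x` scaled to [0, 1] with integer labels `y` are all illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """Single-step FGSM sketch: perturb each input by epsilon in the
    direction of the sign of the loss gradient, then clamp back to the
    valid pixel range [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)       # non-targeted: raise the loss of the true class
    grad = torch.autograd.grad(loss, x)[0]    # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```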
The Basic Iterative Method (BIM) extends FGSM by taking multiple small steps and clipping the result after each step:

$$X_0^{adv} = X, \qquad X_{N+1}^{adv} = \mathrm{Clip}_{X,\varepsilon}\left\{ X_N^{adv} + \alpha\,\mathrm{sign}\!\left(\nabla_X J\!\left(X_N^{adv}, y_{true}\right)\right) \right\} \tag{3}$$
In the above formula, $X$ is a 3-D input image whose pixel intensities are assumed to be integers in
$[0, 255]$, $y_{true}$ is the true class label of image $X$, and $J(X, y)$ is the cross-entropy cost of the neural
network for the image $X$ and label $y$. The step size $\alpha$ is usually set to one, and the number of
iterations is chosen as $\min(\varepsilon + 4,\ 1.25\varepsilon)$. For the output layer of the neural network, the
cross-entropy cost on integer class labels is equivalent to the negative log conditional probability of the
true class of the given image: $J(X, y) = -\log p(y \mid X)$. $\mathrm{Clip}_{X,\varepsilon}\{X'\}$ is a pixel-wise
clipping function on $X'$ that ensures the clipped image remains in the $\varepsilon$-neighborhood of the
original image.
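A minimal code sketch of the iteration in Eq. (3), in PyTorch, under the assumption that pixel values are scaled to [0, 1] rather than [0, 255]; `model`, `x`, `y_true` and the hyperparameter defaults are placeholders, not settings taken from the surveyed papers.

```python
import torch
import torch.nn.functional as F

def basic_iterative_method(model, x, y_true, epsilon=8 / 255,
                           alpha=1 / 255, num_iters=10):
    """Sketch of BIM (Eq. (3)): repeat a small signed-gradient step and
    clip the result into the epsilon-neighborhood of the clean image x,
    assuming pixel values in [0, 1]."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(num_iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Clip_{X,eps}: stay within [x - eps, x + eps] and within the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```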
However, the above method only tries to increase the loss of the correct class; it does not specify
which incorrect class label the model should be pushed towards. Such methods are therefore only suitable
for datasets with a small number of classes that are clearly distinct from each other (such as MNIST and
CIFAR-10); this motivates targeted variants such as the iterative least-likely class (LLC) method, sketched below.
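As a point of reference, the LLC update described by Kurakin et al. [8,22] is a targeted variant of the iteration in Eq. (3): instead of increasing the loss of the true class, it decreases the loss of the class $y_{LL}$ that the model considers least likely for the clean input (the notation below follows Eq. (3)):

$$y_{LL} = \arg\min_{y} p(y \mid X), \qquad X_0^{adv} = X, \qquad X_{N+1}^{adv} = \mathrm{Clip}_{X,\varepsilon}\left\{ X_N^{adv} - \alpha\,\mathrm{sign}\!\left(\nabla_X J\!\left(X_N^{adv}, y_{LL}\right)\right) \right\}$$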
For an affine classifier, the decision boundary is the hyperplane $F = \{x : w^{T} x + b = 0\}$, and DeepFool perturbs the input towards its projection onto this hyperplane.
In the JSMA method, a saliency map is computed from the gradients of the network outputs with respect
to the input. Once the saliency map has been calculated, the algorithm selects the pixels that are most
effective at deceiving the network and modifies only those pixels of the original input. At the same time,
the computation is relatively simple because JSMA uses the forward derivative of the network to
calculate the saliency values.
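To make the saliency idea concrete, here is a simplified single-pixel sketch in PyTorch. It follows the spirit of the JSMA saliency map but deliberately omits the pairwise pixel search of the original algorithm [19]; `model`, `image` (a single example with values in [0, 1]) and `target_class` are assumed to be supplied by the caller.

```python
import torch

def saliency_step(model, image, target_class, theta=1.0):
    """Simplified saliency-guided step: find the pixel whose increase most
    favors the target class while suppressing the other classes, then
    perturb that single pixel by theta (a sketch, not full pairwise JSMA)."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)                               # shape (1, num_classes)

    # Forward derivatives: gradient of the target logit and of the sum of
    # the remaining logits with respect to every input pixel.
    grad_target = torch.autograd.grad(logits[0, target_class], image,
                                      retain_graph=True)[0]
    grad_others = torch.autograd.grad(
        logits[0].sum() - logits[0, target_class], image)[0]

    # Saliency: keep pixels where the target gradient is positive and the
    # competing gradient is negative; weight by both magnitudes.
    saliency = grad_target * torch.clamp(-grad_others, min=0.0)
    saliency[grad_target < 0] = 0.0

    # Increase the single most salient pixel, keeping intensities in [0, 1].
    perturbed = image.detach().clone().flatten()
    idx = torch.argmax(saliency)
    perturbed[idx] = torch.clamp(perturbed[idx] + theta, 0.0, 1.0)
    return perturbed.view_as(image)
```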
4 Defense Methods
Traditional techniques that make machine learning models more robust, such as weight decay,
usually cannot effectively defend against adversarial examples. Machine learning models are often
disturbed by adversarial examples, which are inputs maliciously perturbed in order to mislead the model
at test time. Adversarial examples are therefore a security threat to the practical deployment of machine
learning systems. It is worth noting that these inputs transfer between models, which enables black-box
attacks on deployed models. The main defense techniques against adversarial examples include
preprocessing, regularization and adversarial training, distillation, the rejection option, and others, as
shown in Fig. 3.
4.1 Preprocessing
Natural images have some special properties, such as high correlation between adjacent pixels and
low energy in the high-frequency domain. Assuming that adversarial perturbations do not lie in the same
space as natural images, some work applies filtering to the input image to remove the adversarial
perturbation. Although preprocessing the input makes attacks more difficult, it does not eliminate the
possibility of a successful attack, and these filters usually also reduce classification accuracy on
unperturbed data. A minimal sketch of this idea follows.
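As an illustration only, the following sketch applies a median filter before classification; it assumes an H x W x C NumPy image and a `classify` function supplied elsewhere, and is meant to show the filtering idea rather than any specific published defense.

```python
import numpy as np
from scipy.ndimage import median_filter

def defended_predict(classify, image, kernel_size=3):
    """Median-filter an H x W x C image spatially (per channel) before
    handing it to the classifier. This weakens small high-frequency
    adversarial perturbations but does not rule out attacks and can
    reduce accuracy on clean inputs."""
    smoothed = median_filter(image, size=(kernel_size, kernel_size, 1))
    return classify(smoothed.astype(np.float32))
```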
(1) How to define the transferability of adversarial examples, how to measure the degree of
transferability, and how to determine the upper and lower bounds of transfer, so that transferability can
be used to efficiently generate adversarial examples, effectively detect them, and defend against their
attacks.
(2) The reasons and principles by which adversarial examples generated on general or specific data
cause the classifier to misclassify them. The goal is to establish a complete and unified mathematical
theory of adversarial example generation, and on that basis a theory and concrete form of defense that
guarantees with high probability that a deep neural network resists adversarial example attacks, laying a
theoretical foundation for the practical deployment of deep neural network machine learning algorithms.
(3) Construct a universal benchmark software platform for generating adversarial examples, so that
the current research on adversarial examples can evaluate the experimental results on a unified standard
data set [40].
7 Conclusion
The problem of adversarial examples is attracting more and more attention. Understanding why
adversarial examples occur and how to generate them are the key issues in their study. This article first
summarizes the proposed causes of adversarial examples and the latest research progress, and points out
that the current conjectures are not yet convincing; the true reasons for adversarial examples still deserve
further investigation. Secondly, the main generation methods of adversarial examples are the L-BFGS
method, FGSM, the basic iterative method, and the iterative least-likely class method; the review points
out their advantages, disadvantages, and applicable scenarios. The purpose of studying the causes of
adversarial examples and their generation methods is to protect machine learning systems from
adversarial example attacks. At the end of this article, the main popular defense technologies, namely
the preprocessing method, regularization method, adversarial training method, distillation method,
rejection option method and other methods, are reviewed. The paper points out the application scenarios
and deficiencies of the different defense measures, showing that none of the above defenses can
completely prevent adversarial example attacks. In summary, to further characterize adversarial
examples, give a mathematical description of their practical application prospects, explore universal
adversarial example generation methods, and completely solve the adversarial attack problem, a large
number of theoretical and practical problems remain to be solved.
Funding Statement: This work is supported by the NSFC [Grant Nos. 61772281, 61703212]; the
Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) and Jiangsu
Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the
present study.
References
[1] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate
shift,” in Proc. ICML, Lille, France, pp. 448–456, 2015.
[2] V. Mnih, K. Kavukcuoglu and D. Silver, “Human-level control through deep reinforcement learning,” Nature,
vol. 518, no. 7540, pp. 529–533, 2015.
[3] N. Srivastava, G. Hinton and A. Krizhevsky, “Dropout: A simple way to prevent neural networks from
overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[4] K. He, X. Zhang and S. Ren and J. Sun, “Deep residual learning for image recognition,” in Proc. CVPR, Las
Vegas, USA, pp. 770–778, 2016.
[5] C. Szegedy, W. Zaremba and I. Sutskever, “Intriguing properties of neural networks,” in Proc. ICLR, pp.
1312–1320, 2014.
[6] P. Li, W. T. Zhao and Q. Liu and C. J. Wu, “Review of machine learning security and its defense technology,”
Computer Science and Exploration, vol. 12, no. 2, pp. 171–184, 2018.
[7] I. J. Goodfellow, J. Shlens and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. ICLR,
San Diego, USA, 2015.
[8] A. Kurakin, I. J. Goodfellow and S. Bengio, “Adversarial examples in the physical world,” in Proc. ICLR,
Toulon, France, pp. 1726–1738, 2016.
[9] B. Biggio, I. Corona and D. Maiorca, “Evasion attacks against machine learning at test time,” in Proc. MLKD
Databases, Springer Berlin Heidelberg, pp. 387–402, 2017.
[10] P. McDaniel, N. Papernot and Z. B. Celik, “Machine Learning in Adversarial Settings,” IEEE Security &
Privacy, vol. 14, no. 3, pp. 68–72, 2016.
[11] S. M. Moosavi-dezfooli, A. Fawzi and P. Frossard, “Deepfool: A simple and accurate method to fool deep
neural networks,” in Proc. CVPR, pp. 282–297, 2016.
[12] S. M. Moosavi-dezfooli, A. Fawzi and O. Fawzi, “Universal adversarial perturbations,” in Proc. CVPR,
pp. 1765–1773, 2017.
[13] N. Papernot, P. Mcdaniel and I. J. Goodfellow, “Practical black-box attacks against machine learning,” in
Proc. of the IEEE European Sym. on Security and Privacy, Saarbrucken, Germany, pp. 506–519, 2016.
[14] N. Papernot, P. Mcdaniel and I. J. Goodfellow, “Practical black-box attacks against deep learning systems
using adversarial examples,” Cryptography and Security, 2016.
[15] Y. Senzaki, S. Ohata and K. Matsuura, “Simple black-box adversarial examples generation with very few queries,”
The Institute of Electronics, Information and Communication Engineers, vol. E103–D, no. 2, 2020.
[16] A. Ilyas, L. Engstrom, A. Athalye and J. Lin, “Black-box adversarial attacks with limited queries and
information,” in Proc. ICML, Stockholm, Sweden, pp. 237–245, 2018.
[17] S. Baluja and I. Fischer, “Adversarial transformation networks: Learning to generate adversarial examples,” in
Proc. CVPR, HI, USA, pp. 2300–2309, 2017.
[18] Y. Liu, X. Chen and C. Liu, “Delving into transferable adversarial examples and black-box attacks,” in Proc.
ICLR, pp. 1–14, 2017.
[19] N. Papernot, P. Mcdaniel and S. Jha, “The limitations of deep learning in adversarial settings,” in Proc. of the
IEEE European Sym. on Security and Privacy, Saarbrucken, Germany, pp. 372–387, 2016.
[20] D. F. Smith, A. Wiliem and B. C. Lovell, “Face recognition on consumer devices: Reflections on replay
attacks,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 4, pp. 736–745, 2015.
[21] B. Zhang, B. Tondi and M. Barni, “Adversarial examples for replay attacks against CNN-based face
recognition with anti-spoofing capability,” Computer Vision & Image Understanding, vol. 197, no. 2, pp. 33–
44, 2020.
[22] A. Kurakin, I. J. Goodfellow and S. Bengio, “Adversarial examples in the physical world,” in Proc. ICLR, pp.
1607–1617, 2016.
[23] C. Xiao, B. Li and J. Y. Zhu, “Generating adversarial examples with adversarial networks,” in Proc. IJCAI,
Macao, China, pp. 3805–3911, 2019.
[24] N. Carlini, P. Mishra and T. Vaidya, “Hidden voice commands,” in Proc. USENIX, Austin, USA, pp. 513–
530, 2016.
[25] C. Xie, J. Wang and Z. Zhang, “Adversarial examples for semantic segmentation and object detection,” in Proc.
ICCV, Venice, Italy, pp. 1378–1387, 2017.
[26] G. Fumera, F. Roli and G. Giacinto, “Reject option with multiple thresholds,” Pattern Recognition, vol. 33, no.
12, pp. 2099–2101, 2000.
[27] R. Herbei and M. H. Wegkamp, “Classification with reject option,” Canadian Journal of Statistics, vol. 34, no.
4, pp. 709–721, 2010.
[28] P. L. Bartlett and M. H. Wegkamp, “Classification with a reject option using a hinge loss,” Journal of Machine
Learning Research, vol. 9, pp. 1823–1840, 2008.
[29] C. Cortes, G. DeSalvo and M. Mohri, “Learning with rejection,” in Proc. ICML, Bari, Italy, pp. 67–82, 2016.
[30] J. Bromley and J. S. Denker, “Improving rejection performance on handwritten digits by training with “Rubbish”,”
Journal of Neural Computation, vol. 5, no. 3, pp. 367–370, 1993.
[31] Y. Lecun, L. Bottou and Y. Bengio, “Gradient-based learning applied to document recognition,” in Proc. of the
IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[32] B. Yadav and V. S. Devi, “Novelty detection applied to the classification problem using probabilistic neural
network,” in Proc. CIDM, Orlando, USA, pp. 265–272, 2014.
[33] X. Li and F. Li, “Adversarial examples detection in deep networks with convolutional filter statistics,” in Proc.
ICCV, Venice, Italy, pp. 5775–5783, 2017.
[34] S. Aditya and S. Gandharba, “Reversible image steganography using dual-layer LSB matching,” Sensing and
Imaging, vol. 21, no. 1, 2020.
[35] R. Jia and P. Liang, “Adversarial examples for evaluating reading comprehension systems,” in Proc. EMNLP,
Copenhagen, Denmark, pp. 2021–2031, 2017.
[36] G. Feng, Q. J. Zhao, X. Li, X. H. Kuang, J. W. Zhang et al., “Detecting adversarial examples via prediction
difference for deep neural networks,” Information Science, vol. 1, no. 1, pp. 501, 2019.
[37] S. Park, J. K. Park and S. J. Shin, “Adversarial dropout for supervised and semi-supervised learning,” in Proc.
AAAI, New Orleans, USA, pp. 219–231, 2018.
[38] Y. Yu, W. Y. Qu and N. Li, “Open-category classification by adversarial sample generation,” in Proc. IJCAI,
Melbourne, Australia, pp. 3357–3363, 2017.
[39] O. Russakovsky, J. Deng and H. Su, “ImageNet large scale visual recognition challenge,” International Journal of
Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[40] Z. Gong, W. Wang and W. S. Ku, “Adversarial and clean data are not twins,” arXiv:1704.04960, 2017.