
International Journal of Automation and Computing 17(2), April 2020, 151-178
DOI: 10.1007/s11633-019-1211-x

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Han Xu, Yao Ma, Hao-Chen Liu, Debayan Deb, Hui Liu, Ji-Liang Tang, Anil K. Jain
Department of Computer Science and Engineering, Michigan State University, Michigan 48823, USA

 
Abstract:  Deep neural networks (DNN) have achieved unprecedented success in numerous machine learning tasks in various domains. However, the existence of adversarial examples raises our concerns about adopting deep learning in safety-critical applications. As a result, we have witnessed increasing interest in studying attack and defense mechanisms for DNN models on different data types, such as images, graphs and text. Thus, it is necessary to provide a systematic and comprehensive overview of the main threats of attacks and the success of corresponding countermeasures. In this survey, we review the state-of-the-art algorithms for generating adversarial examples and the countermeasures against adversarial examples, for the three most popular data types: images, graphs and text.

Keywords:   Adversarial example, model safety, robustness, defenses, deep learning.

 
1 Introduction

Deep neural networks (DNN) have become increasingly popular and successful in many machine learning tasks. They have been deployed in different recognition problems in the domains of images, graphs, text and speech, with remarkable success. In the image recognition domain, they are able to recognize objects with near-human level accuracy[1, 2]. They are also used in speech recognition[3], natural language processing[4] and for playing games[5].

Because of these accomplishments, deep learning techniques are also applied in safety-critical tasks. For example, in autonomous vehicles, deep convolutional neural networks (CNNs) are used to recognize road signs[6]. The machine learning technique used here is required to be highly accurate, stable and reliable. But, what if the CNN model fails to recognize the "STOP" sign by the roadside and the vehicle keeps going? It would be a dangerous situation. Similarly, in financial fraud detection systems, companies frequently use graph convolutional networks (GCNs)[7] to decide whether their customers are trustworthy or not. If fraudsters disguise their personal identity information to evade the company's detection, it will cause a huge loss to the company. Therefore, the safety issues of deep neural networks have become a major concern.

In recent years, many works[2, 8, 9] have shown that DNN models are vulnerable to adversarial examples, which can be formally defined as: "Adversarial examples are inputs to machine learning models that an attacker intentionally designed to cause the model to make mistakes". In the image classification domain, these adversarial examples are intentionally synthesized images which look almost exactly the same as the original images (see Fig. 1), but can mislead the classifier to provide wrong prediction outputs. For a well-trained DNN image classifier on the MNIST dataset, almost all the digit samples can be attacked by an imperceptible perturbation added to the original image. Meanwhile, in other application domains involving graphs, text or audio, similar adversarial attacking schemes also exist to confuse deep learning models. For example, perturbing only a couple of edges can mislead graph neural networks[10], and inserting typos into a sentence can fool text classification or dialogue systems[11]. As a result, the existence of adversarial examples in all application fields has cautioned researchers against directly adopting DNNs in safety-critical machine learning tasks.

Fig. 1  By adding an unnoticeable perturbation, a "panda" (57.7% confidence) is classified as a "gibbon" (99.3% confidence) (Image credit: Goodfellow et al.[9])

Review
Manuscript received October 13, 2019; accepted November 11, 2019
Recommended by Associate Editor Hong Qiao
© The Author(s)

 
To deal with the threat of adversarial examples, studies have been published with the aim of finding countermeasures to protect deep neural networks. These approaches can be roughly categorized into three main types: 1) Gradient masking[12, 13]: Since most attacking algorithms are based on the gradient information of the classifiers, masking or obfuscating the gradients will confuse the attack mechanisms. 2) Robust optimization[14, 15]: These studies show how to train a robust classifier that can correctly classify the adversarial examples. 3) Adversary detection[16, 17]: These approaches attempt to check whether a sample is benign or adversarial before feeding it to the deep learning model, and can be seen as a method of guarding against adversarial examples. All these methods improve the DNN's resistance to adversarial examples.

In addition to building safe and reliable DNN models, studying adversarial examples and their countermeasures is also beneficial for us to understand the nature of DNNs and consequently improve them. For example, adversarial perturbations are perceptually indistinguishable to human eyes but can evade a DNN's detection. This suggests that the DNN's predictive approach does not align with human reasoning. There are works[9, 18] to explain and interpret the existence of adversarial examples of DNNs, which can help us gain more insight into DNN models.

In this review, we aim to summarize and discuss the main studies dealing with adversarial examples and their countermeasures. We provide a systematic and comprehensive review of the state-of-the-art algorithms from the image, graph and text domains, which gives an overview of the main techniques and contributions to adversarial attacks and defenses.

The main structure of this survey is as follows: In Section 2, we introduce some important definitions and concepts which are frequently used in adversarial attacks and their defenses. It also gives a basic taxonomy of the types of attacks and defenses. In Sections 3 and 4, we discuss the main attack and defense techniques in the image classification scenario. We use Section 5 to briefly introduce some studies which try to explain the phenomenon of adversarial examples. Sections 6 and 7 review the studies on graph and text data, respectively.

2 Definitions and notations

In this section, we give a brief introduction to the key components of model attacks and defenses. We hope that our explanations can help our audience to understand the main components of the related works on adversarial attacks and their countermeasures. By answering the following questions, we define the main terminology:

1) Adversary's goal (Section 2.1.1)
What is the goal or purpose of the attacker? Does he want to misguide the classifier's decision on one sample, or influence the overall performance of the classifier?

2) Adversary's knowledge (Section 2.1.2)
What information is available to the attacker? Does he know the classifier's structure, its parameters or the training set used for classifier training?

3) Victim models (Section 2.1.3)
What kind of deep learning models do adversaries usually attack? Why are adversaries interested in attacking these models?

4) Security evaluation (Section 2.2)
How can we evaluate the safety of a victim model when faced with adversarial examples? What is the relationship and difference between these security metrics and other model goodness metrics, such as accuracy or risks?

2.1 Threat model

2.1.1 Adversary's goal

1) Poisoning attack versus evasion attack
Poisoning attacks refer to the attacking algorithms that allow an attacker to insert/modify several fake samples into the training database of a DNN algorithm. These fake samples can cause failures of the trained classifier. They can result in poor accuracy[19], or wrong predictions on some given test samples[10]. This type of attack frequently appears in situations where the adversary has access to the training database. For example, web-based repositories and "honeypots" often collect malware examples for training, which provides an opportunity for adversaries to poison the data.

In evasion attacks, the classifiers are fixed and usually have good performance on benign testing samples. The adversaries do not have authority to change the classifier or its parameters, but they craft some fake samples that the classifier cannot recognize. In other words, the adversaries generate some fraudulent examples to evade detection by the classifier. For example, in autonomous driving vehicles, sticking a few pieces of tape on the stop signs can confuse the vehicle's road sign recognizer[20].

2) Targeted attack versus non-targeted attack
In a targeted attack, when the victim sample (x, y) is given, where x is the feature vector and y ∈ Y is the ground truth label of x, the adversary aims to induce the classifier to give a specific label t ∈ Y to the perturbed sample x′. For example, a fraudster is likely to attack a financial company's credit evaluation model to disguise himself as a highly credible client of this company.

If there is no specified target label t for the victim sample x, the attack is called a non-targeted attack. The adversary only wants the classifier to predict incorrectly.

2.1.2 Adversary's knowledge

1) White-box attack
In a white-box setting, the adversary has access to all the information of the target neural network, including its architecture, parameters, gradients, etc. The adversary can make full use of the network information to carefully craft adversarial examples.
 
White-box attacks have been extensively studied because the disclosure of model architecture and parameters helps people understand the weakness of DNN models clearly, and it can be analyzed mathematically. As stated by Tramer et al.[21], security against white-box attacks is the property that we desire machine learning (ML) models to have.

2) Black-box attack
In a black-box attack setting, the inner configuration of DNN models is unavailable to adversaries. Adversaries can only feed the input data and query the outputs of the models. They usually attack the models by keeping feeding samples to the box and observing the outputs, so as to exploit the model's input-output relationship and identify its weakness. Compared to white-box attacks, black-box attacks are more practical in applications because model designers usually do not open-source their model parameters for proprietary reasons.

3) Semi-white (gray) box attack
In a semi-white box or gray box attack setting, the attacker trains a generative model for producing adversarial examples in a white-box setting. Once the generative model is trained, the attacker does not need the victim model anymore, and can craft adversarial examples in a black-box setting.

2.1.3 Victim models

We briefly summarize the machine learning models which are susceptible to adversarial examples, and some popular deep learning architectures used in the image, graph and text data domains. In our review, we mainly discuss studies of adversarial examples for deep neural networks.

1) Conventional machine learning models
For conventional machine learning tools, there is a long history of studying safety issues. Biggio et al.[22] attack support vector machine (SVM) classifiers and fully-connected shallow neural networks on the MNIST dataset. Barreno et al.[23] examine the security of SpamBayes, a Bayesian method based spam detection software. In [24], the security of Naive Bayes classifiers is checked. Many of these ideas and strategies have been adopted in the study of adversarial attacks on deep neural networks.

2) Deep neural networks
Different from traditional machine learning techniques which require domain knowledge and manual feature engineering, DNNs are end-to-end learning algorithms. The models use raw data directly as input to the model, and learn objects' underlying structures and attributes. The end-to-end architecture of DNNs makes it easy for adversaries to exploit their weakness, and generate high-quality deceptive inputs (adversarial examples). Moreover, because of the implicit nature of DNNs, some of their properties are still not well understood or interpretable. Therefore, studying the security issues of DNN models is necessary. Next, we will briefly introduce some popular victim deep learning models which are used as "benchmark" models in attack/defense studies.

a) Fully-connected neural networks (FC)
Fully-connected neural networks are composed of layers of artificial neurons. In each layer, the neurons take the input from previous layers, process it with the activation function and send it to the next layer; the input of the first layer is the sample x, and the (softmax) output of the last layer is the score F(x). An m-layer fully connected neural network can be formed as

z^(0) = x;   z^(l+1) = σ(W^l z^(l) + b^l).

One thing to note is that the back-propagation algorithm helps calculate ∂F(x; θ)/∂θ, which makes gradient descent effective in learning parameters. In adversarial learning, back-propagation also facilitates the calculation of the term ∂F(x; θ)/∂x, representing the output's response to a change in input. This term is widely used in the studies to craft adversarial examples.

b) Convolutional neural networks
In computer vision tasks, convolutional neural networks[1] are one of the most widely used models. CNN models aggregate the local features from the image to learn the representations of image objects. CNN models can be viewed as a sparse version of fully connected neural networks: Most of the weights between layers are zero. Their training algorithm and gradient calculation can also be inherited from fully connected neural networks.

c) Graph convolutional networks (GCN)
The work on graph convolutional networks introduced by Kipf and Welling[7] became a popular node classification model for graph data. The idea of graph convolutional networks is similar to CNN: It aggregates the information from neighbor nodes to learn representations for each node v, and outputs the score F(v, X) for prediction:

H^(0) = X;   H^(l+1) = σ(ÂH^(l)W^l)

where X denotes the input graph's feature matrix, and Â depends on the graph degree matrix and adjacency matrix.

d) Recurrent neural networks (RNN)
Recurrent neural networks are very useful for tackling sequential data. As a result, they are widely used in natural language processing. The RNN models, especially long short term memory based models (LSTM)[4], are able to store the previous time information in memory, and exploit useful information from previous sequences for next-step prediction.
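To make the two formulas above concrete, the following NumPy sketch (our illustration, not code from the surveyed works) implements a fully-connected forward pass z^(l+1) = σ(W^l z^(l) + b^l) and a single graph convolution layer H^(l+1) = σ(ÂH^(l)W^l). The layer sizes, the ReLU activation and the identity matrix used for Â are placeholder assumptions.

```python
import numpy as np

def relu(z):
    # Activation function sigma
    return np.maximum(z, 0.0)

def fc_forward(x, weights, biases):
    """Fully-connected forward pass: z(l+1) = sigma(W^l z(l) + b^l)."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = relu(W @ z + b)
    return weights[-1] @ z + biases[-1]   # logits; softmax on top gives F(x)

def gcn_layer(A_hat, H, W):
    """One graph convolution: H(l+1) = sigma(A_hat H(l) W^l)."""
    return relu(A_hat @ H @ W)

# Toy example with arbitrary sizes (for illustration only)
x = np.random.rand(8)                                    # input sample
weights = [np.random.randn(16, 8), np.random.randn(4, 16)]
biases = [np.zeros(16), np.zeros(4)]
logits = fc_forward(x, weights, biases)

A_hat = np.eye(5)                                        # normalized adjacency (placeholder)
H0 = np.random.rand(5, 8)                                # node feature matrix X
W0 = np.random.randn(8, 4)
H1 = gcn_layer(A_hat, H0, W0)
```

In both cases, what matters most for the attacks discussed later is the gradient of the output scores with respect to the input, which automatic differentiation frameworks expose directly.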
 
2.2 Security evaluation

We also need to evaluate the model's resistance to adversarial examples. "Robustness" and "adversarial risk" are two terms used to describe this resistance of DNN models on one single sample and on the total population, respectively.

2.2.1 Robustness

Definition 1. Minimal perturbation: Given the classifier F and data (x, y), the adversarial perturbation with the least norm (the most unnoticeable perturbation) is

δ_min = arg min_δ ||δ||   s.t. F(x + δ) ≠ y.

Here, ||·|| usually refers to an l_p norm.

Definition 2. Robustness: The norm of the minimal perturbation:

r(x, F) = ||δ_min||.

Definition 3. Global robustness: The expectation of robustness over the whole population D:

ρ(F) = E_{x∼D} r(x, F).

The minimal perturbation can find the adversarial example which is most similar to x under the model F. Therefore, the larger r(x, F) or ρ(F) is, the more similarity the adversary needs to sacrifice to generate adversarial samples, implying that the classifier F is more robust or safe.

2.2.2 Adversarial risk (loss)

Definition 4. Most-adversarial example: Given the classifier F and data x, the sample x_adv with the largest loss value in x's ϵ-neighbor ball:

x_adv = arg max_{x′} L(x′, F)   s.t. ||x′ − x|| ≤ ϵ.

Definition 5. Adversarial loss: The loss value of the most-adversarial example:

L_adv(x) = L(x_adv) = max_{||x′−x||<ϵ} L(θ, x′, y).

Definition 6. Global adversarial loss: The expectation of the loss value on x_adv over the data distribution D:

R_adv(F) = E_{x∼D} max_{||x′−x||<ϵ} L(θ, x′, y).   (1)

The most-adversarial example is the point where the model is most likely to be fooled in the neighborhood of x. A lower loss value L_adv indicates a more robust model F.

2.2.3 Adversarial risk versus risk

The definition of adversarial risk is drawn from the definition of classifier risk (empirical risk):

R(F) = E_{x∼D} L(θ, x, y).

Risk studies a classifier's performance on samples from the natural distribution D, whereas the adversarial risk in (1) studies a classifier's performance on adversarial examples x′. It is important to note that x′ may not necessarily follow the distribution D. Thus, the studies on adversarial examples are different from those on model generalization. Moreover, a number of studies have reported the relation between these two properties[25−28]. From our clarification, we hope that our audience understands the difference and relation between risk and adversarial risk, and the importance of studying adversarial countermeasures.

2.3 Notations

With the aforementioned definitions, Table 1 lists the notations which will be used in the subsequent sections.

Table 1  Notations
  Notation    Description
  x           Victim data sample
  x′          Perturbed data sample
  δ           Perturbation
  B_ϵ(x)      l_p-distance neighbor ball around x with radius ϵ
  D           Natural data distribution
  ||·||_p     l_p norm
  y           Sample x's ground truth label
  t           Target label
  Y           Set of possible labels; usually we assume there are m labels
  C           Classifier whose output is a label: C(x) = y
  F           DNN model which outputs a score vector: F(x) ∈ [0, 1]^m
  Z           Logits: last layer outputs before softmax: F(x) = softmax(Z(x))
  σ           Activation function used in neural networks
  θ           Parameters of the model F
  L           Loss function for training; we simplify L(F(x), y) to the form L(θ, x, y)
 
3 Generating adversarial examples

In this section, we introduce the main methods for generating adversarial examples in the image classification domain. Studying adversarial examples in the image domain is considered to be essential because: 1) perceptual similarity between fake and benign images is intuitive to observers, and 2) image data and image classifiers have a simpler structure than other domains, like graph or audio. Thus, many studies concentrate on attacking image classifiers as a standard case. In this section, we assume the image classifiers refer to fully connected neural networks and convolutional neural networks[1]. The most common datasets used in these studies include 1) the handwritten digit images dataset MNIST, 2) the CIFAR10 object dataset and 3) ImageNet[29]. Next, we go through the main methods used to generate adversarial image examples in evasion attack (white-box, black-box, grey-box, physical-world attack) and poisoning attack settings. Note that we also summarize all the attack methods in Table A in Appendix A.

3.1 White-box attacks

Generally, in a white-box attack setting, when the classifier C (model F) and the victim sample (x, y) are given to the attacker, his goal is to synthesize a fake image x′ that is perceptually similar to the original image x but can mislead the classifier C to give a wrong prediction result. This can be formulated as:

find x′ satisfying ||x′ − x|| ≤ ϵ, such that C(x′) = t ≠ y

where ||·|| measures the dissimilarity between x′ and x, which is usually an l_p norm. Next, we will go through the main methods to realize this formulation.

3.1.1 Biggio's attack

Biggio et al.[22] first generated adversarial examples on the MNIST dataset targeting conventional machine learning classifiers like SVMs and 3-layer fully-connected neural networks. The attack optimizes the discriminant function to mislead the classifier. For example, on the MNIST dataset, a linear SVM classifier with discriminant function g(x) = ⟨w, x⟩ + b will mark a sample x with positive value g(x) > 0 to be in class "3", and x with g(x) ≤ 0 to be in class "not 3". An example of this attack is in Fig. 2.

Fig. 2  Biggio's attack on an SVM classifier for letter recognition: the original image is predicted as "3", the perturbed image as "not 3" (Image credit: Biggio et al.[22])

Suppose we have a sample x which is correctly classified to be "3". For this model, Biggio's attack crafts a new example x′ to minimize the discriminant value g(x′) while keeping ||x′ − x||_1 small. If g(x′) is negative, the sample is classified as "not 3", but x′ is still close to x, so the classifier is fooled. The studies on adversarial examples for conventional machine learning models[19, 22, 24] inspired studies on the safety issues of deep learning models.

3.1.2 Szegedy's limited-memory BFGS (L-BFGS) attack

The work of Szegedy et al.[8] is the first to attack deep neural network image classifiers. They formulate their optimization problem as a search for the minimally distorted adversarial example x′, with the objective:

min ||x − x′||₂²
s.t. C(x′) = t and x′ ∈ [0, 1]^m.   (2)

Szegedy et al. approximately solve this problem by introducing the loss function, which results in the following objective:

min c||x − x′||₂² + L(θ, x′, t),   s.t. x′ ∈ [0, 1]^m.

In the optimization objective of this problem, the first term imposes the similarity between x′ and x. The second term encourages the algorithm to find an x′ which has a small loss value on label t, so the classifier C will be very likely to predict x′ as t. By continuously changing the value of the constant c, they can find an x′ which has minimum distance to x and at the same time fools the classifier C. To solve this problem, they implement the L-BFGS[30] algorithm.

3.1.3 Fast gradient sign method (FGSM)

Goodfellow et al.[9] introduced a one-step method to quickly generate adversarial examples. Their formulation is

x′ = x + ϵ·sgn(∇_x L(θ, x, y)),   (non-targeted)
x′ = x − ϵ·sgn(∇_x L(θ, x, t)),   (targeted on t).

In the targeted attack setting, this formulation can be seen as one step of gradient descent to solve the problem:

min L(θ, x′, t)
s.t. ||x′ − x||_∞ ≤ ϵ and x′ ∈ [0, 1]^m.   (3)

The objective function in (3) searches for the point which has the minimum loss value on label t in x's ϵ-neighbor ball, which is the location where model F is most likely to predict the target class t. In this way, the one-step generated sample x′ is also likely to fool the model. An example of an FGSM-generated adversarial example on ImageNet is shown in Fig. 1.

Compared to the iterative attack in Section 3.1.2, FGSM is fast in generating adversarial examples, because it only involves calculating one back-propagation step. Thus, FGSM addresses the demands of tasks that need to generate a large amount of adversarial examples. For example, adversarial training[31] uses FGSM to produce adversarial samples for all samples in the training set.
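A minimal sketch of the formulas above (ours, assuming a PyTorch classifier `model` that outputs logits, a cross-entropy loss and images scaled to [0, 1]; this is not the authors' original implementation):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Non-targeted FGSM: x' = x + eps * sign(grad_x L(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()           # one-step perturbation
    return x_adv.clamp(0.0, 1.0).detach()     # keep pixel values in [0, 1]
```

A targeted variant instead subtracts ϵ·sgn(∇_x L(θ, x, t)), computed with the target label t, as in the second formula above.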
 
3.1.4 DeepFool

In DeepFool[32], the authors study a classifier F's decision boundary around the data point x. They try to find a path such that x can go beyond the decision boundary, as shown in Fig. 3, so that the classifier will give a different prediction for x. For example, to attack x_0 (true label is digit 4) into digit class 3, the decision boundary is described as F_3 = {z : F(x)_4 − F(x)_3 = 0}. We denote f(x) = F(x)_4 − F(x)_3 for short. In each attacking step, it linearizes the decision boundary hyperplane using a Taylor expansion F′_3 = {x : f(x) ≈ f(x_0) + ⟨∇_x f(x_0), x − x_0⟩ = 0}, and calculates the orthogonal vector ω from x_0 to the plane F′_3. This vector ω can be the perturbation that makes x_0 go beyond the decision boundary F_3. By moving along the vector ω, the algorithm is able to find the adversarial example x′_0 that is classified to class 3.

Fig. 3  Decision boundaries: the hyperplane F_1 (F_2 or F_3) separates the data points belonging to class 4 and class 1 (class 2 or 3). The sample x_0 crosses the decision boundary F_3, so the perturbed data x′_0 is classified as class 3. (Image credit: Moosavi-Dezfooli et al.[32])

The experiments of DeepFool[32] show that for common DNN image classifiers, almost all test samples are very close to their decision boundary. For a well-trained LeNet classifier on the MNIST dataset, over 90% of test samples can be attacked by small perturbations whose l_∞ norm is below 0.1, where the total range is [0, 1]. This suggests that the DNN classifiers are not robust to small perturbations.

3.1.5 Jacobian-based saliency map attack

The Jacobian-based saliency map attack (JSMA)[33] introduced a method based on calculating the Jacobian matrix of the score function F. It can be viewed as a greedy attack algorithm that iteratively manipulates the pixel which is the most influential to the model output.

The authors used the Jacobian matrix J_F(x) = ∂F(x)/∂x = {∂F_j(x)/∂x_i}_{i×j} to model F(x)'s change in response to the change of its input x. For a targeted attack setting where the adversary aims to craft an x′ that is classified to the target class t, they repeatedly search for and manipulate the pixel x_i whose increase (decrease) will cause F_t(x) to increase or ∑_{j≠t} F_j(x) to decrease. As a result, the model will give x the largest score on label t.

3.1.6 Basic iterative method (BIM)/Projected gradient descent (PGD) attack

The basic iterative method was first introduced by Kurakin et al.[15, 31] It is an iterative version of the one-step attack FGSM in Section 3.1.3. In a non-targeted setting, it gives an iterative formulation to craft x′:

x_0 = x;   x_{t+1} = Clip_{x,ϵ}(x_t + α·sgn(∇_x L(θ, x_t, y))).

Here, Clip denotes the function that projects its argument onto the surface of x's ϵ-neighbor ball B_ϵ(x): {x′ : ||x′ − x||_∞ ≤ ϵ}. The step size α is usually set to be relatively small (e.g., 1 unit of pixel change for each pixel), and the number of steps guarantees that the perturbation can reach the border (e.g., steps = ϵ/α + 10). This iterative attacking method is also known as the projected gradient descent (PGD) attack if a random initialization on x is added, as used in work [14].

This BIM (or PGD) attack heuristically searches for the sample x′ which has the largest loss value in the l_∞ ball around the original sample x. Such adversarial examples are called "most-adversarial" examples: They are the sample points which are most aggressive and most likely to fool the classifiers when the perturbation intensity (its l_p norm) is limited. Finding these adversarial examples is helpful for finding the weaknesses of deep learning models.
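A compact sketch of the BIM/PGD iteration above, under the same assumptions as the FGSM sketch in Section 3.1.3 (PyTorch model with cross-entropy loss; the step size, step count and random start are illustrative choices):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps, random_start=True):
    """Iterative FGSM with projection onto the l_inf ball B_eps(x)."""
    x_adv = x.clone().detach()
    if random_start:                                            # PGD variant
        x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # Clip_{x, eps}
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```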
3.1.7 Carlini & Wagner's attack

Carlini and Wagner's attack[34] counterattacks the defense strategy[12] which was shown to be successful against FGSM and L-BFGS attacks. C&W's attack aims to solve the same problem as defined in the L-BFGS attack (Section 3.1.2), namely trying to find the minimally-distorted perturbation (2).

The authors solve problem (2) by instead solving:

min ||x − x′||₂² + c·f(x′, t),   s.t. x′ ∈ [0, 1]^m

where f is defined as f(x′, t) = (max_{i≠t} Z(x′)_i − Z(x′)_t)^+. Minimizing f(x′, t) encourages the algorithm to find an x′ that has a larger score for class t than any other label, so that the classifier will predict x′ as class t. Next, applying a line search on the constant c, we can find the x′ that has the least distance to x.

The function f(x, y) can also be viewed as a loss function for data (x, y): It penalizes the situation where there are some labels i with scores Z(x)_i larger than Z(x)_y. It can also be called a margin loss function.

The only difference between this formulation and the one in the L-BFGS attack (Section 3.1.2) is that C&W's attack uses the margin loss f(x, t) instead of the cross entropy loss L(x, t). The benefit of using the margin loss is that when C(x′) = t, the margin loss value f(x′, t) = 0, and the algorithm will directly minimize the distance from x′ to x. This procedure is more efficient for finding the minimally distorted adversarial example.

The authors claim their attack is one of the strongest attacks, breaking many defense strategies which were previously shown to be successful. Thus, their attacking method can be used as a benchmark to examine the safety of DNN classifiers or the quality of other adversarial examples.
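The margin loss f(x′, t) = (max_{i≠t} Z(x′)_i − Z(x′)_t)^+ at the core of this formulation can be sketched as follows (a simplified illustration assuming PyTorch logits; the full C&W attack additionally uses a change of variables to keep x′ in [0, 1]^m and a binary search over c, both omitted here):

```python
import torch

def cw_margin_loss(logits, target):
    """f(x', t) = max(0, max_{i != t} Z(x')_i - Z(x')_t)."""
    target_score = logits.gather(1, target.view(-1, 1)).squeeze(1)
    others = logits.clone()
    others.scatter_(1, target.view(-1, 1), float('-inf'))   # mask out the target class
    max_other = others.max(dim=1).values
    return torch.clamp(max_other - target_score, min=0.0)

def cw_objective(model, x_adv, x, target, c):
    """||x - x'||_2^2 + c * f(x', t), to be minimized over x_adv by an optimizer."""
    dist = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
    return (dist + c * cw_margin_loss(model(x_adv), target)).sum()
```

An optimizer such as Adam would then minimize cw_objective over x_adv for several values of c found by the line (or binary) search.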
 
3.1.8 Ground truth attack

Attacks and defenses keep improving to defeat each other. In order to end this stalemate, the work of Carlini et al.[35] tries to find the "provable strongest attack". It can be seen as a method to find the theoretically minimally-distorted adversarial examples.

This attack is based on Reluplex[36], an algorithm for verifying the properties of neural networks. It encodes the model parameters F and data (x, y) as the subjects of a linear-like programming system, and then solves the system to check whether there exists an eligible sample x′ in x's neighborhood B_ϵ(x) that can fool the model. If we keep reducing the radius ϵ of the search region B_ϵ(x) until the system determines that there does not exist such an x′ that can fool the model, the last found adversarial example is called the ground truth adversarial example, because it has been proved to have the least dissimilarity with x.

The ground-truth attack is the first work to seriously calculate the exact robustness (minimal perturbation) of classifiers. However, this method involves using a satisfiability modulo theories (SMT) solver (a complex algorithm to check the satisfiability of a series of theories), which makes it slow and not scalable to large networks. More recent works[37, 38] have improved the efficiency of the ground-truth attack.

3.1.9 Other l_p attacks

Previous studies are mostly focused on l_2 or l_∞ norm-constrained perturbations. However, there are other papers which consider other types of l_p attacks.

1) The one-pixel attack[39] studies a similar problem as in Section 3.1.2, but constrains the perturbation's l_0 norm. Constraining the l_0 norm of the perturbation x′ − x limits the number of pixels that are allowed to be changed. Their work shows that, on the CIFAR10 dataset, for a well-trained CNN classifier (e.g., VGG16, which has 85.5% accuracy on test data), most of the testing samples (63.5%) can be attacked by changing the value of only one pixel in a non-targeted setting. This also demonstrates the poor robustness of deep learning models.

2) EAD: The elastic-net attack[40] also studies a similar problem as in Section 3.1.2, but constrains the perturbation's l_1 and l_2 norms together. As shown in their experimental work[41], some strong defense models that aim to reject l_∞ and l_2 norm attacks[14] are still vulnerable to the l_1-based elastic-net attack.

3.1.10 Universal attack

Previous methods only consider one specific targeted victim sample x. However, the work [42] devises an algorithm that successfully misleads a classifier's decision on almost all testing images. They try to find a perturbation δ satisfying:

1) ||δ||_p ≤ ϵ.
2) P_{x∼D(x)}(C(x + δ) ≠ C(x)) ≥ 1 − σ.

This formulation aims to find a perturbation δ such that the classifier gives wrong decisions on most of the samples. In their experiments, for example, they successfully find a perturbation that can attack 85.4% of the test samples in the ILSVRC 2012[43] dataset under a ResNet-152[2] classifier.

The existence of "universal" adversarial examples reveals a DNN classifier's inherent weakness on all of the input samples. As claimed in work [42], it may suggest the property of geometric correlation among the high-dimensional decision boundaries of classifiers.
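The success criterion in condition 2) can be checked empirically through the fooling rate, i.e., the fraction of samples whose prediction changes after adding δ; the attack requires this rate to be at least 1 − σ while ||δ||_p ≤ ϵ. A small evaluation sketch (ours, assuming a PyTorch classifier and a loader of images in [0, 1]):

```python
import torch

@torch.no_grad()
def fooling_rate(model, delta, loader):
    """Fraction of samples x with C(x + delta) != C(x)."""
    fooled, total = 0, 0
    for x, _ in loader:
        clean_pred = model(x).argmax(dim=1)
        adv_pred = model((x + delta).clamp(0.0, 1.0)).argmax(dim=1)
        fooled += (clean_pred != adv_pred).sum().item()
        total += x.size(0)
    return fooled / total    # the universal attack requires this to be >= 1 - sigma
```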
 
pers which consider other types of lp attacks.
Fig. 4     Top  part  of  digit  “ 5”  is  perturbed  to  be  “ thicker” .  For
1) One-pixel attack[39] studies similar problem as in the image which was correctly classified as “5”, after distortion is
Section 3.1.2, but constrains the perturbation′s l0 norm. now classified as “3”.
 

Constraining l0 norm of the perturbation x′ − x will lim-


it the number of pixels that are allowed to be changed. 3.1.12 Unrestricted adversarial examples
Their work shows that: On dataset CIFAR10, for a well- Previous attack methods only consider adding un-
trained CNN classifier (e.g., VGG16, which has 85.5% ac- noticeable perturbations into images. However, the work
curacy on test data), most of the testing samples (63.5%) [45] devised a method to generate unrestricted adversari-
can be attacked by changing the value of only one pixel al examples. These samples do not necessarily look ex-
in a non-targeted setting. This also demonstrates the poor actly the same as the victim samples, but are still legitim-
robustness of deep learning models. ate samples for human eyes and can fool the classifier.
2) EAD: Elastic-net attack[40] also studies a similar Previous successful defense strategies that target perturb-
problem as in Section 3.1.2, but constrains the perturba- ation-based attacks fail to recognize them.
tions l1 and l2 norm together. As shown in their experi- In order to attack given classifier C , Odena et al.[46]
mental work[41], some strong defense models that aim to pretrained an auxiliary classifier generative adversarial
reject l∞ and l2 norm attacks[14] are still vulnerable to the network (AC-GAN), so they can generate one legitimate
l1-based Elastic-net attack. sample x from a noise vector z0 from class y . Then, to
3.1.10 Universal attack craft an adversarial example, they will find a noise vec-
Previous methods only consider one specific targeted tor z near z0, but require that the output of AC-GAN
victim sample x. However, the work [42] devises an al- generator G(z) be wrongly classified by victim model C .
gorithm that successfully mislead a classifier′s decision on Because z is near z0 in latent space of the AC-GAN, its
almost all testing images. They try to find a perturba- output should belong to the same class y . In this way, the
tion δ satisfying: generated sample G(z) is different from x, misleading clas-
1)  ||δ||p ≤ ϵ. sifier F , but it is still a legitimate sample.
2)   P (C(x + δ) ̸= C(x)) ≤ 1 − σ.
x∼D(x)
This formulation aims to find a perturbation δ such 3.2 Physical world attack
that the classifier gives wrong decisions on most of the
samples. In their experiments, for example, they success- All the previously introduced attack methods are ap-
fully find a perturbation that can attack 85.4% of the test plied digitally, where the adversary supplies input im-
 
3.2 Physical world attack

All the previously introduced attack methods are applied digitally, where the adversary supplies input images directly to the machine learning model. However, this is not always the case for some scenarios, like those that use cameras, microphones or other sensors to receive signals as input. In this case, can we still attack these systems by generating physical-world adversarial objects? Recent works show that such attacks do exist. For example, the work [20] attached stickers to road signs that can severely threaten autonomous cars' sign recognizers. These kinds of adversarial objects are more destructive for deep learning models because they can directly challenge many practical applications of DNNs, such as face recognition, autonomous vehicles, etc.

3.2.1 Exploring adversarial examples in the physical world

In the work [15], the authors explore the feasibility of crafting physical adversarial objects, by checking whether the generated adversarial images (FGSM, BIM) are "robust" under natural transformations (such as changing viewpoint, lighting, etc.). Here, "robust" means the crafted images remain adversarial after the transformation. To apply the transformation, they print out the crafted images, and let test subjects use cellphones to take photos of these printouts. In this process, the shooting angle and lighting environment are not constrained, so the acquired photos are transformed samples of the previously generated adversarial examples. The experimental results demonstrate that after transformation, a large portion of these adversarial examples, especially those generated by FGSM, remain adversarial to the classifier. These results suggest the possibility of physical adversarial objects which can fool the sensor under different environments.

3.2.2 Eykholt's attack on road signs

The work [20], shown in Fig. 5, crafts physical adversarial objects by "contaminating" road signs to mislead road sign recognizers. They achieve the attack by putting stickers on the stop sign in the desired positions.

The authors' approach consists of: 1) Implement an l_1-norm based attack (an attack that constrains ||x′ − x||_1) on digital images of road signs to roughly find the region to perturb (l_1 attacks render sparse perturbations, which helps to find the attack location). These regions will later be the locations of the stickers. 2) Concentrating on the regions found in step 1, use an l_2-norm based attack to generate the color for the stickers. 3) Print out the perturbation found in steps 1 and 2, and stick it on the road sign. The perturbed stop sign can confuse an autonomous vehicle from any distance and viewpoint.

Fig. 5  An attacker puts stickers on a road sign to confuse an autonomous vehicle's road sign recognizer from any viewpoint (Image credit: Eykholt et al.[20])

3.2.3 Athalye's 3D adversarial object

The work [47] reports the first success in crafting physical 3D adversarial objects. As shown in Fig. 6, the authors use 3D-printing to manufacture an "adversarial" turtle. To achieve their goal, they implement a 3D rendering technique. Given a textured 3D object, they first optimize the object's texture such that the rendered images are adversarial from any viewpoint. In this process, they also ensure that the perturbation remains adversarial under different environments: camera distance, lighting conditions, rotation and background. After finding the perturbation on the 3D rendering, they print an instance of the 3D object.

Fig. 6  The image classifier fails to correctly recognize the adversarial object (classified as rifle or other), but the original object is correctly predicted as turtle with 100% accuracy (Image credit: Athalye et al.[47])

3.3 Black-box attacks

3.3.1 Substitute model

The work [48] was the first to introduce an effective algorithm to attack DNN classifiers under the condition that the adversary has no access to the classifier's parameters or training set (black-box). An adversary can only feed input x to obtain the output label y from the classifier. Additionally, the adversary may have only partial knowledge about: 1) the classifier's data domain (e.g., handwritten digits, photographs, human faces) and 2) the architecture of the classifier (e.g., CNN, RNN).

The authors in the work [48] exploit the "transferability" (Section 5.3) property of adversarial examples: if a sample x′ can attack F_1, it is also likely to attack F_2, which has a similar structure to F_1. Thus, the authors introduce a method to train a substitute model F′ to imitate the target victim classifier F, and then craft the adversarial example by attacking the substitute model F′. The main steps are as follows:

1) Synthesize substitute training dataset
 
Make a "replica" training set. For example, to attack a victim classifier for a handwritten digits recognition task, make an initial substitute training set by: a) requiring samples from the test set; or b) handcrafting samples.

2) Training the substitute model
Feed the substitute training dataset X into the victim classifier to obtain its labels Y. Choose one substitute DNN model to train on (X, Y) to get F′. Based on the attacker's knowledge, the chosen DNN should have a structure similar to the victim model.

3) Dataset augmentation
Augment the dataset (X, Y) and retrain the substitute model F′ iteratively. This procedure helps to increase the diversity of the replica training set and improve the accuracy of the substitute model F′.

4) Attacking the substitute model
Utilize the previously introduced attack methods, such as FGSM, to attack the model F′. The generated adversarial examples are also very likely to mislead the target model F, by the property of "transferability".

What kind of attack algorithm should we choose to attack the substitute model? The success of the substitute model black-box attack is based on the "transferability" property of adversarial examples. Thus, during a black-box attack, we choose attacks that have high transferability, like FGSM, PGD and momentum-based iterative attacks[49].

3.3.2 ZOO: Zeroth order optimization based black-box attack

Different from the work in Section 3.3.1 where an adversary can only obtain the label information from the classifier, the work [50] assumes the attacker has access to the prediction confidence (score) from the victim classifier's output. In this case, there is no need to build the substitute training set and substitute model. Chen et al. give an algorithm to "scrape" the gradient information around the victim sample x by observing the changes in the prediction confidence F(x) as the pixel values of x are tuned.

Equation (4) shows that for each index i of sample x, we add h to (or subtract h from) x_i. If h is small enough, we can scrape the gradient information from the output of F(·) by

∂F(x)/∂x_i ≈ (F(x + h·e_i) − F(x − h·e_i)) / (2h).   (4)

Utilizing the approximate gradient, we can apply the attack formulations introduced in Sections 3.1.3 and 3.1.7. The attack success rate of ZOO is higher than that of the substitute model attack (Section 3.3.1) because it can utilize the information of the prediction confidence, instead of solely the predicted labels.
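The coordinate-wise estimate in (4) only needs query access to the score F(·). A minimal NumPy sketch of this finite-difference idea (ours; the actual ZOO implementation estimates only a few randomly chosen coordinates per step and adds further optimizations, which are omitted here):

```python
import numpy as np

def estimate_gradient(score_fn, x, h=1e-4):
    """Estimate dF/dx_i ~ (F(x + h e_i) - F(x - h e_i)) / (2h) for every coordinate."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = 1.0
        grad.flat[i] = (score_fn(x + h * e) - score_fn(x - h * e)) / (2.0 * h)
    return grad
```

Here score_fn would be an attack loss built from the queried confidence F(x) (e.g., the margin loss of Section 3.1.7), and the estimated gradient replaces ∇_x in the update rules of Sections 3.1.3 and 3.1.7.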
3.3.3 Query-efficient black-box attack

Previously introduced black-box attacks require lots of input queries to the classifier, which may be prohibitive in practical applications. There are some studies on improving the efficiency of generating black-box adversarial examples via a limited number of queries. For example, the authors in work [51] introduced a more efficient way to estimate the gradient information from model outputs. They use natural evolutionary strategies[52], which sample the model's output based on queries around x, and estimate the expectation of the gradient of F at x. This procedure requires fewer queries to the model. Moreover, the authors in work [53] apply a genetic algorithm to search the neighbors of a benign image for adversarial examples.

3.4 Semi-white (grey) box attack

3.4.1 Using a generative adversarial network (GAN) to generate adversarial examples

The work [54] devised a semi-white box attack framework. It first trains a GAN[55] targeting the model of interest. The attacker can then craft adversarial examples directly from the generative network.

The authors believe the advantage of the GAN-based attack is that it accelerates the process of producing adversarial examples, and makes the samples more natural and harder to detect. Later, Deb's grey box attack[56] uses a GAN to generate adversarial faces to evade face recognition software. Their crafted face images appear more natural and are barely distinguishable from the target face images.

3.5 Poisoning attacks

The attacks we have discussed so far are evasion attacks, which are launched after the classification model is trained. Some works instead craft adversarial examples before training. These adversarial examples are inserted into the training set in order to undermine the overall accuracy of the learned classifier, or influence its prediction on certain test examples. This process is called a poisoning attack.

Usually, the adversary in a poisoning attack setting has knowledge about the architecture of the model which is later trained on the poisoned dataset. Poisoning attacks are frequently applied to attack graph neural networks, because of the GNN's specific transductive learning procedure. Here, we introduce studies that craft image poisoning attacks.

3.5.1 Biggio's poisoning attack on SVM

The work [19] introduced a method to poison the training set in order to reduce the SVM model's accuracy. In their setting, they try to figure out a poison sample x_c which, when inserted into the training set, will result in the learned SVM model F_{x_c} having a large total loss on the whole validation set. They achieve this by using an incremental learning technique for SVMs[57], which can model the influence of a training sample on the learned SVM model.
 
A poisoning attack based on the procedure above is quite successful for SVM models. However, for deep learning models, it is not easy to explicitly figure out the influence of training samples on the trained model. Below we introduce some approaches for applying poisoning attacks on DNN models.

3.5.2 Koh's model explanation

Koh and Liang's explanation study[58] introduces a method to interpret deep neural networks: How would the model's predictions change if a training sample were modified? Their model can explicitly quantify the change in the final loss without retraining the model when only one training sample is modified. This work can be naturally adopted for poisoning attacks by finding those training samples that have a large influence on the model's prediction.

3.5.3 Poison frogs

"Poison frogs"[59] introduced a method to insert an adversarial image with a true label into the training set, in order to cause the trained model to wrongly classify a target test sample. In their work, given a target test sample x_t, whose true label is y_t, the attacker first uses a base sample x_b from class y_b. Then, it solves the objective to find x′:

x′ = arg min_x ||Z(x) − Z(x_t)||₂² + β||x − x_b||₂².

After inserting the poison sample x′ into the training set, the new model trained on X_train + {x′} will classify x′ as class y_b, because of the small distance between x′ and x_b. When the newly trained model is used to predict x_t, the objective of x′ forces the score vectors of x_t and x′ to be close. Thus, x′ and x_t will have the same prediction outcome. In this way, the newly trained model will predict the target sample x_t as class y_b.

4 Countermeasures against adversarial examples

In order to protect the security of deep learning models, different strategies have been considered as countermeasures against adversarial examples. There are basically three main categories of these countermeasures:

1) Gradient masking/Obfuscation
Since most attack algorithms are based on the gradient information of the classifier, masking or hiding the gradients will confound the adversaries.

2) Robust optimization
Re-learning a DNN classifier's parameters can increase its robustness. The trained classifier will correctly classify the subsequently generated adversarial examples.

3) Adversarial example detection
Study the distribution of natural/benign examples, detect adversarial examples and disallow their input into the classifier.

4.1 Gradient masking/Obfuscation

Gradient masking/obfuscation refers to the strategy where a defender deliberately hides the gradient information of the model in order to confuse the adversaries, since most attack algorithms are based on the classifier's gradient information.

4.1.1 Defensive distillation

"Distillation", first introduced by Hinton et al.[60], is a training technique to reduce the size of DNN architectures. It fulfills its goal by training a smaller-size DNN model on the logits (outputs of the last layer before softmax).

The work [12] reformulates the procedure of distillation to train a DNN model that can resist adversarial examples, such as FGSM, Szegedy's L-BFGS attack or DeepFool. They design their training process as:

1) Train a network F on the given training set (X, Y) by setting the temperature of the softmax to T.
2) Compute the scores (after softmax) given by F(X), again evaluating the scores at temperature T.
3) Train another network F′_T using softmax at temperature T on the dataset with soft labels (X, F(X)). We refer to the model F′_T as the distilled model.
4) During prediction on test data X_test (or adversarial examples), use the distilled network F′_T but use softmax at temperature 1, which is denoted as F′_1.

(Note that the softmax function at temperature T means: softmax(x, T)_i = exp(x_i/T) / ∑_j exp(x_j/T), where i = 0, ···, K − 1.)

Carlini and Wagner[34] explain why this algorithm works: When we train a distilled network F′_T at temperature T and test it at temperature 1, we effectively cause the inputs to the softmax to become larger by a factor of T. Say T = 100; the logits Z(·) for sample x and its neighbor points x′ will be 100 times larger, which will result in the softmax function F′_1(·) = softmax(Z(·), 1) outputting a score vector like (ϵ, ϵ, ···, 1 − (m − 1)ϵ, ϵ, ···, ϵ), where the target output class has a score extremely close to 1, and all other classes have scores close to 0. In practice, the value of ϵ is so small that its 32-bit floating-point value is rounded to 0. In this way, the computer cannot find the gradient of the score function F′_1, which inhibits the gradient-based attacks.
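The saturation effect described above can be reproduced numerically: training at temperature T roughly scales the logits by a factor of T, so evaluating the softmax at temperature 1 on such logits pushes all but one output to (numerically) zero and leaves no usable gradient. A small NumPy illustration (ours, with arbitrary example logits):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([1.3, 0.2, -0.7], dtype=np.float32)   # arbitrary logits Z(x)
print(softmax(logits, T=1.0))          # smooth scores, informative gradients
print(softmax(100.0 * logits, T=1.0))  # ~[1, 0, 0]: in float32 the small entries round to 0
```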
4.1.2 Shattered gradients

Some studies, such as [61, 62], try to protect the model by preprocessing the input data. They add a non-smooth or non-differentiable preprocessor g(·) and then train a DNN model f on g(X). The trained classifier f(g(·)) is not differentiable in terms of x, causing the failure of adversarial attacks.

 
For example, thermometer encoding[61] uses a preprocessor to discretize an image's pixel value x_i into an l-dimensional vector τ(x_i) (e.g., when l = 10, τ(0.66) = 1111110000). The vector τ(x_i) acts as a "thermometer" to record the pixel x_i's value. A DNN model is later trained on these vectors. Another work [62] studies a number of image processing tools, such as image cropping, compressing, total-variance minimization and super-resolution[63], to determine whether these techniques help to protect the model against adversarial examples. All these approaches block up the smooth connection between the model's output and the original input samples, so the attacker cannot easily find the gradient ∂F(x)/∂x for attacking.
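A minimal sketch of the thermometer preprocessor τ(·) for a single pixel (ours; one common convention is shown, and it reproduces the τ(0.66) = 1111110000 example above for l = 10):

```python
import numpy as np

def thermometer_encode(pixel, levels=10):
    """Map a pixel value in [0, 1] to an l-dimensional thermometer vector.

    thermometer_encode(0.66) -> [1 1 1 1 1 1 0 0 0 0]
    """
    thresholds = np.arange(1, levels + 1) / levels    # 0.1, 0.2, ..., 1.0
    return (pixel >= thresholds).astype(np.float32)   # non-differentiable step functions
```

Because the encoding is a stack of step functions, it has zero gradient almost everywhere, which is exactly what blocks gradient-based attacks.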
4.1.3 Stochastic/Randomized gradients

Some defense strategies try to randomize the DNN model in order to confound the adversary. For instance, we train a set of classifiers s = {F_t : t = 1, 2, ···, k}. During evaluation on data x, we randomly select one classifier from the set s and predict the label y. Because the adversary has no idea which classifier is used by the prediction model, the attack success rate will be reduced. Examples of this strategy include the work [64], which randomly drops some neurons of each layer of the DNN model, and the work [65], which resizes the input images to a random size and pads zeros around the input image.

4.1.4 Exploding & vanishing gradients

Both PixelDefend[66] and Defense-GAN[67] suggest using generative models to project a potential adversarial example onto the benign data manifold before classifying it. While PixelDefend uses the PixelCNN generative model[68], Defense-GAN uses a GAN architecture[55]. The generative models can be viewed as purifiers that transform adversarial examples into benign examples.

Both of these methods consider adding a generative network before the classifier DNN, which makes the final classification model an extremely deep neural network. The underlying reason that these defenses succeed is that the cumulative product of partial derivatives from each layer causes the gradient ∂L(x)/∂x to be extremely small or irregularly large, which prevents the attacker from accurately estimating the location of adversarial examples.

4.1.5 Gradient masking/Obfuscation methods are not safe

Carlini and Wagner[34] show that the method of "defensive distillation" (Section 4.1.1) is still vulnerable to their adversarial examples. In the study [13], the authors devised different attacking algorithms to break the gradient masking/obfuscation defending strategies (Sections 4.1.2 – 4.1.4).

The main weakness of the gradient masking strategy is that it can only "confound" the adversaries; it cannot eliminate the existence of adversarial examples.

4.2 Robust optimization

Robust optimization methods aim to improve the classifier's robustness (Section 2.2) by changing the DNN model's manner of learning. They study how to learn model parameters that can give promising predictions on potential adversarial examples. In this field, the works mainly focus on: 1) learning model parameters θ* to minimize the average adversarial loss (Section 2.2.2):

θ* = arg min_{θ∈Θ} E_{x∼D} max_{||x′−x||≤ϵ} L(θ, x′, y)   (5)

or 2) learning model parameters θ* to maximize the average minimal perturbation distance (Section 2.2.1):

θ* = arg max_{θ∈Θ} E_{x∼D} min_{C(x′)≠y} ||x′ − x||.   (6)

Typically, a robust optimization algorithm should have prior knowledge of its potential threat or potential attack (adversarial space D). Then, the defenders build classifiers which are safe against this specific attack. Most of the related works[9, 14, 15] aim to defend against adversarial examples generated from small l_p (specifically l_∞ and l_2) norm perturbations. Even though there is a chance that these defenses are still vulnerable to attacks from other mechanisms (e.g., the spatial attack[44]), studying the security against l_p attacks is fundamental and can be generalized to other attacks.

In this section, we concentrate on defense approaches using robust optimization against l_p attacks. We categorize the related works into three groups: 1) regularization methods, 2) adversarial (re)training and 3) certified defenses.

4.2.1 Regularization methods

Some early studies on defending against adversarial examples focus on exploiting certain properties that a robust DNN should have in order to resist adversarial examples. For example, Szegedy et al.[8] suggest that a robust model should be stable when its inputs are distorted, so they turn to constraining the Lipschitz constant to impose this "stability" of the model output. Training with these regularizations can sometimes heuristically help the model be more robust.

1) Penalize a layer's Lipschitz constant
When Szegedy et al.[8] first claimed the vulnerability of DNN models to adversarial examples, they suggested adding regularization terms on the parameters during training, to force the trained model to be stable. They suggested constraining the Lipschitz constant L_k between any two layers:

∀x, δ:   ||h_k(x; W_k) − h_k(x + δ; W_k)|| ≤ L_k||δ||

so that the outcome of each layer will not be easily influenced by a small distortion of its input.
 
The work on Parseval networks[69] formalized this idea, by claiming that the model's adversarial risk (5) is directly dependent on this instability L_k:

E_{x∼D} L_adv(x) ≤ E_{x∼D} L(x) + E_{x∼D}[max_{||x′−x||≤ϵ} |L(F(x′), y) − L(F(x), y)|] ≤ E_{x∼D} L(x) + λ_p ∏_{k=1}^{K} L_k

where λ_p is the Lipschitz constant of the loss function. This formula states that, during the training process, penalizing the large instability of each hidden layer can help to decrease the adversarial risk of the model, and consequently increase its robustness. The idea of constraining instability also appears in the study [70] for semi-supervised and unsupervised defenses.

2) Penalize a layer's partial derivative
The study [71] introduced a deep contractive network algorithm to regularize the training. It was inspired by the contractive autoencoder[72], which was introduced to denoise the encoded representation learning. The deep contractive network suggests adding a penalty on the partial derivatives at each layer into the standard back-propagation framework, so that a change of the input data will not cause a large change in the output of each layer. Thus, it becomes difficult for the classifier to give different predictions on perturbed data samples.

4.2.2 Adversarial (re)training

1) Adversarial training with FGSM
Goodfellow's FGSM attack work[9] was the first to suggest feeding generated adversarial examples into the training process. By adding the adversarial examples with true labels (x′, y) into the training set, the training set will tell the classifier that x′ belongs to class y, so that the trained model will correctly predict the labels of future adversarial examples.

In the work [9], they use non-targeted FGSM (Section 3.1.3) to generate adversarial examples x′ for the training dataset:

x′ = x + ϵ·sgn(∇_x L(θ, x, y)).

By training on benign samples augmented with adversarial examples, they increase the robustness against adversarial examples generated by FGSM.

The scaled adversarial training[15] changes the training strategy of this method so that the model can be scaled to larger datasets such as ImageNet. They suggest that using batch normalization[73] will improve the efficiency of adversarial training. We give a short sketch of their algorithm in Algorithm 1.

The trained classifier has good robustness against FGSM attacks, but is still vulnerable to iterative attacks. Later, the study [21] argues that this defense is also vulnerable to single-step attacks: adversarial training with FGSM will cause gradient obfuscation (Section 4.1), where there is an extreme non-smoothness of the trained classifier F near the test sample x. Refer to Fig. 7 for an illustration of the non-smooth property of the FGSM-trained classifier.

Algorithm 1. Adversarial training with FGSM by batches
Randomly initialize network F
Repeat:
  1) Read minibatch B = {x_1, ···, x_m} from the training set
  2) Generate k adversarial examples {x_adv^1, ···, x_adv^k} for the corresponding benign examples using the current state of the network F
  3) Update B′ = {x_adv^1, ···, x_adv^k, x_{k+1}, ···, x_m}
  Do one training step of network F using minibatch B′
until training converged
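A PyTorch-style sketch of one minibatch step of Algorithm 1 (ours; it reuses the fgsm sketch from Section 3.1.3, and the optimizer as well as the number k of adversarial examples per batch are assumptions):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps, k):
    """Replace the first k samples of the minibatch by FGSM examples, then train."""
    model.eval()
    x_adv = fgsm(model, x[:k], y[:k], eps)        # fgsm() as sketched in Section 3.1.3
    x_mix = torch.cat([x_adv, x[k:]], dim=0)      # B' = {x_adv^1..k, x_{k+1}, ..., x_m}
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_mix), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```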
2) Adversarial training with PGD
The PGD adversarial training[14] suggests using the projected gradient descent attack (Section 3.1.6) for adversarial training, instead of using single-step attacks like FGSM. The PGD attack (Section 3.1.6) can be seen as a heuristic method to find the "most adversarial" example

x_adv = arg max_{x′∈B_ϵ(x)} L(x′, F)   (7)

in the l_∞ ball around x: B_ϵ(x). Here, the most-adversarial example x_adv is the location where the classifier F is most likely to be misled. When training the DNN model on these most-adversarial examples, it actually solves the problem of learning model parameters θ that minimize the adversarial loss (5). If the trained model has a small loss value on these most-adversarial examples, the model is safe everywhere in x's neighbor ball B_ϵ(x).

One thing to note is that this method trains the model only on adversarial examples, instead of a mix of benign and adversarial examples. The training algorithm is shown in Algorithm 2.

The model trained under this method demonstrates good robustness against both single-step and iterative attacks on the MNIST and CIFAR10 datasets. However, this method involves an iterative attack for all the training samples. Thus, the time cost of this adversarial training will be k (using k-step PGD) times as large as the time cost of natural training, and as a consequence, it is hard to scale to large datasets such as ImageNet.
versarial examples, they increase the robustness against to scale to large datasets such as ImageNet.
adversarial examples generated by FGSM. 3) Ensemble adversarial training
The scalaed adversarial training[15] changes the train- Ensembler adversarial training[21] introduced their ad-
ing strategy of this method so that the model can be versarial training method which can protect CNN models
scaled to larger dataset such as ImageNet. They suggest against single-step attacks and also apply to large data-
using batch normalization[73] will improve the efficiency of sets such as ImageNet.
adversarial training. We give a short sketch of their al- Their main approach is to augment the classifier′s
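Below is a minimal PyTorch sketch of one minibatch step in the style of Algorithm 1 (a mixed clean/adversarial batch). The function names, the ϵ value and the choice of replacing half of the batch are our own assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, eps):
    """Craft non-targeted FGSM examples x' = x + eps * sign(grad_x L)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def adv_train_step(model, optimizer, x, y, eps=8 / 255, k=None):
    """One Algorithm-1-style step: replace the first k samples of the batch
    with their FGSM versions and train on the mixed batch."""
    k = k or x.size(0) // 2
    x_mixed = x.clone()
    x_mixed[:k] = fgsm_examples(model, x[:k], y[:k], eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_mixed), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```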
2) Adversarial training with PGD
The PGD adversarial training[14] suggests using the projected gradient descent attack (Section 3.1.6) for adversarial training, instead of single-step attacks like FGSM. The PGD attack (Section 3.1.6) can be seen as a heuristic method to find the “most adversarial” example in the l∞ ball Bϵ(x) around x:
x_adv = arg max_{x′∈Bϵ(x)} L(x′, F)   (7)
Here, the most-adversarial example x_adv is the location where the classifier F is most likely to be misled. When training the DNN model on these most-adversarial examples, it actually solves the problem of learning model parameters θ that minimize the adversarial loss (5). If the trained model has a small loss value on these most-adversarial examples, the model is safe everywhere in x′s neighborhood ball Bϵ(x).
One thing to note is that this method trains the model only on adversarial examples, instead of a mix of benign and adversarial examples. The training algorithm is shown in Algorithm 2.
The model trained with this method demonstrates good robustness against both single-step and iterative attacks on the MNIST and CIFAR10 datasets. However, this method involves an iterative attack for all the training samples. Thus, the time cost of this adversarial training will be k (using k-step PGD) times as large as the time cost of natural training, and as a consequence, it is hard to scale to large datasets such as ImageNet.
3) Ensemble adversarial training
Ensemble adversarial training[21] introduced an adversarial training method which can protect CNN models against single-step attacks and also applies to large datasets such as ImageNet.
Their main approach is to augment the classifier′s training set with adversarial examples crafted from other pre-trained classifiers. For example, if we aim to train a robust classifier F, we can first pre-train classifiers F1, F2, and F3 as references. These models have different hyper-parameters from model F. Then, for each sample x, we
use a single-step attack such as FGSM to craft adversarial examples on F1, F2 and F3 to get x1_adv, x2_adv, x3_adv. Because of the transferability property (Section 5.3) of single-step attacks across different models, x1_adv, x2_adv, x3_adv are also likely to mislead the classifier F, which means these samples are a good approximation of the “most adversarial” example (7) for model F on x. Training on these samples together will approximately minimize the adversarial loss in (5).

Fig. 7 Illustration of gradient masking for adversarial training via FGSM. It plots the loss function of the trained classifier around x on a grid spanned by the gradient direction and another randomly chosen direction. We can see that the gradient poorly approximates the global loss. (Image credit: Tramer et al.[21])

This ensemble adversarial training algorithm is more time efficient than the methods in 1) and 2), since it decouples the process of model training from generating adversarial examples. The experimental results show that this method can provide robustness against single-step attacks and black-box attacks on the ImageNet dataset.
4) Accelerate adversarial training
While it is one of the most promising and reliable defense strategies, adversarial training with the PGD attack[14] is generally slow and computationally costly.
The work [74] proposes a free adversarial training algorithm which improves the efficiency by reusing the backward pass calculations. In this algorithm, the gradient of the loss with respect to the input, ∂L(x + δ, θ)/∂x, and the gradient of the loss with respect to the model parameters, ∂L(x + δ, θ)/∂θ, can be computed together in one back-propagation iteration, by sharing the same components of the chain rule. Thus, the adversarial training process is highly accelerated. The free adversarial training algorithm is shown in Algorithm 3.
In the work [75], the authors argue that when the model parameters are fixed, the PGD-generated adversarial example is only coupled with the weights of the first layer of the DNN. This is based on solving a Pontryagin′s maximum principle[76]. Therefore, the work [75] invents an algorithm called you only propagate once (YOPO) to reuse the gradient of the loss with respect to the model′s first layer output, ∂L(x + δ, θ)/∂Z1(x), during generating PGD attacks. In this way, YOPO avoids accessing the gradient many times and therefore reduces the computational cost.

Algorithm 2. Adversarial training with PGD
Randomly initialize network F
Repeat
1) Read minibatch B = {x1, · · · , xm} from training set
2) Generate m adversarial examples {x1_adv, · · · , xm_adv} by PGD attack using current state of the network F
3) Update B′ = {x1_adv, · · · , xm_adv}
Do one training step of network F using minibatch B′
until training converged
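The following is a minimal PyTorch sketch of PGD adversarial training in the style of Algorithm 2. The attack hyperparameters (ϵ, step size, number of steps) and helper names are illustrative assumptions, not values prescribed by [14].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """k-step PGD in the l_inf ball B_eps(x): a heuristic search for the
    "most adversarial" example of (7)."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        # project back into the eps-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def pgd_train_step(model, optimizer, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    """Train only on the most-adversarial examples (Algorithm 2 style)."""
    model.eval()                      # freeze BN/dropout statistics while attacking
    x_adv = pgd_attack(model, x, y, eps, alpha, steps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```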
Algorithm 3. Free adversarial training
Randomly initialize network F
Repeat
1) Read minibatch B = {x1, · · · , xm} from training set
2) for i = 1, · · · , m do
  2.1) Update model parameter θ
    gθ ← E(x,y)∈B [∇θ L(x + δ, y, θ)]
    gadv ← ∇x L(x + δ, y, θ)
    θ ← θ − αgθ
  2.2) Generate adversarial examples
    δ ← δ + ϵ·sgn(gadv)
    δ ← clip(δ, −ϵ, ϵ)
3) Update minibatch B with adversarial examples x + δ
until training converged
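A minimal PyTorch sketch of Algorithm 3 is given below. The key point it illustrates is that a single backward pass fills both the parameter gradient gθ and the input gradient gadv, so δ can be updated “for free”. The replay count and step sizes are our own assumptions.

```python
import torch
import torch.nn.functional as F

def free_adv_train_epoch(model, optimizer, loader, eps=8 / 255, replays=4):
    """Free adversarial training sketch: each minibatch is replayed several
    times; one backward pass per replay provides both the parameter gradient
    (used by the optimizer) and the input gradient (used to update delta)."""
    delta = None
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)            # persistent perturbation
        for _ in range(replays):
            x_adv = (x + delta).clamp(0, 1).requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            optimizer.zero_grad()
            loss.backward()                        # fills param grads and x_adv.grad
            optimizer.step()                       # theta <- theta - alpha * g_theta
            delta = (delta + eps * x_adv.grad.sign()).clamp(-eps, eps).detach()
```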
4.2.3 Provable defenses
Adversarial training has been shown to be effective in protecting models against adversarial examples. However, there is still no formal guarantee about the safety of the trained classifiers. We will never know whether there are more aggressive attacks that can break those defenses, so directly applying these adversarial training algorithms in safety-critical tasks would be irresponsible.
As we mentioned in Section 3.1.8, the ground truth attack[35] was the first to introduce the Reluplex algorithm to rigorously verify the robustness of DNN models: When the model F is given, the algorithm figures out the exact value of the minimal perturbation distance r(x; F). That is to say, the classifier is safe against any perturbations with norm less than this r(x; F). If we apply Reluplex on the whole test set, we can tell what percentage of samples are absolutely safe against perturbations with norm less than r0.
In this way, we gain confidence and reduce the expected risk when building DNN models.
The method of Reluplex seeks to find the exact value of r(x; F) that can verify the model F′s robustness on x. Alternatively, works such as [77−79] try to find trainable “certificates” C(x; F) to verify the model robustness. For example, in the work [79], the authors calculate a certificate C(x, F) for model F on x, which is a lower bound of the minimal perturbation distance: C(x, F) ≤ r(x, F). As shown in Fig. 8, the model must be safe against any perturbation with norm limited by C(x, F). Moreover, these certificates are trainable. Training to optimize these certificates will grant good robustness to the classifier. In this section, we shall briefly introduce some methods to design these certificates.

Fig. 8 The derived certificate C(x, F) is a lower bound of the minimal perturbation distance r(x, F). The model is safe in the C(x, F) ball.

1) Lower bound of minimal perturbation
Hein and Andriushchenko[79] derive a lower bound C(x, F) for the minimal perturbation distance of F on x based on the Cross-Lipschitz theorem:
max_{ϵ>0} min { min_{i≠y} [ (Zy(x) − Zi(x)) / (max_{x′∈Bϵ(x)} ‖∇Zy(x′) − ∇Zi(x′)‖) ], ϵ }.
The detailed derivation can be found in their work [79]. Note that the formulation of C(x, F) only depends on F and x, and it is easy to calculate for a neural network with one hidden layer. The model F thus can be proved to be safe in the region within distance C(x, F). Training to maximize this lower bound will make the classifier more robust.
2) Upper bound of adversarial loss
The works proposed by Raghunathan et al.[77] and Wong and Kolter[78] aim to solve the same problem. They try to find an upper bound U(x, F) which is larger than the adversarial loss L_adv(x, F):
L_adv(x) = max_{x′} { max_{i≠y} Zi(x′) − Zy(x′) }
s.t. x′ ∈ Bϵ(x).   (8)
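For concreteness, the sketch below evaluates the margin loss of (8) empirically at a candidate point x′ (e.g., one found by PGD). This is only an empirical lower estimate of the adversarial loss; certified defenses instead derive an analytic upper bound U(x, F) that holds for every x′ in the ball. The function name and tensor shapes are assumptions.

```python
import torch

def empirical_margin_loss(model, x_adv, y):
    """max_{i != y} Z_i(x') - Z_y(x') for candidate adversarial inputs x_adv.

    A negative value means the classifier still ranks the true class first
    at x'; a positive value means x' is misclassified.
    """
    logits = model(x_adv)                              # Z(x'), shape (batch, classes)
    true_score = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    others = logits.clone()
    others.scatter_(1, y.unsqueeze(1), float("-inf"))  # mask out the true class
    return others.max(dim=1).values - true_score
```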
Recall that, as introduced in Section 2.2.2, the function max_{i≠y} Zi(x′) − Zy(x′) is a type of loss function called the margin loss.
The certificate U(x, F) acts in this way: If U(x, F) < 0, then the adversarial loss L(x, F) < 0. Thus, the classifier always gives the largest score to the true label y in the region Bϵ(x), and the model is safe in this region. To increase the model′s robustness, we should learn parameters that have the smallest U values, so that more and more data samples will have negative U values.
The work proposed by Raghunathan et al.[77] uses integration inequalities to derive the certificate and uses semi-definite programming (SDP)[80] to solve it. In contrast, the work of Wong and Kolter[78] transforms the problem (8) into a linear programming problem and solves it via training an alternative neural network. Both methods only consider neural networks with one hidden layer. There are also studies by Raghunathan et al.[81] and Wong et al.[82], which improved the efficiency and scalability of these algorithms.
Furthermore, distributional adversarial training[83] combines adversarial training and provable defense together. It trains the classifier by feeding adversarial examples which are sampled from the distribution of worst-case perturbations, and derives the certificates by studying the Lagrangian duality of the adversarial loss.

4.3 Adversarial example detection

Adversarial example detection is another main approach to protect DNN classifiers. Instead of predicting on the model′s input directly, these methods first distinguish whether the input is benign or adversarial. Then, if the input is detected as adversarial, the DNN classifier will refuse to predict its label. In the work [16], they sort the threat models into 3 categories that detection techniques should deal with:
1) A zero-knowledge adversary only has access to the classifier F′s parameters, and has no knowledge of the detection model D.
2) A perfect-knowledge adversary is aware of the model F, and the detection scheme D and its parameters.
3) A limited-knowledge adversary is aware of the model F and the detection scheme D, but does not have access to D′s parameters. That is, this adversary does not know the model′s training set.
In all three of these threat settings, the detection tool is required to correctly classify the adversarial examples, and have a low probability of misclassifying benign examples. Next, we will go through some main methods for adversarial example detection.
4.3.1 An auxiliary model to classify adversarial examples
Some works focus on designing auxiliary models that aim to distinguish adversarial examples from benign examples. The study [84] trains a DNN model with
|Y| = K + 1 labels, with an additional label for all adversarial examples, so that the network will assign adversarial examples into the (K + 1)-th class. Similarly, the work of Gong et al.[85] trains a binary classification model to discriminate all adversarial examples from benign samples, and then trains a classifier on the recognized benign samples.
The work [86] proposed a detection method that constructs an auxiliary neural network D which takes inputs from the values of the hidden nodes H of the naturally trained classifier. The trained detection classifier D : H → [0, 1] is a binary classification model that distinguishes adversarial examples from benign ones by the hidden layers.
4.3.2 Using statistics to distinguish adversarial examples
Some early works heuristically study the differences in the statistical properties of adversarial examples and benign examples. For example, in the study [87], the authors found that adversarial examples place a higher weight on the larger (later) principal components, whereas the natural images have larger weight on the early principal components. Thus, they can split them by principal component analysis (PCA).
In the work [84], the authors use a statistical test, the maximum mean discrepancy (MMD) test[88], which is used to test whether two datasets are drawn from the same distribution. They use this testing tool to test whether a group of data points are benign or adversarial.
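The sketch below computes a simple RBF-kernel estimate of the squared MMD between two groups of flattened inputs, which is the kind of statistic such a test is built on. The kernel bandwidth and the use of a biased estimator are illustrative assumptions; the actual test in [88] also involves a permutation-based significance check.

```python
import torch

def mmd_rbf(X, Y, bandwidth=1.0):
    """Biased squared-MMD estimate with an RBF kernel.

    X: (n, d) and Y: (m, d) tensors of flattened samples. A large value
    suggests the two groups are drawn from different distributions.
    """
    def kernel(A, B):
        d2 = torch.cdist(A, B) ** 2
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()
```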
4.3.3 Checking the prediction consistency
Other studies focus on checking the consistency of the sample x′s prediction outcome. They usually manipulate the model parameters or the input examples themselves, to check whether the outputs of the classifier change significantly. These are based on the belief that the classifier will have stable predictions on natural examples under these manipulations.
The work [89] randomizes the classifier using Dropout[90]. If these classifiers give very different prediction outcomes on x after randomization, this sample x is very likely to be an adversarial one.
The work [17] manipulates the input sample itself to check the consistency. For each input sample x, the authors reduce the color depth of the image (e.g., an 8-bit grayscale image with 256 possible values for each pixel becomes a 7-bit one with 128 possible values), as shown in Fig. 9. The authors hypothesize that for natural images, reducing the color depth will not change the prediction result, but the prediction on adversarial examples will change. In this way, they can detect adversarial examples. Similar to reducing the color depth, the work [89] also introduced other feature squeezing methods, such as spatial smoothing.

Fig. 9 Images from MNIST and CIFAR10. From left to right, the color depth is reduced from 8-bit, 7-bit, ···, 2-bit, 1-bit. (Image credit: Xu et al.[17])
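A minimal sketch of this consistency check with color-depth reduction is shown below. The detection threshold and bit depth are tunable assumptions rather than values prescribed by [17].

```python
import torch

def reduce_color_depth(x, bits=4):
    """Squeeze a [0, 1] image tensor to 2**bits gray levels per channel."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def feature_squeeze_detect(model, x, bits=4, threshold=1.0):
    """Flag inputs whose prediction changes too much after squeezing.

    Compares the softmax outputs on the original and the squeezed input;
    a large L1 difference indicates a likely adversarial example.
    """
    with torch.no_grad():
        p = torch.softmax(model(x), dim=1)
        p_squeezed = torch.softmax(model(reduce_color_depth(x, bits)), dim=1)
    l1 = (p - p_squeezed).abs().sum(dim=1)
    return l1 > threshold          # True -> refuse to predict on this input
```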
4.3.4 Some attacks which evade adversarial detection
The study [16] bypassed 10 of the detection methods which fall into the three categories above. The feature squeezing methods were broken by Sharma and Chen[91], who introduced a “stronger” adversarial attack.
The authors of [16] claim that the properties which are intrinsic to adversarial examples are not very easy to find. They also gave several suggestions for future detection works:
1) Randomization can increase the required attacking distortion.
2) Defenses that directly manipulate raw pixel values are ineffective.
3) Evaluation should be done on multiple datasets besides MNIST.
4) Report false positive and true positive rates for detection.
5) Evaluate using a strong attack. Simply focusing on white-box attacks is risky.

5 Explanations for the existence of adversarial examples

In addition to crafting adversarial examples and defending against them, explaining the reasons behind these phenomena is also important. In this section, we briefly introduce the recent works and hypotheses on the key questions of adversarial learning. We hope our introduction will give our audience a basic view of the existing ideas and solutions for these questions.

5.1 Why do adversarial examples exist?

Some original works, such as Szegedy′s L-BFGS attack[8], state that the existence of adversarial examples is due to the fact that DNN models do not generalize well in low-probability regions of the data space. The generalization issue may be caused by the high complexity of DNN model structures.
However, in the work [9], even linear models are shown to be vulnerable to adversarial attacks. Furthermore, in the work [14], they implement experiments to show that an increase in model capacity will improve the model robustness.
Some insight can be gained about the existence of adversarial examples by studying the model′s decision boundary. The adversarial examples are almost always
close to the decision boundary of a naturally trained model, which may be because the decision boundary is too flat[92], too curved[93], or inflexible[94].
Studying the reason behind the existence of adversarial examples is important because it can guide us in designing more robust models, and help us to understand existing deep learning models. However, there is still no consensus on this problem.

5.2 Can we build an optimal classifier?

Many recent works hypothesize that it might be impossible to build an optimally robust classifier. For example, the study [95] claims that adversarial examples are inevitable because the distribution of data in each class is not well-concentrated, which leaves room for adversarial examples. In this vein, the work [96] claims that to improve the robustness of a trained model, it is necessary to collect more data. Moreover, the authors of [25] suggest that, even if we can build models with high robustness, it must come at the cost of some accuracy.

5.3 What is transferability?

Transferability is one of the key properties of adversarial examples. It means that the adversarial examples generated to target one victim model also have a high probability of misleading other models.
Some works compare the transferability between different attacking algorithms. In the work [31], the authors claim that in ImageNet, single-step attacks (FGSM) are more likely to transfer between models than iterative attacks (BIM) under the same perturbation intensity.
The property of transferability is frequently utilized by attacking techniques in the black-box setting[48]. If the model parameters are veiled to attackers, they can turn to attack other substitute models and exploit the transferability of their generated samples. The property of transferability is also utilized by defending methods, as in the work [87]: Since the adversarial examples for model A are also likely to be adversarial for model B, adversarial training using adversarial examples from B will help defend A.

6 Graph adversarial examples

Adversarial examples also exist in graph-structured data[10, 97]. Attackers usually slightly modify the graph structure and node features, in an effort to cause the graph neural networks (GNN) to give wrong predictions for node classification or graph classification tasks. These adversarial attacks therefore raise concerns about the security of applying GNN models. For example, a bank needs to build a reliable credit evaluation system whose model should not be easily attacked by malicious manipulations.
There are some distinct differences between attacking graph models and attacking traditional image classifiers:
1) Non-independence. Samples of the graph-structured data are not independent: Changing one node′s features or connections will influence the predictions on others.
2) Poisoning attacks. Graph neural networks are usually trained in a transductive learning setting: The test data are also used to train the classifier. This means that if we modify the test data, the trained classifier is also changed.
3) Discreteness. When modifying the graph structure, the search space for adversarial examples is discrete. Previous gradient methods for finding adversarial examples may be invalid in this case.
Below are the methods used by some successful works to attack and defend graph neural networks.

6.1 Definitions for graphs and graph models

In this section, the notations and definitions of graph-structured data and graph neural network models are given. A graph can be represented as G = {V, E}, where V is a set of N nodes and E is a set of M edges. The edges describe the connections between the nodes, which can also be expressed by an adjacency matrix A ∈ {0, 1}^{N×N}. Furthermore, a graph G is called an attributed graph if each node in V is associated with a d-dimensional attribute vector xv ∈ R^d. The attributes for all the nodes in the graph can be summarized as a matrix X ∈ R^{N×d}, the i-th row of which represents the attribute vector for node vi.
The goal of node classification is to learn a function g : V → Y that maps each node to one class in Y, based on a group of labeled nodes in G. One of the most successful node classification models is the graph convolutional network (GCN)[7]. The GCN model keeps aggregating information from neighboring nodes to learn representations for each node v:
H^(0) = X;  H^(l+1) = σ(ÂH^(l)W^(l))
where σ is a non-linear activation function, the matrix Â is defined as Â = D̃^(−1/2)ÃD̃^(−1/2), Ã = A + IN, and D̃ii = Σj Ãij. The last layer outputs the score vector of each node for prediction: Hv^(m) = F(v, X).
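To make the propagation rule concrete, here is a minimal dense-matrix sketch of one GCN layer and of the adjacency normalization defined above. Class and function names are our own; practical GCN implementations use sparse operations.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One propagation step H^(l+1) = sigma(A_hat @ H^(l) @ W^(l))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)

    def forward(self, A_hat, H):
        return torch.relu(A_hat @ H @ self.weight)

def normalize_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2) for a dense {0,1} adjacency matrix."""
    A_tilde = A + torch.eye(A.size(0), device=A.device)
    deg = A_tilde.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt
```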
6.2 Zugner′s greedy method

In the work of Zugner et al.[10], they consider attacking node classification models, namely graph convolutional networks[7], by modifying the node connections or node features (binary). In this setting, an adversary is allowed to add/remove edges between nodes, or flip the features of nodes, with a limited number of operations. The goal is to
mislead the GCN model which is trained on the perturbed graph (transductive learning) into giving wrong predictions. In their work, they also specify three levels of adversary capabilities: the adversary can manipulate 1) all nodes, 2) a set of nodes A including the target victim x, or 3) a set of nodes A which does not include the target node x. A sketch is shown in Fig. 10.

Fig. 10 Adding an edge to alter the prediction of a graph convolutional network (Image credit: Zugner et al.[10])

Similar to the objective function of Carlini and Wagner[34] for image data, they formulate the graph attacking problem as a search for a perturbed graph G′ such that the learned GCN classifier Z* has the largest score margin:
max_{i≠y} ln(Z*_y(v0, G′)) − ln(Z*_i(v0, G′)).   (9)
The authors solve this objective by finding perturbations on a fixed, linearized substitute GCN classifier Gsub which is trained on the clean graph. They use a heuristic algorithm to find the most influential operations on graph Gsub (e.g., removing/adding the edge or flipping the feature which can cause the largest increase in (9)). The experimental results demonstrate that the adversarial operations are also effective on the later-trained classifier Z*.
During the attacking process, the authors also impose two key constraints to ensure the similarity of the perturbed graph to the original one: 1) the degree distribution should be maintained, and 2) two positive features which never occur together in G should also not occur together in G′. Later, some other graph attacking works (e.g., [98]) suggest that the eigenvalues/eigenvectors of the graph Laplacian matrix should also be maintained during attacking, otherwise the attacks are easily detected. However, there is still no firm consensus on how to formally define the similarity between graphs and generate unnoticeable perturbations.
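The following sketch illustrates the greedy flavor of such an attack: each candidate edge flip is scored by the margin of (9) on a surrogate classifier, and the best flip is applied. It is a brute-force illustration under our own assumptions (the `surrogate(A, X)` interface is hypothetical, and the similarity constraints are omitted); the actual method of [10] uses an efficient incremental computation on a linearized GCN.

```python
import torch

@torch.no_grad()
def greedy_edge_attack(surrogate, A, X, v0, y_true, budget):
    """Greedily flip edges to maximize max_{i != y} ln Z_i(v0) - ln Z_y(v0).

    surrogate(A, X) is assumed to return log-probabilities of shape
    (num_nodes, num_classes); A is a dense 0/1 adjacency matrix.
    """
    A = A.clone()
    n = A.size(0)
    for _ in range(budget):
        best_gain, best_edge = -float("inf"), None
        for u in range(n):
            for v in range(u + 1, n):
                A[u, v] = A[v, u] = 1 - A[u, v]        # tentatively flip (u, v)
                logp = surrogate(A, X)[v0]
                others = torch.cat([logp[:y_true], logp[y_true + 1:]])
                gain = (others.max() - logp[y_true]).item()   # score margin of (9)
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
                A[u, v] = A[v, u] = 1 - A[u, v]        # undo the tentative flip
        u, v = best_edge
        A[u, v] = A[v, u] = 1 - A[u, v]                # apply the best flip
    return A
```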
6.3 Dai′s RL method: RL-S2V

Different from Zugner′s greedy method, the work of Dai et al.[97] introduced a reinforcement learning method to attack graph neural networks. This work only considers adding or removing edges to modify the graph structure.
In the setting of [97], a node classifier F trained on the clean graph G^(0) = G is given; F is unknown to the attacker, and the attacker is allowed to modify m edges in total to alter F′s prediction on the victim node v0. The authors formulate this attacking mission as a Q-learning game[99], with the Markov decision process defined as below:
1) State. The state st is represented by the tuple (G^(t), v0), where G^(t) is the modified graph after t iterative steps.
2) Action. To represent the action to add/remove edges, a single action at time step t is at ∈ V × V, which denotes the edge to be added or removed.
3) Reward. In order to encourage actions that fool the classifier, a positive reward should be given if v0′s label is altered. Thus, the authors define the reward function as r(st, at) = 0, ∀t = 1, 2, · · · , m − 1, and for the last step:
r(sm, am) = 1 if C(v0, G^(m)) ≠ y, and r(sm, am) = −1 if C(v0, G^(m)) = y.
4) Termination. The process stops once the agent finishes modifying m edges.
The Q-learning algorithm helps the adversary decide which actions to take (add/remove which edge) in the given state (the current graph structure), in order to get the largest reward (change F′s output).

6.4 Graph structure poisoning via meta-learning

Previous graph attack works only focus on attacking one single victim node. The meta learning attack[100] attempts to poison the graph so that the global node classification performance of GCN is undermined and made almost useless. Their approach is based on meta learning[101], which is traditionally used for hyperparameter optimization, few-shot image recognition, and fast reinforcement learning. In the work [100], they use a meta learning technique which takes the graph structure as the hyperparameter of the GCN model to optimize. Using their algorithm to perturb 5% of the edges of a CITESEER graph dataset, they can increase the misclassification rate to over 30%.

6.5 Attack on node embedding

The node embedding attack[102] studies how to perturb the graph structure in order to corrupt the quality of node embeddings, and consequently hinder subsequent learning tasks such as node classification or link prediction. Specifically, they study DeepWalk[103] as a random-walk based node embedding learning approach and

ately find the graph which has the largest loss of the DeepSpeech[107]. In their setting, when given any speech
learned node embedding. waveform x, they can add an inaudible sound perturba-
tion δ that makes the synthesized speech x + δ be recog-
6.6 ReWatt: Attacking graph classifier via nized as any targeted desired phrase.
rewiring In their attacking work, they limited the maximum
decibels (dB) on any time of the added perturbation
The ReWatt method[98] attempts to attack the graph noise, so that the audio distortion is unnoticeable.
classification models, where each input of the model is a Moreover, they inherit the C & W′s attack method[34] on
whole graph. The proposed algorithm can mislead the their audio attack setting.
model by making unnoticeable perturbations on graph.
In their attacking scheme, they utilize reinforcement 7.2 Text classification attacks
learning to find a rewiring operation a = (v1 , v2 , v3 ) at
each step, which is a set of 3 nodes. The first two nodes Text classification is one of main tasks in natural lan-
were connected in the original graph and the edge guage processing. In text classification, the model is de-
between them is removed in the first step of the rewiring vised to understand a sentence and correctly label the
process. The second step of the rewiring process adds an sentence. For example, text classification models can be
edge between the nodes v1 and v3, where v3 is con- applied on IMDB dataset for characterizing user′s opin-
strained to be within 2 -hops away from v1. Some ion (positive or negative) on the movies, based on their
analysis[98] show that the rewiring operation tends to keep provided reviews. Recent works of adversarial attacks
the eigenvalues of the graph′s Laplacian matrix, which have demonstrated that text classifiers are easily mis-
makes it difficult to detect the attacker. guided by adversaries slightly modifying the texts'
spelling, words or structure.
6.7 Defending graph neural networks 7.2.1 Attack word embedding
The work [108] considers to add perturbation on the
Many works have shown that graph neural networks word embedding[109], so as to fool a LSTM[4] classifier.
are vulnerable to adversarial examples, even though there However, this attack only considers perturbing the word
is still no consensus on how to define the unnoticeable embedding, instead of original input sentence itself.
perturbation. Some defending works have already ap- 7.2.2 Manipulate words, letters
peared. Many of them are inspired by the popular de- The work HotFlip[11] considers to replace a letter in a
fense methodology in image classification, using adversari- sentence in order to mislead a character-level text classifi-
al training to protect GNN models[104, 105], which provides er (each letter is encoded to a vector). For example, as
moderate robustness. shown in Fig. 11, altering a single letter in a sentence al-
ters the model′s prediction on its topic. The attack al-
7 Adversarial examples in audio and gorithm manages to achieve this by finding the most-in-
fluential letter replacement via gradient information.
text data
These adversarial perturbations can be noticed by hu-
Adversarial examples also exist in DNN′s applications man readers, but they don't change the content of the
in audio and text domains. An adversary can craft fake text as a whole, nor do they affect human judgments.
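A minimal sketch of this gradient-based scoring is given below: with one-hot character encodings, a first-order estimate of the loss change when replacing the character at position i with character c is grad[i, c] − grad[i, current]. The model interface and tensor shapes are assumptions made for illustration; the original HotFlip formulation also handles insertions and deletions.

```python
import torch
import torch.nn.functional as F

def best_char_flip(model, one_hot, y):
    """Estimate the single most loss-increasing character replacement.

    one_hot: (seq_len, vocab) one-hot encoding of the sentence; model is
    assumed to take a (1, seq_len, vocab) float tensor and return logits.
    """
    oh = one_hot.float()
    x = oh.unsqueeze(0).requires_grad_(True)
    loss = F.cross_entropy(model(x), y.view(1))
    grad = torch.autograd.grad(loss, x)[0].squeeze(0)     # (seq_len, vocab)
    current = (grad * oh).sum(dim=1, keepdim=True)        # gradient at current chars
    gain = grad - current                                 # first-order loss increase
    gain[oh.bool()] = -float("inf")                       # disallow keeping the same char
    flat = int(gain.argmax())
    pos, new_char = divmod(flat, gain.size(1))
    return pos, new_char          # replace the character at `pos` with index `new_char`
```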
Fig. 11 Replacing one letter in a sentence alters a text classifier′s prediction of the sentence′s topic: “South Africa′s historic Soweto township marks its 100th birthday on Tuesday in a mood of optimism.” is labeled World (57%), while “South Africa′s historic Soweto township marks its 100th birthday on Tuesday in a mooP of optimism.” is labeled Sci/Tech (95%). (Image credit: Ebrahimi et al.[11])

The work [110] considers manipulating the victim sentence at the word and phrase level. They try adding, removing or modifying the words and phrases in the sentences. In their approach, the first step is similar to HotFlip[11]. For each training sample, they find the most influential letters, called “hot characters”. Then, they label the words that have more than 3 “hot characters” as “hot words”. “Hot words” compose “hot phrases”, which are the
most influential phrases in the sentences. Manipulating these phrases is likely to influence the model′s prediction, so these phrases compose a “vocabulary” to guide the attacking. When an adversary is given a sentence, he can use this vocabulary to find the weakness of the sentence, add one hot phrase, remove a hot phrase in the given sentence, or insert a meaningful fact which is composed of hot phrases.
DeepWordBug[111] and TextBugger[112] are black-box attack methods for text classification. The basic idea of the former is to define a scoring strategy to identify the key tokens which, if modified, will lead to a wrong prediction by the classifier. Then they try four types of “imperceptible” modifications on such tokens: swap, substitution, deletion and insertion, to mislead the classifier. The latter follows the same idea, and improves it by introducing new scoring functions.
The works of Samanta and Mehta[113] and Iyyer et al.[114] start to craft adversarial sentences that are grammatically correct and maintain the syntactic structure of the original sentence. Samanta and Mehta[113] achieve this by using synonyms to replace original words, or adding some words which have different meanings in different contexts. On the other hand, Iyyer et al.[114] manage to fool the text classifier by paraphrasing the structure of sentences.
Witbrock[115] conducts sentence and word paraphrasing on input texts to craft adversarial examples. In this work, they first build a paraphrasing corpus that contains a large number of word and sentence paraphrases. To find an optimal paraphrase of an input text, a greedy method is adopted to search valid paraphrases for each word or sentence from the corpus. Moreover, they propose a gradient-guided method to improve the efficiency of the greedy search. This work also has significant contributions in theory: They formally define the task of discrete adversarial attack as an optimization problem on a set function and they prove that the greedy algorithm ensures a 1 − 1/e approximation factor for CNN and RNN text classifiers.

7.3 Adversarial examples in other NLP tasks

7.3.1 Attack on reading comprehension systems
In the work [116], the authors study whether reading comprehension models are vulnerable to adversarial attacks. In reading comprehension tasks, the machine learning model is asked to answer a given question, based on the model′s “understanding” of a paragraph of an article. For example, the work [116] concentrates on the Stanford Question Answering Dataset (SQuAD), in which systems answer questions about paragraphs from Wikipedia.
The authors successfully degrade the intelligence of the state-of-the-art reading comprehension models on SQuAD by inserting adversarial sentences. As shown in Fig. 12, the inserted sentence (blue) looks similar to the question, but does not contradict the correct answer. This inserted sentence is understandable for a human reader but greatly confuses the machine. As a result, the proposed attacking algorithm reduced the performance of 16 state-of-the-art reading comprehension models from an average 75% F1 score (accuracy) to 36%.

Fig. 12 By adding an adversarial sentence which is similar to the answer, the reading comprehension model gives a wrong answer. Article: Super Bowl 50. Paragraph: “Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver′s Executive Vice President of Football Operations and General Manager. Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV.” Question: “What is the name of the quarterback who was 38 in Super Bowl XXXII?” Original Prediction: John Elway. Prediction under adversary: Jeff Dean. (Image credit: Jia and Liang[116])

Their proposed algorithm AddSent uses a four-step operation to find an adversarial sentence:
1) Fake question: What is the name of the quarterback whose jersey number is 37 in Champ Bowl XXXIV?
2) Fake answer: Jeff Dean.
3) Question to declarative form: Quarterback Jeff Dean is jersey number 37 in Champ Bowl XXXIV.
4) Fix grammatical errors: Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV.
7.3.2 Attack on neural machine translation
The work [117] studies the stability of machine learning translation tools when their input sentences are perturbed by natural errors (typos, misspellings, etc.) and manually crafted distortions (letter replacement, letter reordering). The experimental results show that state-of-the-art translation models are vulnerable to both types of errors, and they suggest adversarial training to improve the models′ robustness.
Seq2Sick[118] tries to attack seq2seq models in neural machine translation and text summarization. In their setting, two attacking goals are set: to mislead the model to generate an output which has no overlap with the ground truth, and to lead the model to produce an output with targeted keywords. The model is treated as a white-box and the authors formulate the attacking problem as an optimization problem where they seek a discrete perturbation by minimizing a hinge-like loss function.

7.4 Dialogue generation

Unlike the tasks above where success and failure are clearly defined, in the task of dialogue, there is no unique
appropriate response for a given context. Thus, instead of misleading a well-trained model to produce incorrect outputs, works about attacking dialogue models seek to explore the property of neural dialogue models of being interfered with by perturbations on the inputs, or to lead a model to output targeted responses.
In the study [119], the authors explore the over-sensitivity and over-stability of neural dialogue models by using some heuristic techniques to modify original inputs and observe the corresponding outputs. They evaluate the robustness of dialogue models by checking whether the outputs change significantly after the modifications on the inputs, but do not consider targeted outputs. They also investigate the effects of retraining the dialogue model using these adversarial examples to improve the robustness and performance of the underlying model.
In the work [120], the authors try to find trigger inputs which can lead a neural dialogue model to generate targeted egregious responses. They design a search-based method to determine the word in the input that maximizes the generative probability of the targeted response. Then, they treat the dialogue model as a white-box and take advantage of the gradient information to narrow the search space. Finally, they show that this method works for “normal” targeted responses which are decoding results for some input sentences, but for manually written malicious responses, it hardly succeeds.
The work [121] treats the neural dialogue model as a black-box and adopts a reinforcement learning framework to effectively find trigger inputs for targeted responses. The black-box setting is stricter but more realistic, while the requirements on the generated responses are properly relaxed. The generated responses are expected to be semantically identical to the targeted ones but not necessarily to exactly match them.

8 Adversarial examples in miscellaneous tasks

In this section, we summarize some adversarial attacks in other domains. Some of these domains are safety-critical, so the studies on adversarial examples in these domains are also important.

8.1 Computer vision beyond image classification

1) Face recognition
The work [122] seeks to attack face recognition models on both the digital level and the physical level. The main victim model is based on the architecture of Parkhi et al.[123], which is a 39-layer DNN model for face recognition tasks. The attack on the digital level is based on traditional attacks, like Szegedy′s L-BFGS method (Section 3.1.2).
Beyond digital-level adversarial faces, they also succeed in misleading face recognition models at the physical level. They achieve this by asking subjects to wear 3D-printed sunglasses frames. The authors optimize the color of these glasses by attacking the model on the digital level: by considering various adversarial glasses, the most effective adversarial glasses are used for the attack. As shown in Fig. 13, an adversary wears the adversarial glasses and successfully fools the victim face recognition system.

Fig. 13 An adversary (left) wears a pair of adversarial glasses and is recognized as a movie star, Milla Jovovich (Image credit: Sharif et al.[122])

2) Object detection and semantic segmentation
There are also studies on attacking semantic segmentation and object detection models in computer vision[124, 125]. In both semantic segmentation and object detection tasks, the goal is to learn a model that associates an input image x with a series of labels Y = {y1, y2, · · · , yN}. Semantic segmentation models give each pixel of x a label yi, so that the image is divided into different segments. Similarly, object detection models label all proposals (regions where objects lie).
The attack in [124] can generate an adversarial perturbation on x which causes the classifier to give wrong predictions on all the output labels of the model, in order to fool either semantic segmentation or object detection models. The attack in [125] finds that there exist universal perturbations for any input image for semantic segmentation models.

8.2 Video adversarial examples

Most works concentrate on attacking static image classification models. However, success on image attacks cannot guarantee that there exist adversarial examples for videos and video classification systems. The work [126] uses a GAN[55] to generate a dynamic perturbation on video clips that can mislead the classification of video classifiers.

8.3 Generative models

The work [127] attacks the variational autoencoder (VAE)[128] and VAE-GAN[129]. Both VAE and VAE-GAN use an encoder to project the input image x into a lower-
dimensional latent representation z, and a decoder to reconstruct a new image x̂ from z. The reconstructed image should maintain the same principal semantics as the original image.
In the attack setting of [127], the authors aim to slightly perturb the input image x fed to the encoder, which will cause the decoder to generate an image fdec(fenc(x)) having a different meaning from the input x. For example, on the MNIST dataset, the input image is “1”, and the reconstructed image is “0”.

8.4 Malware detection

The existence of adversarial examples in safety-critical tasks, such as malware detection, deserves much attention. The work [130] built a DNN model on the DREBIN dataset[131], which contains 120 000 Android application samples, where over 5 000 are malware samples. The trained model has 97% accuracy, but malware samples can evade the classifier if attackers add fake features to them. Some other works, Hu and Tan[132] and Anderson et al.[133], consider using GANs[55] to generate adversarial malware.

8.5 Fingerprint recognizer attacks

Fingerprint recognition systems are also one of the most safety-critical fields where machine learning models are adopted. However, there are adversarial attacks undermining the reliability of these models. For example, fingerprint spoof attacks copy an authorized person′s fingerprint and replicate it on some special materials such as liquid latex or gelatin. Traditional fingerprint recognition techniques, especially minutiae-based models, fail to distinguish the fingerprint images generated from different materials. The works of Chugh et al.[134, 135] design a modified CNN to effectively detect this fingerprint spoof attack.

8.6 Reinforcement learning

Different from classification tasks, deep reinforcement learning (RL) aims to learn how to perform some human tasks, such as playing Atari 2600 games[99] or Go[5]. For example, to play the Atari game Pong (Fig. 14(a)), the trained model takes as input the latest images of the game video (state x), and outputs a decision to move up or down (action y). The learned model can be viewed as a rule (policy πθ) to win the game (reward L(θ, x, y)). A simple sketch is: x → y under policy πθ, which parallels classification tasks: x → y under classifier f. The RL algorithms are trained to learn the parameters of πθ.
The RL attack[136] shows that deep reinforcement learning models are also vulnerable to adversarial examples. Their approach is inherited from FGSM[9]: take a one-step gradient on the state x (the latest images of the game video) to craft a fake state x′. The policy′s decision on x′ can be totally useless for achieving the reward. Their results show that a slight perturbation of an RL model′s state can cause a large difference in the model′s decision and performance. Their work shows that Deep Q-Learning[99], TRPO[137] and A3C[138] are all vulnerable to their attacks.

Fig. 14 (a) Action taken: up (original input); (b) Action taken: down (adversarial input). Left figure: the brick takes correct actions to go up to catch the ball. Right figure: the current state is perturbed by changing one pixel. The policy gives an incorrect command to go down. (Image credit: Huang et al.[136])

8.5 Fingerprint recognizer attacks 9 Conclusions


In this survey, we give a systemic, categorical and
Fingerprint recognition systems are also one of the
comprehensive overview on the recent works regarding
most safety-critical fields where machine learning models
adversarial examples and their countermeasures, in mul-
are adopted. While, there are adversarial attacks under-
tiple data domains. We summarize the studies from each
mining the reliability of these models. For example, fin-
section in the chronological order as shown in Fig. B in
gerprint spoof attacks copy an authorized person′s finger-
Appendix B, because these works are released with relat-
print and replicate it on some special materials such as li-
ively high frequency in response to one another. The cur-
quid latex or gelatin. Traditional fingerprint recognition
rent state-of-the-art attacks will likely be neutralized by
techniques especially minutiae-based models fail to distin-
guish the fingerprint images generated from different ma- new defenses, and these defenses will subsequently be cir-
terials. The works of Chugh et al.[134, 135] design a modi- cumvented. We hope that our work can shed some light
fied CNN to effectively detect this fingerprint spoof at- on the main ideas of adversarial learning and related ap-
tack. plications in order to encourage progress in this field.

8.6 Reinforcement learning Acknowledgements


This work was supported by National Science Founda-
Different from classification tasks, deep reinforcement
tion (NSF), USA (Nos. IIS-1845081 and CNS-1815636).
learning (RL) aims to learn how to perform some human
tasks, such as play Atari 2600 games[99] or play Go[5]. For
Open Access
example, to play an Atari game Pong, (Fig. 14(a)), the
trained model takes input from the latest images of game This article is licensed under a Creative Commons At-
video (state ∑ x), and output a decision to move up or tribution 4.0 International License, which permits use,
down (action ∑ y ). The learned model can be viewed as a sharing, adaptation, distribution and reproduction in any
rule (policy ∑ πθ) to win the game (reward ∑ L(θ, x, y)). medium or format, as long as you give appropriate credit
πθ
A simple sketch can be: x −→ y , which is in parallel to to the original author(s) and the source, provide a link to
f
classification tasks: x − → y . The RL algorithms are the Creative Commons licence, and indicate if changes
trained to learn the parameters of πθ. were made.
The RL attack[137] shows deep reinforcement learning To view a copy of this licence, visit http://creative-
models are also vulnerable to adversarial examples. Their commons.org/licenses/by/4.0.

 
 172 International Journal of Automation and Computing 17(2), April 2020

Appendix
A. Dichotomy of attacks

Table A     Dichotomy of attacks

Attack Publication Similarity Attacking capability Algorithm Apply domain

L-BFGS [8] l2 White-box Iterative Image classification

FGSM [9] l∞,l2 White-box Single-step Image classification

Deepfool [32] l2 White-box Iterative Image classification

JSMA [33] l2 White-box Iterative Image classification

BIM [31] l∞ White-box Iterative Image classification

C&W [34] l2 White-box Iterative Image classification

Ground truth [35] l0 White-box SMT solver Image classification

Spatial [44] Total variation White-box Iterative Image classification

Universal [125] l∞, l2 White-box Iterative Image classification

One-Pixel [39] l0 White-box Iterative Image classification

EAD [40] l1 + l2 , l2 White-box Iterative Image classification

Substitute [48] lp Black-box Iterative Image classification

ZOO [50] lp Black-box Iterative Image classification

Biggio [19] l2 Poisoning Iterative Image classification

Explanation [58] lp Poisoning Iterative Image classification

Zugner′s [10] Degree distribution, coocurrence Poisoning Greedy Node classification

Dai′s [97] Edges Black-box RL Node & Graph classification

Meta [100] Edges Black-box RL Node classification

C&W [106] max dB White-box Iterative Speech recognition

Word embedding [108] lp White-box One-step Text classification

HotFlip [11] letters White-box Greedy Text classification

Jia & Liang [116] letters Black-box Greedy Reading comprehension

Face recognition [122] physical White-box Iterative Face recognition

RL attack [137] lp White-box RL


 

 

B. Dichotomy of defenses

Fig. B Dichotomy of defenses. Gradient masking: shattered gradient [61], [62], [64], [65]; exploding/vanishing gradient [66], [67], [12]. Robust optimization: adversarial training [9], [14], [21], [83]; certified defense [78], [79], [77], [83]; regularization [69], [71], [72]. Detection: auxiliary model [130], [85], [86]; statistical methods [87], [84], [88]; consistency checking [89], [17].

van  den  Driessche,  J.  Schrittwieser,  I.  Antonoglou,  V.


References
Panneershelvam,  M.  Lanctot,  S.  Dieleman,  D.  Grewe,  J.
[1] A. Krizhevsky, I. Sutskever, G. E. Hinton. Imagenet clas-
 
Nham,  N.  Kalchbrenner,  I.  Sutskever,  T.  Lillicrap,  M.
sification  with  deep  convolutional  neural  networks.  In Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis. Master-
Proceedings  of  the  25th  International  Conference  on ing  the  game  of  go  with  deep  neural  networks  and  tree
Neural  Information  Processing  Systems,  Curran  Asso- search. Nature, vol. 529, no. 7587, pp. 484–489, 2016. DOI:
ciates Inc., Lake Tahoe, USA, pp. 1097–1105, 2012. 10.1038/nature16961.
[2] K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Deep residual
 
[6] D.  Cireşan,  U.  Meier,  J.  Masci,  J.  Schmidhuber.  Multi-
 

learning  for  image  recognition.  In  Proceedings  of  IEEE column deep neural network for traffic sign classification.


Conference  on  Computer  Vision  and  Pattern  Recogni- Neural  Networks,  vol. 32,  pp. 333–338,  2012.  DOI:
tion,  IEEE,  Las  Vegas,  USA,  pp. 770–778,  2016.  DOI: 10.1016/j.neunet.2012.02.023.
10.1109/CVPR.2016.90.
[7] T.  N.  Kipf,  M.  Welling.  Semi-supervised  classification
 

[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed,
 

with  graph  convolutional  networks.  ArXiv:  1609.02907,


N.  Jaitly,  A.  Senior,  V.  Vanhoucke,  P.  Nguyen,  T.  N. 2016.
Sainath,  B.  Kingsbury.  Deep  neural  networks  for  acous-
tic  modeling  in  speech  recognition:  The  shared  views  of [8] C.  Szegedy,  W.  Zaremba,  I.  Sutskever,  J.  Bruna,  D.  Er-
 

four  research  groups.  IEEE  Signal  Processing  Magazine, han,  I.  Goodfellow,  R.  Fergus.  Intriguing  properties  of
vol. 29,  no. 6,  pp. 82–97,  2012.  DOI:  10.1109/MSP.2012. neural networks. ArXiv: 1312.6199, 2013.
2205597. [9] I.  J.  Goodfellow,  J.  Shlens,  C.  Szegedy.  Explaining  and
 

[4] S. Hochreiter, J. Schmidhuber. Long short-term memory.
 
harnessing adversarial examples. ArXiv: 1412.6572, 2014.
Neural  Computation,  vol. 9,  no. 8,  pp. 1735–1780,  1997.
[10] D.  Zügner,  A.  Akbarnejad,  S.  Günnemann.  Adversarial
 

DOI: 10.1162/neco.1997.9.8.1735.
attacks  on  neural  networks  for  graph  data.  In  Proceed-
[5]   D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. ings  of  the  24th  ACM  SIGKDD  International  Conference
 

on  Knowledge  Discovery  &  Data  Mining, ACM, London, [26] D. Su, H. Zhang, H. G. Chen, J. F. Yi, P. Y. Chen, Y. P.


 

UK,  pp. 2847–2856,  2018.  DOI:  10.1145/3219819. Gao. Is robustness the cost of accuracy? – A comprehens-


3220078. ive  study  on  the  robustness  of  18  deep  image  classifica-
tion models. In Proceedings of the 15th European Confer-
[11] J.  Ebrahimi,  A.  Y.  Rao,  D.  Lowd,  D.  J.  Dou.  HotFlip:
ence  on  Computer  Vision,  Springer,  Munich,  Germany,
 

White-box  adversarial  examples  for  text  classification.


pp. 644–661, 2018. DOI: 10.1007/978-3-030-01258-8_39.
ArXiv: 1712.06751, 2017.
[27] D.  Stutz,  M.  Hein,  B.  Schiele.  Disentangling  adversarial
[12] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami. Dis-
 

robustness and generalization. In Proceedings of the 32nd
 

tillation as a defense to adversarial perturbations against
IEEE  Conference  on  Computer  Vision  and  Pattern  Re-
deep  neural  networks.  In  Proceedings  of  IEEE  Symposi-
cognition, IEEE, Piscataway, USA, pp. 6976–6987, 2019.
um  on  Security  and  Privacy,  IEEE,  San  Jose,  USA,
pp. 582–597, 2016. DOI: 10.1109/SP.2016.41. [28] H.  Y.  Zhang,  Y.  D.  Yu,  J.  T.  Jiao,  E.  P.  Xing,  L.  El
 

Ghaoui,  M.  I.  Jordan.  Theoretically  principled  trade-off


[13] A. Athalye, N. Carlini, D. Wagner. Obfuscated gradients
between  robustness  and  accuracy.  ArXiv:  1901.08573,
 

give  a  false  sense  of  security:  Circumventing  defenses  to


2019.
adversarial examples. ArXiv: 1802.00420, 2018.
[14] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu. Towards deep learning models resistant to adversarial attacks. ArXiv: 1706.06083, 2017.
[15] A. Kurakin, I. Goodfellow, S. Bengio. Adversarial examples in the physical world. ArXiv: 1607.02533, 2016.
[16] N. Carlini, D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, Dallas, USA, pp. 3–14, 2017. DOI: 10.1145/3128572.3140444.
[17] W. L. Xu, D. Evans, Y. J. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. ArXiv: 1704.01155, 2017.
[18] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, A. Madry. Adversarial examples are not bugs, they are features. ArXiv: 1905.02175, 2019.
[19] B. Biggio, B. Nelson, P. Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on International Conference on Machine Learning, Omnipress, Edinburgh, UK, 2012.
[20] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. W. Xiao, A. Prakash, T. Kohno, D. Song. Robust physical-world attacks on deep learning models. ArXiv: 1707.08945, 2017.
[21] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel. Ensemble adversarial training: Attacks and defenses. ArXiv: 1705.07204, 2017.
[22] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli. Evasion attacks against machine learning at test time. In Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Prague, Czech Republic, pp. 387–402, 2013. DOI: 10.1007/978-3-642-40994-3_25.
[23] M. Barreno, B. Nelson, A. D. Joseph, J. D. Tygar. The security of machine learning. Machine Learning, vol. 81, no. 2, pp. 121–148, 2010. DOI: 10.1007/s10994-010-5188-5.
[24] N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma. Adversarial classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Seattle, USA, pp. 99–108, 2004. DOI: 10.1145/1014052.1014066.
[25] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, A. Madry. Robustness may be at odds with accuracy. ArXiv: 1805.12152, 2018.
[29] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, F. F. Li. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, USA, pp. 248–255, 2009. DOI: 10.1109/CVPR.2009.5206848.
[30] D. C. Liu, J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, vol. 45, no. 1–3, pp. 503–528, 1989. DOI: 10.1007/BF01589116.
[31] A. Kurakin, I. Goodfellow, S. Bengio. Adversarial machine learning at scale. ArXiv: 1611.01236, 2016.
[32] S. M. Moosavi-Dezfooli, A. Fawzi, P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 2574–2582, 2016. DOI: 10.1109/CVPR.2016.282.
[33] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami. The limitations of deep learning in adversarial settings. In Proceedings of IEEE European Symposium on Security and Privacy, IEEE, Saarbrucken, Germany, pp. 372–387, 2016. DOI: 10.1109/EuroSP.2016.36.
[34] N. Carlini, D. Wagner. Towards evaluating the robustness of neural networks. In Proceedings of IEEE Symposium on Security and Privacy, IEEE, San Jose, USA, pp. 39–57, 2017. DOI: 10.1109/SP.2017.49.
[35] N. Carlini, G. Katz, C. Barrett, D. L. Dill. Provably minimally-distorted adversarial examples. ArXiv: 1709.10207, 2017.
[36] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proceedings of the 29th International Conference on Computer Aided Verification, Springer, Heidelberg, Germany, pp. 97–117, 2017. DOI: 10.1007/978-3-319-63387-9_5.
[37] V. Tjeng, K. Xiao, R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. ArXiv: 1711.07356, 2017.
[38] K. Y. Xiao, V. Tjeng, N. M. Shafiullah, A. Madry. Training for faster adversarial robustness verification via inducing ReLU stability. ArXiv: 1809.03008, 2018.
[39] J. W. Su, D. V. Vargas, K. Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828–841, 2019. DOI: 10.1109/TEVC.2019.2890858.
[40] P. Y. Chen, Y. Sharma, H. Zhang, J. F. Yi, C. J. Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[41] Y. Sharma, P. Y. Chen. Attacking the Madry defense model with L1-based adversarial examples. ArXiv: 1710.10733, 2017.
[42] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard. Universal adversarial perturbations. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 86–94, 2017. DOI: 10.1109/CVPR.2017.17.
[43] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. H. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, F. F. Li. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015. DOI: 10.1007/s11263-015-0816-y.
[44] C. W. Xiao, J. Y. Zhu, B. Li, W. He, M. Y. Liu, D. Song. Spatially transformed adversarial examples. ArXiv: 1801.02612, 2018.
[45] Y. Song, R. Shu, N. Kushman, S. Ermon. Constructing unrestricted adversarial examples with generative models. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 8312–8323, 2018.
[46] A. Odena, C. Olah, J. Shlens. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 2642–2651, 2017.
[47] A. Athalye, L. Engstrom, A. Ilyas, K. Kwok. Synthesizing robust adversarial examples. ArXiv: 1707.07397, 2017.
[48] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami. Practical black-box attacks against machine learning. In Proceedings of ACM on Asia Conference on Computer and Communications Security, ACM, Abu Dhabi, United Arab Emirates, pp. 506–519, 2017. DOI: 10.1145/3052973.3053009.
[49] Y. P. Dong, F. Z. Liao, T. Y. Pang, H. Su, J. Zhu, X. L. Hu, J. G. Li. Boosting adversarial attacks with momentum. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 9185–9193, 2018. DOI: 10.1109/CVPR.2018.00957.
[50] P. Y. Chen, H. Zhang, Y. Sharma, J. F. Yi, C. J. Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, Dallas, USA, pp. 15–26, 2017. DOI: 10.1145/3128572.3140448.
[51] A. Ilyas, L. Engstrom, A. Athalye, J. Lin. Black-box adversarial attacks with limited queries and information. ArXiv: 1804.08598, 2018.
[52] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber. Natural evolution strategies. Journal of Machine Learning Research, vol. 15, no. 1, pp. 949–980, 2014.
[53] M. Alzantot, Y. Sharma, S. Chakraborty, M. Srivastava. GenAttack: Practical black-box attacks with gradient-free optimization. ArXiv: 1805.11090, 2018.
[54] C. W. Xiao, B. Li, J. Y. Zhu, W. He, M. Y. Liu, D. Song. Generating adversarial examples with adversarial networks. ArXiv: 1801.02610, 2018.
[55] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, MIT Press, Montreal, Canada, pp. 2672–2680, 2014.
[56] D. Deb, J. B. Zhang, A. K. Jain. AdvFaces: Adversarial face synthesis. ArXiv: 1908.05008, 2019.
[57] G. Cauwenberghs, T. Poggio. Incremental and decremental support vector machine learning. In Proceedings of the 13th International Conference on Neural Information Processing Systems, MIT Press, Denver, USA, pp. 388–394, 2000.
[58] P. W. Koh, P. Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 1885–1894, 2017.
[59] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, T. Goldstein. Poison frogs! Targeted clean-label poisoning attacks on neural networks. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 6103–6113, 2018.
[60] G. Hinton, O. Vinyals, J. Dean. Distilling the knowledge in a neural network. ArXiv: 1503.02531, 2015.
[61] J. Buckman, A. Roy, C. Raffel, I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, Canada, 2018.
[62] C. Guo, M. Rana, M. Cisse, L. van der Maaten. Countering adversarial images using input transformations. ArXiv: 1711.00117, 2017.
[63] V. K. Ha, J. C. Ren, X. Y. Xu, S. Zhao, G. Xie, V. M. Vargas. Deep learning based single image super-resolution: A survey. In Proceedings of the 9th International Conference on Brain Inspired Cognitive Systems, Springer, Xi'an, China, pp. 106–119, 2018. DOI: 10.1007/978-3-030-00563-4_11.
[64] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, A. Anandkumar. Stochastic activation pruning for robust adversarial defense. ArXiv: 1803.01442, 2018.
[65] C. H. Xie, J. Y. Wang, Z. S. Zhang, Z. Ren, A. Yuille. Mitigating adversarial effects through randomization. ArXiv: 1711.01991, 2017.
[66] Y. Song, T. Kim, S. Nowozin, S. Ermon, N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. ArXiv: 1710.10766, 2017.
[67] P. Samangouei, M. Kabkab, R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. ArXiv: 1805.06605, 2018.
[68] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, K. Kavukcuoglu. Conditional image generation with PixelCNN decoders. In Proceedings of the 30th Conference on Neural Information Processing Systems, Curran Associates Inc., Barcelona, Spain, pp. 4790–4798, 2016.
[69] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, N. Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 854–863, 2017.
[70] T. Miyato, S. I. Maeda, M. Koyama, K. Nakae, S. Ishii. Distributional smoothing with virtual adversarial training. ArXiv: 1507.00677, 2015.
[71] S. X. Gu, L. Rigazio. Towards deep neural network architectures robust to adversarial examples. ArXiv: 1412.5068, 2014.
[72] S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on International Conference on Machine Learning, Omnipress, Bellevue, USA, pp. 833–840, 2011.
[73] S. Ioffe, C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv: 1502.03167, 2015.
[74] A. Shafahi, M. Najibi, A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, T. Goldstein. Adversarial training for free! ArXiv: 1904.12843, 2019.
[75] D. H. Zhang, T. Y. Zhang, Y. P. Lu, Z. X. Zhu, B. Dong. You only propagate once: Accelerating adversarial training via maximal principle. ArXiv: 1905.00877, 2019.
[76] L. S. Pontryagin. Mathematical Theory of Optimal Processes, London, UK: Routledge, 2018.
[77] A. Raghunathan, J. Steinhardt, P. Liang. Certified defenses against adversarial examples. ArXiv: 1801.09344, 2018.
[78] E. Wong, J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ArXiv: 1711.00851, 2017.
[79] M. Hein, M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, USA, pp. 2266–2276, 2017.
[80] L. Vandenberghe, S. Boyd. Semidefinite programming. SIAM Review, vol. 38, no. 1, pp. 49–95, 1996. DOI: 10.1137/1038003.
[81] A. Raghunathan, J. Steinhardt, P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 10877–10887, 2018.
[82] E. Wong, F. Schmidt, J. H. Metzen, J. Z. Kolter. Scaling provable adversarial defenses. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 8400–8409, 2018.
[83] A. Sinha, H. Namkoong, J. Duchi. Certifying some distributional robustness with principled adversarial training. ArXiv: 1710.10571, 2017.
[84] K. Grosse, P. Manoharan, N. Papernot, M. Backes, P. McDaniel. On the (statistical) detection of adversarial examples. ArXiv: 1702.06280, 2017.
[85] Z. T. Gong, W. L. Wang, W. S. Ku. Adversarial and clean data are not twins. ArXiv: 1704.04960, 2017.
[86] J. H. Metzen, T. Genewein, V. Fischer, B. Bischoff. On detecting adversarial perturbations. ArXiv: 1702.04267, 2017.
[87] D. Hendrycks, K. Gimpel. Early methods for detecting adversarial images. ArXiv: 1608.00530, 2016.
[88] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola. A kernel two-sample test. Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012.
[89] R. Feinman, R. R. Curtin, S. Shintre, A. B. Gardner. Detecting adversarial samples from artifacts. ArXiv: 1703.00410, 2017.
[90] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[91] Y. Sharma, P. Y. Chen. Bypassing feature squeezing by increasing adversary strength. ArXiv: 1803.09868, 2018.
[92] A. Fawzi, S. M. Moosavi-Dezfooli, P. Frossard. Robustness of classifiers: From adversarial to random noise. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 1632–1640, 2016.
[93] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, S. Soatto. Analysis of universal adversarial perturbations. ArXiv: 1705.09554, 2017.
[94] A. Fawzi, O. Fawzi, P. Frossard. Analysis of classifiers' robustness to adversarial perturbations. Machine Learning, vol. 107, no. 3, pp. 481–508, 2018. DOI: 10.1007/s10994-017-5663-3.
[95] A. Shafahi, W. R. Huang, C. Studer, S. Feizi, T. Goldstein. Are adversarial examples inevitable? ArXiv: 1809.02104, 2018.
[96] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, A. Madry. Adversarially robust generalization requires more data. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 5014–5026, 2018.
[97] H. J. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, L. Song. Adversarial attack on graph structured data. ArXiv: 1806.02371, 2018.
[98] Y. Ma, S. H. Wang, T. Derr, L. F. Wu, J. L. Tang. Attacking graph convolutional networks via rewiring. ArXiv: 1906.03750, 2019.
[99] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with deep reinforcement learning. ArXiv: 1312.5602, 2013.
[100] D. Zügner, S. Günnemann. Adversarial attacks on graph neural networks via meta learning. ArXiv: 1902.08412, 2019.
[101] C. Finn, P. Abbeel, S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, JMLR.org, Sydney, Australia, pp. 1126–1135, 2017.
[102] A. Bojchevski, S. Günnemann. Adversarial attacks on node embeddings via graph poisoning. ArXiv: 1809.01093, 2018.
[103] B. Perozzi, R. Al-Rfou, S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, USA, pp. 701–710, 2014. DOI: 10.1145/2623330.2623732.
[104] F. L. Feng, X. N. He, J. Tang, T. S. Chua. Graph adversarial training: Dynamically regularizing based on graph structure. ArXiv: 1902.08226, 2019.
[105] K. D. Xu, H. G. Chen, S. J. Liu, P. Y. Chen, T. W. Weng, M. Y. Hong, X. Lin. Topology attack and defense for graph neural networks: An optimization perspective. ArXiv: 1906.04214, 2019.
[106] N. Carlini, D. Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In Proceedings of IEEE Security and Privacy Workshops, IEEE, San Francisco, USA, pp. 1–7, 2018. DOI: 10.1109/SPW.2018.00009.
[107] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, A. Y. Ng. Deep speech: Scaling up end-to-end speech recognition. ArXiv: 1412.5567, 2014.
[108] T. Miyato, A. M. Dai, I. Goodfellow. Adversarial training methods for semi-supervised text classification. ArXiv: 1605.07725, 2016.
[109] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, USA, pp. 3111–3119, 2013.
[110] B. Liang, H. C. Li, M. Q. Su, P. Bian, X. R. Li, W. C. Shi. Deep text classification can be fooled. ArXiv: 1704.08006, 2017.
[111] J. Gao, J. Lanchantin, M. L. Soffa, Y. J. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In Proceedings of IEEE Security and Privacy Workshops, IEEE, San Francisco, USA, pp. 50–56, 2018. DOI: 10.1109/SPW.2018.00016.
[112] J. F. Li, S. L. Ji, T. Y. Du, B. Li, T. Wang. TextBugger: Generating adversarial text against real-world applications. ArXiv: 1812.05271, 2018.
[113] S. Samanta, S. Mehta. Towards crafting text adversarial samples. ArXiv: 1707.02812, 2017.
[114] M. Iyyer, J. Wieting, K. Gimpel, L. Zettlemoyer. Adversarial example generation with syntactically controlled paraphrase networks. ArXiv: 1804.06059, 2018.
[115] Q. Lei, L. F. Wu, P. Y. Chen, A. G. Dimakis, I. S. Dhillon, M. Witbrock. Discrete attacks and submodular optimization with applications to text classification. ArXiv: 1812.00151, 2018.
[116] R. Jia, P. Liang. Adversarial examples for evaluating reading comprehension systems. ArXiv: 1707.07328, 2017.
[117] Y. Belinkov, Y. Bisk. Synthetic and natural noise both break neural machine translation. ArXiv: 1711.02173, 2017.
[118] M. H. Cheng, J. F. Yi, H. Zhang, P. Y. Chen, C. J. Hsieh. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. ArXiv: 1803.01128, 2018.
[119] T. Niu, M. Bansal. Adversarial over-sensitivity and over-stability strategies for dialogue models. ArXiv: 1809.02079, 2018.
[120] T. X. He, J. Glass. Detecting egregious responses in neural sequence-to-sequence models. ArXiv: 1809.04113, 2018.
[121] H. C. Liu, T. Derr, Z. T. Liu, J. L. Tang. Say what I want: Towards the dark side of neural dialogue models. ArXiv: 1909.06044, 2019.
[122] M. Sharif, S. Bhagavatula, L. Bauer, M. K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, ACM, Vienna, Austria, pp. 1528–1540, 2016. DOI: 10.1145/2976749.2978392.
[123] O. M. Parkhi, A. Vedaldi, A. Zisserman. Deep face recognition. In Proceedings of the British Machine Vision Conference, 2015.
[124] C. H. Xie, J. Y. Wang, Z. S. Zhang, Y. Y. Zhou, L. X. Xie, A. Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 1378–1387, 2017. DOI: 10.1109/ICCV.2017.153.
[125] J. H. Metzen, M. C. Kumar, T. Brox, V. Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 2774–2783, 2017. DOI: 10.1109/ICCV.2017.300.
[126] S. S. Li, A. Neupane, S. Paul, C. Y. Song, S. V. Krishnamurthy, A. K. R. Chowdhury, A. Swami. Adversarial perturbations against real-time video classification systems. ArXiv: 1807.00458, 2018.
[127] J. Kos, I. Fischer, D. Song. Adversarial examples for generative models. In Proceedings of IEEE Security and Privacy Workshops, IEEE, San Francisco, USA, pp. 36–42, 2018. DOI: 10.1109/SPW.2018.00014.
[128] D. P. Kingma, M. Welling. Auto-encoding variational Bayes. ArXiv: 1312.6114, 2013.
[129] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther. Autoencoding beyond pixels using a learned similarity metric. ArXiv: 1512.09300, 2015.
[130] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. McDaniel. Adversarial perturbations against deep neural networks for malware classification. ArXiv: 1606.04435, 2016.
[131] D. Arp, M. Spreitzenbarth, H. Gascon, K. Rieck. DREBIN: Effective and explainable detection of Android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium, Internet Society, San Diego, USA, 2014.
[132] W. W. Hu, Y. Tan. Generating adversarial malware examples for black-box attacks based on GAN. ArXiv: 1702.05983, 2017.
[133] H. S. Anderson, J. Woodbridge, B. Filar. DeepDGA: Adversarially-tuned domain generation and detection. In Proceedings of ACM Workshop on Artificial Intelligence and Security, ACM, Vienna, Austria, pp. 13–21, 2016. DOI: 10.1145/2996758.2996767.
[134] T. Chugh, A. K. Jain. Fingerprint presentation attack detection: Generalization and efficiency. ArXiv: 1812.11574, 2018.
[135] T. Chugh, K. Cao, A. K. Jain. Fingerprint spoof buster: Use of minutiae-centered patches. IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2190–2202, 2018. DOI: 10.1109/TIFS.2018.2812193.
[136] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, P. Abbeel. Adversarial attacks on neural network policies. ArXiv: 1702.02284, 2017.
[137] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, P. Abbeel. Trust region policy optimization. In Proceedings of the 31st International Conference on Machine Learning, JMLR, Lille, France, pp. 1889–1897, 2015.
[138] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Harley, T. P. Lillicrap, D. Silver, K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, PMLR, New York, USA, pp. 1928–1937, 2016.

Han Xu is a second-year Ph.D. student of computer science in the DSE Lab, Michigan State University, USA, under the supervision of Dr. Ji-Liang Tang.
His research interests include deep learning safety and robustness, especially problems related to adversarial examples.
E-mail: [email protected] (Corresponding author)
ORCID iD: 0000-0002-4016-6748

Yao Ma received the B.Sc. degree in applied mathematics from Zhejiang University, China in 2015, and the M.Sc. degree in statistics, probability and operations research from Eindhoven University of Technology, the Netherlands in 2016. He is now a Ph.D. candidate in the Department of Computer Science and Engineering, Michigan State University, USA. His Ph.D. advisor is Dr. Ji-Liang Tang.
His research interests include graph neural networks and their related safety issues.
E-mail: [email protected]

Hao-Chen Liu is currently a Ph.D. student in the Department of Computer Science and Engineering at Michigan State University, under the supervision of Dr. Ji-Liang Tang. He is a member of the Data Science and Engineering (DSE) Lab.
His research interests include natural language processing problems, especially the robustness and fairness of dialogue systems.
E-mail: [email protected]

Debayan Deb is a Ph.D. candidate in the Biometrics Lab, Michigan State University, USA, under the supervision of Dr. Anil K. Jain. Before joining the Biometrics Lab at MSU, he graduated from Michigan State University with a Bachelor's degree in Computer Science and Engineering.
His research interests include face recognition and computer vision tasks.
E-mail: [email protected]

Hui Liu is a research associate at Michigan State University. Before joining MSU, she received her Ph.D. degree in Electrical Engineering from Southern Methodist University, USA, under the supervision of Dr. Dinesh Rajen.
Her research interests include signal processing, wireless communication, and deep learning related topics.
E-mail: [email protected]

Ji-Liang Tang has been an assistant professor in the Computer Science and Engineering Department at Michigan State University since Fall 2016. Before that, he was a research scientist at Yahoo Research; he received his Ph.D. degree from Arizona State University in 2015. He is the recipient of a 2019 NSF CAREER Award, the 2015 KDD Best Dissertation runner-up, and six best paper awards (or runner-ups), including at WSDM 2018 and KDD 2016. He serves as a conference organizer (e.g., KDD, WSDM and SDM) and a journal editor (e.g., TKDD). He has published his research in highly ranked journals and top conference proceedings, and his work has received thousands of citations and extensive media coverage.
His research interests include social computing, data mining and machine learning, and their applications in education.
E-mail: [email protected]

Anil K. Jain (Ph.D., 1973, Ohio State University; B.Tech., IIT Kanpur) is a University Distinguished Professor at Michigan State University, where he conducts research in pattern recognition, machine learning, computer vision, and biometrics recognition. He was a member of the United States Defense Science Board and the Forensic Science Standards Board. His prizes include the Guggenheim, Humboldt, and Fulbright fellowships and the King-Sun Fu Prize. For advancing pattern recognition, Jain was awarded Doctor Honoris Causa by Universidad Autónoma de Madrid. He was Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is a Fellow of ACM, IEEE, AAAS, and SPIE. Jain has been assigned 8 U.S. and Korean patents and is active in technology transfer, for which he was elected to the National Academy of Inventors. Jain is a member of the U.S. National Academy of Engineering (NAE), a foreign member of the Indian National Academy of Engineering (INAE), a member of The World Academy of Sciences (TWAS), and a foreign member of the Chinese Academy of Sciences (CAS).
His research interests include pattern recognition, machine learning, computer vision, and biometrics recognition.
E-mail: [email protected]
