Adversarial Attacks and Defenses in Images, Graphs and Text: A Review
International Journal of Automation and Computing 17(2), April 2020, 151-178
DOI: 10.1007/s11633-019-1211-x

Abstract: Deep neural networks (DNN) have achieved unprecedented success in numerous machine learning tasks in various domains. However, the existence of adversarial examples raises our concerns in adopting deep learning to safety-critical applications. As a result, we have witnessed increasing interests in studying attack and defense mechanisms for DNN models on different data types, such as images, graphs and text. Thus, it is necessary to provide a systematic and comprehensive overview of the main threats of attacks and the success of corresponding countermeasures. In this survey, we review the state of the art algorithms for generating adversarial examples and the countermeasures against adversarial examples, for three most popular data types, including images, graphs and text.

Keywords: Adversarial example, model safety, robustness, defenses, deep learning.
1 Introduction

Deep neural networks (DNN) have become increasingly popular and successful in many machine learning tasks. They have been deployed in different recognition problems in the domains of images, graphs, text and speech, with remarkable success. In the image recognition domain, they are able to recognize objects with near-human level accuracy[1, 2]. They are also used in speech recognition[3], natural language processing[4] and for playing games[5].

Because of these accomplishments, deep learning techniques are also applied in safety-critical tasks. For example, in autonomous vehicles, deep convolutional neural networks (CNNs) are used to recognize road signs[6]. The machine learning technique used here is required to be highly accurate, stable and reliable. But, what if the CNN model fails to recognize the "STOP" sign by the roadside and the vehicle keeps going? It will be a dangerous situation. Similarly, in financial fraud detection systems, companies frequently use graph convolutional networks (GCNs)[7] to decide whether their customers are trustworthy or not. However, DNN models have been shown to be vulnerable to adversarial examples, which can be formally defined as: "Adversarial examples are inputs to machine learning models that an attacker intentionally designed to cause the model to make mistakes". In the image classification domain, these adversarial examples are intentionally synthesized images which look almost exactly the same as the original images (see Fig. 1), but can mislead the classifier to provide wrong prediction outputs. For a well-trained DNN image classifier on the MNIST dataset, almost all the digit samples can be attacked by an imperceptible perturbation added on the original image. Meanwhile, in other application domains involving graphs, text or audio, similar adversarial attacking schemes also exist to confuse deep learning models. For example, perturbing only a couple of edges can mislead graph neural networks[10], and inserting typos into a sentence can fool text classification or dialogue systems[11]. As a result, the existence of adversarial examples in all application fields has cautioned researchers against directly adopting DNNs in safety-critical machine learning tasks.

To deal with the threat of adversarial examples,
studies have been published with the aim of finding countermeasures to protect deep neural networks. These approaches can be roughly categorized into three main types:

1) Gradient masking[12, 13]: Since most attacking algorithms are based on the gradient information of the classifiers, masking or obfuscating the gradients will confuse the attack mechanisms.

2) Robust optimization[14, 15]: These studies show how to train a robust classifier that can correctly classify the adversarial examples.

3) Adversary detection[16, 17]: These approaches attempt to check whether a sample is benign or adversarial before feeding it to the deep learning models. It can be seen as a method of guarding against adversarial examples. These methods improve DNN′s resistance to adversarial examples.

In addition to building safe and reliable DNN models, studying adversarial examples and their countermeasures is also beneficial for us to understand the nature of DNNs and consequently improve them. For example, adversarial perturbations are perceptually indistinguishable to human eyes but can evade DNN′s detection. This suggests that the DNN′s predictive approach does not align with human reasoning. There are works[9, 18] to explain and interpret the existence of adversarial examples of DNNs, which can help us gain more insight into DNN models.

In this review, we aim to summarize and discuss the main studies dealing with adversarial examples and their countermeasures. We provide a systematic and comprehensive review of the state-of-the-art algorithms from the image, graph and text domains, which gives an overview of the main techniques and contributions to adversarial attacks and defenses.

The main structure of this survey is as follows: In Section 2, we introduce some important definitions and concepts which are frequently used in adversarial attacks and their defenses. It also gives a basic taxonomy of the types of attacks and defenses. In Sections 3 and 4, we discuss the main attack and defense techniques in the image classification scenario. We use Section 5 to briefly introduce some studies which try to explain the phenomenon of adversarial examples. Sections 6 and 7 review the studies in graph and text data, respectively.

2 Definitions and notations

In this section, we give a brief introduction to the key components of model attacks and defenses. We hope that our explanations can help our audience to understand the main components of the related works on adversarial attacks and their countermeasures. By answering the following questions, we define the main terminology:

1) Adversary's goal (Section 2.1.1)
What is the goal or purpose of the attacker? Does he want to misguide the classifier′s decision on one sample, or influence the overall performance of the classifier?

2) Adversary's knowledge (Section 2.1.2)
What information is available to the attacker? Does he know the classifier′s structure, its parameters or the training set used for classifier training?

3) Victim models (Section 2.1.3)
What kind of deep learning models do adversaries usually attack? Why are adversaries interested in attacking these models?

4) Security evaluation (Section 2.2)
How can we evaluate the safety of a victim model when faced with adversarial examples? What is the relationship and difference between these security metrics and other model goodness metrics, such as accuracy or risks?

2.1 Threat model

2.1.1 Adversary′s goal

1) Poisoning attack versus evasion attack
Poisoning attacks refer to the attacking algorithms that allow an attacker to insert/modify several fake samples into the training database of a DNN algorithm. These fake samples can cause failures of the trained classifier. They can result in poor accuracy[19], or wrong prediction on some given test samples[10]. This type of attack frequently appears in situations where the adversary has access to the training database. For example, web-based repositories and "honeypots" often collect malware examples for training, which provides an opportunity for adversaries to poison the data.

In evasion attacks, the classifiers are fixed and usually have good performance on benign testing samples. The adversaries do not have authority to change the classifier or its parameters, but they craft some fake samples that the classifier cannot recognize. In other words, the adversaries generate some fraudulent examples to evade detection by the classifier. For example, in autonomous driving vehicles, sticking a few pieces of tape on the stop signs can confuse the vehicle′s road sign recognizer[20].

2) Targeted attack versus non-targeted attack
In a targeted attack, when the victim sample (x, y) is given, where x is the feature vector and y ∈ Y is the ground truth label of x, the adversary aims to induce the classifier to give a specific label t ∈ Y to the perturbed sample x′. For example, a fraudster is likely to attack a financial company′s credit evaluation model to disguise himself as a highly credible client of this company.

If there is no specified target label t for the victim sample x, the attack is called a non-targeted attack. The adversary only wants the classifier to predict incorrectly.

2.1.2 Adversary′s knowledge

1) White-box attack
In a white-box setting, the adversary has access to all the information of the target neural network, including its architecture, parameters, gradients, etc. The adversary can make full use of the network information to carefully craft adversarial examples. White-box attacks have been extensively studied because the disclosure of model
architecture and parameters helps people understand the weakness of DNN models clearly and it can be analyzed mathematically. As stated by Tramer et al.[21], security against white-box attacks is the property that we desire machine learning (ML) models to have.

2) Black-box attack
In a black-box attack setting, the inner configuration of DNN models is unavailable to adversaries. Adversaries can only feed the input data and query the outputs of the models. They usually attack the models by keeping feeding samples to the box and observing the output, to exploit the model′s input-output relationship and identify its weakness. Compared to white-box attacks, black-box attacks are more practical in applications because model designers usually do not open source their model parameters for proprietary reasons.

3) Semi-white (gray) box attack
In a semi-white box or gray box attack setting, the attacker trains a generative model for producing adversarial examples in a white-box setting. Once the generative model is trained, the attacker does not need the victim model anymore, and can craft adversarial examples in a black-box setting.

2.1.3 Victim models

We briefly summarize the machine learning models which are susceptible to adversarial examples, and some popular deep learning architectures used in the image, graph and text data domains. In our review, we mainly discuss studies of adversarial examples for deep neural networks.

1) Conventional machine learning models
For conventional machine learning tools, there is a long history of studying safety issues. Biggio et al.[22] attack support vector machine (SVM) classifiers and fully-connected shallow neural networks for the MNIST dataset. Barreno et al.[23] examine the security of SpamBayes, a Bayesian method based spam detection software. In [24], the security of Naive Bayes classifiers is checked. Many of these ideas and strategies have been adopted in the study of adversarial attacks on deep neural networks.

2) Deep neural networks
Different from traditional machine learning techniques which require domain knowledge and manual feature engineering, DNNs are end-to-end learning algorithms. The models use raw data directly as input to the model, and learn objects' underlying structures and attributes. The end-to-end architecture of DNNs makes it easy for adversaries to exploit their weakness, and generate high-quality deceptive inputs (adversarial examples). Moreover, because of the implicit nature of DNNs, some of their properties are still not well understood or interpretable. Therefore, studying the security issues of DNN models is necessary. Next, we will briefly introduce some popular victim deep learning models which are used as "benchmark" models in attack/defense studies.

a) Fully-connected neural networks (FC)
Fully-connected neural networks are composed of layers of artificial neurons. In each layer, the neurons take the input from previous layers, process it with the activation function and send it to the next layer; the input of the first layer is the sample x, and the (softmax) output of the last layer is the score F(x). An m-layer fully connected neural network can be formed as

z^(0) = x;  z^(l+1) = σ(W^l z^(l) + b^l).

One thing to note is that the back-propagation algorithm helps calculate ∂F(x; θ)/∂θ, which makes gradient descent effective in learning parameters. In adversarial learning, back-propagation also facilitates the calculation of the term ∂F(x; θ)/∂x, representing the output′s response to a change in input. This term is widely used in the studies to craft adversarial examples.
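The following minimal PyTorch sketch (not from the original paper) illustrates this point: back-propagation gives the input gradient ∂L(θ, x, y)/∂x of a fully-connected network with a single autograd call. The two-layer network and its sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# A small fully-connected network: z(0) = x, z(l+1) = sigma(W z + b)
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 784, requires_grad=True)   # victim sample, gradient w.r.t. input enabled
y = torch.tensor([3])                        # ground-truth label

loss = loss_fn(model(x), y)
loss.backward()                              # back-propagation

input_grad = x.grad                                   # dL(theta, x, y)/dx, used by many attacks
param_grads = [p.grad for p in model.parameters()]    # dL/dtheta, used for training
```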
b) Convolutional neural networks
In computer vision tasks, convolutional neural networks[1] are one of the most widely used models. CNN models aggregate the local features from the image to learn the representations of image objects. CNN models can be viewed as a sparse version of fully connected neural networks: Most of the weights between layers are zero. Their training algorithm and gradient calculation can also be inherited from fully connected neural networks.

c) Graph convolutional networks (GCN)
The work of graph convolutional networks introduced by Kipf and Welling[7] became a popular node classification model for graph data. The idea of graph convolutional networks is similar to CNN: It aggregates the information from neighbor nodes to learn representations for each node v, and outputs the score F(v, X) for prediction:

H^(0) = X;  H^(l+1) = σ(Â H^(l) W^l)

where X denotes the input graph′s feature matrix, and Â depends on the graph degree matrix and adjacency matrix.

d) Recurrent neural networks (RNN)
Recurrent neural networks are very useful for tackling sequential data. As a result, they are widely used in natural language processing. The RNN models, especially long short term memory based models (LSTM)[4], are able to store the previous time information in memory, and exploit useful information from the previous sequence for next-step prediction.

2.2 Security evaluation

We also need to evaluate the model′s resistance to adversarial examples. "Robustness" and "adversarial risk" are two terms used to describe this resistance of DNN models to one single sample, and to the total population, respectively.

2.2.1 Robustness

Definition 1. Minimal perturbation: Given the
classifier F and data (x, y), the adversarial perturbation has the least norm (the most unnoticeable perturbation):

δ_min = argmin_δ ||δ||  s.t.  F(x + δ) ≠ y.

Here, ||·|| usually refers to the lp norm.

Definition 2. Robustness: The norm of the minimal perturbation:

r(x, F) = ||δ_min||.

Definition 3. Global robustness: The expectation of robustness over the whole population D:

ρ(F) = E_{x∼D} r(x, F).

The minimal perturbation can find the adversarial example which is most similar to x under the model F. Therefore, the larger r(x, F) or ρ(F) is, the more similarity the adversary needs to sacrifice to generate adversarial samples, implying that the classifier F is more robust or safe.

2.2.2 Adversarial risk (loss)

Definition 4. Most-adversarial example: Given the classifier F and data (x, y), the most-adversarial example is:

x_adv = argmax_{x′} L(x′, F)  s.t.  ||x′ − x|| ≤ ϵ.

Definition 5. Adversarial loss: The loss value for the most-adversarial example:

L_adv(x) = L(x_adv) = max_{||x′−x||<ϵ} L(θ, x′, y).

Definition 6. Global adversarial loss: The expectation of the loss value on x_adv over the data distribution D:

L_adv(F) = E_{x∼D} max_{||x′−x||<ϵ} L(θ, x′, y).

Note that the most-adversarial examples need not follow the distribution D. Thus, the studies on adversarial examples are different from those on model generalization. Moreover, a number of studies reported the relation between these two properties[25−28]. From our clarification, we hope that our audience get the difference and relation between risk and adversarial risk, and the importance of studying adversarial countermeasures.
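As an illustration (not part of the original survey), the robustness r(x, F) of Definition 2 is usually estimated empirically: one searches for the smallest perturbation budget at which some attack succeeds. The sketch below assumes a hypothetical attack routine find_adversarial(model, x, y, eps) that returns an adversarial example inside Bϵ(x), or None if it fails.

```python
import numpy as np

def empirical_robustness(model, x, y, find_adversarial, eps_grid):
    """Return the smallest eps in eps_grid at which an attack inside the
    l_p ball B_eps(x) fools the model (an upper bound on r(x, F))."""
    for eps in sorted(eps_grid):
        x_adv = find_adversarial(model, x, y, eps)   # hypothetical attack oracle
        if x_adv is not None:
            return eps
    return np.inf  # no adversarial example found within the tested budgets

# Global robustness rho(F) is then approximated by averaging over test samples:
# rho_hat = np.mean([empirical_robustness(model, x, y, attack, eps_grid) for x, y in data])
```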
2.3 Notations

With the aforementioned definitions, Table 1 lists the notations which will be used in the subsequent sections.

Table 1 Notations
x : Victim data sample
x′ : Perturbed data sample
δ : Perturbation
Bϵ(x) : lp-distance neighbor ball around x with radius ϵ
D : Natural data distribution
Y : Set of possible labels. Usually we assume there are m labels
C : Classifier whose output is a label: C(x) = y
F : DNN model which outputs a score vector: F(x) ∈ [0, 1]^m
Z : Logits, the last layer outputs before softmax: F(x) = softmax(Z(x))
σ : Activation function used in neural networks
θ : Parameters of the model F
L : Loss function for training. We simplify L(F(x), y) in the form L(θ, x, y)
3 Generating adversarial examples

In this section, we introduce the main methods used to generate adversarial image examples for the evasion attack (white-box, black-box, grey-box, physical-world attack) and poisoning attack settings. Note that we also summarize all the attack methods in Table A in Appendix A.

3.1 White-box attacks

Generally, in a white-box attack setting, when the classifier C (model F) and the victim sample (x, y) are given to the attacker, his goal is to synthesize a fake image x′ perceptually similar to the original image x but that can mislead the classifier C to give a wrong prediction result. It can be formulated as

find x′ satisfying ||x′ − x|| ≤ ϵ, such that C(x′) = t ≠ y

where ||·|| measures the dissimilarity between x′ and x, which is usually the lp norm. Next, we will go through the main methods to realize this formulation.

3.1.1 Biggio′s attack

Biggio et al.[22] first generated adversarial examples on the MNIST data set, targeting conventional machine learning classifiers like SVMs and 3-layer fully-connected neural networks. The attack optimizes the discriminant function to mislead the classifier. For example, on the MNIST dataset, for a linear SVM classifier, its discriminant function g(x) = ⟨w, x⟩ + b will mark a sample x with positive value g(x) > 0 to be in class "3", and x with g(x) ≤ 0 to be in class "not 3". An example of this attack is in Fig. 2.

3.1.2 Szegedy′s L-BFGS attack

In the work [8], Szegedy et al. search for a minimally distorted adversarial example x′, with the objective:

min ||x − x′||₂²
s.t. C(x′) = t and x′ ∈ [0, 1]^m.    (2)

Szegedy et al. approximately solve this problem by introducing the loss function, which results in the following objective:

min c||x − x′||₂² + L(θ, x′, t),  s.t. x′ ∈ [0, 1]^m.

In the optimization objective of this problem, the first term imposes the similarity between x′ and x. The second term encourages the algorithm to find an x′ which has a small loss value to label t, so the classifier C will be very likely to predict x′ as t. By continuously changing the value of the constant c, they can find an x′ which has minimum distance to x, and at the same time fools the classifier C. To solve this problem, they implement the L-BFGS[30] algorithm.

3.1.3 Fast gradient sign method (FGSM)

Goodfellow et al.[9] introduced a one-step method to fast generate adversarial examples. Their formulation is

x′ = x + ϵ·sgn(∇x L(θ, x, y)),   non-target
x′ = x − ϵ·sgn(∇x L(θ, x, t)),   target on t.

For the targeted attack setting, this formulation can be seen as one step of gradient descent to solve the problem:

min_{||x′−x||∞ ≤ ϵ} L(θ, x′, t).    (3)
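A minimal FGSM sketch in PyTorch (illustrative, not the authors' code); model and loss_fn are assumed to be a trained classifier and its training loss:

```python
import torch

def fgsm(model, loss_fn, x, y, eps, target=None):
    """One-step FGSM. If target is None, run the non-targeted variant
    x' = x + eps * sign(grad_x L(theta, x, y)); otherwise move against
    the loss of the target label: x' = x - eps * sign(grad_x L(theta, x, t))."""
    x = x.clone().detach().requires_grad_(True)
    label = y if target is None else target
    loss = loss_fn(model(x), label)
    grad = torch.autograd.grad(loss, x)[0]
    step = eps * grad.sign() if target is None else -eps * grad.sign()
    return (x + step).clamp(0, 1).detach()   # keep pixel values in [0, 1]
```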
3.1.4 DeepFool

In DeepFool[32], the authors study a classifier′s decision boundary around a data point x₀. For a binary classifier whose decision boundary F₃ separates class "3" from the other classes (with discriminant f), the algorithm linearizes the decision boundary hyperplane using the Taylor expansion F′₃ = {x : f(x) ≈ f(x₀) + ⟨∇x f(x₀) · (x − x₀)⟩ = 0}, and calculates the orthogonal vector ω from x₀ to the plane F′₃. This vector ω can be the perturbation that makes x₀ go beyond the decision boundary F₃. By moving along the vector ω, the algorithm is able to find the adversarial example x′₀ that is classified to class 3.

Fig. 3 Decision boundaries: the hyperplane F₁ (F₂ or F₃) separates the data points belonging to class 4 and class 1 (class 2 or 3). The sample x₀ crosses the decision boundary F₃, so the perturbed data x′₀ is classified as class 3. (Image credit: Moosavi-Dezfooli et al.[32])

The experiments of DeepFool[32] show that for common DNN image classifiers, almost all test samples are very close to their decision boundary. For a well-trained LeNet classifier on the MNIST dataset, over 90% of test samples can be attacked by small perturbations whose l∞ norm is below 0.1, where the total range is [0, 1]. This suggests that the DNN classifiers are not robust to small perturbations.

3.1.5 Jacobian-based saliency map attack

Jacobian-based saliency map attack (JSMA)[33] introduced a method based on calculating the Jacobian matrix of the score function F. It can be viewed as a greedy attack algorithm that iteratively manipulates the pixel which is the most influential to the model output. The authors use the Jacobian matrix

J_F(x) = ∂F(x)/∂x = {∂F_j(x)/∂x_i}_{i×j}

to model F(x)′s change in response to the change of its input x. For a targeted attack setting where the adversary aims to craft an x′ that is classified to the target class t, they repeatedly search and manipulate the pixel x_i whose increase (decrease) will cause F_t(x) to increase or Σ_{j≠t} F_j(x) to decrease. As a result, for x, the model will give the largest score to label t.

3.1.6 Basic iterative method (BIM)/Projected gradient descent (PGD) attack

The basic iterative method was first introduced by Kurakin et al.[15, 31] It is an iterative version of the one-step attack FGSM in Section 3.1.3. In a non-targeted setting, it gives an iterative formulation to craft x′:

x^0 = x;  x^{t+1} = Clip_{x,ϵ}(x^t + α·sgn(∇x L(θ, x^t, y))).

Here, Clip denotes the function to project its argument onto the surface of x′s ϵ-neighbor ball Bϵ(x): {x′ : ||x′ − x||∞ ≤ ϵ}. The step size α is usually set to be relatively small (e.g., 1 unit of pixel change for each pixel), and the number of steps guarantees that the perturbation can reach the border (e.g., step = ϵ/α + 10). This iterative attacking method is also known as projected gradient descent (PGD) if the algorithm is added by a random initialization on x, as used in work [14].

This BIM (or PGD) attack heuristically searches for the samples x′ which have the largest loss value in the l∞ ball around the original sample x. This kind of adversarial examples are called "most-adversarial" examples: They are the sample points which are most aggressive and most likely to fool the classifiers, when the perturbation intensity (its lp norm) is limited. Finding these adversarial examples is helpful to find the weaknesses of deep learning models.
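A minimal PGD/BIM sketch (illustrative; hyperparameters are arbitrary), reusing the same model and loss_fn assumptions as the FGSM sketch above:

```python
import torch

def pgd(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=20, random_start=True):
    """Iterative l_inf attack: repeat a small FGSM step and project back
    onto the eps-ball B_eps(x); with random_start=True this is the PGD attack."""
    x_adv = x.clone().detach()
    if random_start:
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # Clip_{x,eps}: project to the l_inf ball
        x_adv = x_adv.clamp(0, 1)                  # keep a valid image
    return x_adv.detach()
```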
3.1.7 Carlini & Wagner′s attack
Fig. 3 Decision boundaries: the hyperplane F∞ (F∈ or F∋) Carlini and Wagner′s attack[34] counterattacks the de-
separates the data points belonging to class 4 and class 1 (class 2
or 3). The sample x0 crosses the decision boundary F∋, so the fense strategy[12] which were shown to be successful
perturbed data x′0 is classified as class 3. (Image credit: Moosavi- against FGSM and L-BFGS attacks. C&W′s attack aims
Dezfooli et al.[32])
to solve the same problem as defined in L-BFGS attack
(Section 3.1.2), namely trying to find the minimally-dis-
The experiments of DeepFool[32] shows that for com- torted perturbation (2).
mon DNN image classifiers, almost all test samples are The authors solve the problem (2) by instead solving:
very close to their decision boundary. For a well-trained
LeNet classifier on MNIST dataset, over 90% of test min ||x − x′ ||22 + c · f (x′ , t), s.t. x′ ∈ [0, 1]m
samples can be attacked by small perturbations whose l∞
norm is below 0.1 where the total range is [0, 1]. This where f is defined as f (x′ , t) = (maxi̸=t Z(x′ )i − Z(x′ )t )+.
suggests that the DNN classifiers are not robust to small Minimizing f (x′ , t) encourages the algorithm to find an x′
perturbations. that has larger score for class t than any other label, so
3.1.5 Jacobian-based saliency map attack that the classifier will predict x′ as class t . Next, applying
Jacobian-based saliency map attack (JSMA)[33] intro- a line search on constant c, we can find the x′ that has
duced a method based on calculating the Jacobian mat- the least distance to x.
rix of the score function F . It can be viewed as a greedy The function f (x, y) can also be viewed as a loss func-
tion for data (x, y): It penalizes the situation where there
attack algorithm by iteratively manipulating the pixel
are some labels i with scores Z(x)i larger than Z(x)y. It
which is the most influential to the model output.
can also be called margin loss function.
The authors used the Jacobian matrix JF(x) =
{ } The only difference between this formulation and the
∂F (x) ∂Fj (x)
= to model F (x)′s change in re- one in L-BFGS attack (Section 3.1.2) is that C&W′s at-
∂x ∂xi i×j tack uses margin loss f (x, t) instead of cross entropy loss
sponse to the change of its input x. For a targeted attack
L(x, t). The benefit of using margin loss is that when
setting where the adversary aims to craft an x′ that is
C(x′ ) = t, the margin loss value f (x′ , t) = 0, the algor-
classified to the target class t , they repeatedly search and
ithm will directly minimize the distance from x′ to x.
manipulate pixel xi whose increase (decrease) will cause
∑ This procedure is more efficient for finding the minimally
Ft (x) to increase or decrease j̸=t Fj (x). As a result, for distorted adversarial example.
x, the model will give it the largest score to label t . The authors claim their attack is one of the strongest
3.1.6 Basic iterative method (BIM)/Projected attacks, breaking many defense strategies which were
gradient descent (PGD) attack shown to be successful. Thus, their attacking method can
The basic iterative method was first introduced by be used as a benchmark to examine the safety of DNN
Kurakin et al.[15, 31] It is an iterative version of the one- classifiers or the quality of other adversarial examples.
step attack FGSM in Section 3.1.3. In a non-targeted set- 3.1.8 Ground truth attack
ting, it gives an iterative formulation to craft x′ : Attacks and defenses keep improving to defeat each
3.1.8 Ground truth attack

Attacks and defenses keep improving to defeat each other. In order to end this stalemate, the work of Carlini et al.[35] tries to find the "provable strongest attack". It can be seen as a method to find the theoretically minimally-distorted adversarial examples.

This attack is based on Reluplex[36], an algorithm for verifying the properties of neural networks. It encodes the model parameters F and data (x, y) as the subjects of a linear-like programming system, and then solves the system to check whether there exists an eligible sample x′ in x′s neighbor Bϵ(x) that can fool the model. If we keep reducing the radius ϵ of the search region Bϵ(x) until the system determines that there does not exist such an x′ that can fool the model, the last found adversarial example is called the ground truth adversarial example, because it has been proved to have the least dissimilarity with x.

The ground-truth attack is the first work to seriously calculate the exact robustness (minimal perturbation) of classifiers. However, this method involves using a satisfiability modulo theories (SMT) solver.

3.1.10 Universal attack

The universal attack[42] crafts a single, image-agnostic perturbation that can fool the classifier on a large fraction of the samples in the ILSVRC 2012[43] dataset under a ResNet-152[2] classifier.

The existence of "universal" adversarial examples reveals a DNN classifier′s inherent weakness on all of the input samples. As claimed in work [42], it may suggest the property of geometric correlation among the high-dimensional decision boundaries of classifiers.

3.1.11 Spatially transformed attack

Traditional adversarial attack algorithms directly modify the pixel values of an image, which changes the image′s color intensity. Spatial attack[44] devises another method, called a spatially transformed attack. They perturb the image by doing slight spatial transformations: They translate, rotate and distort the local image features slightly. The perturbation is small enough to evade human inspection but can fool the classifiers. One example is in Fig. 4.
3.2 Physical world attacks

In the attacks introduced so far, the attacker feeds the perturbed images directly to the machine learning model. However, this is not always the case for some scenarios, like those that use cameras, microphones or other sensors to receive the signals as input. In this case, can we still attack these systems by generating physical-world adversarial objects? Recent works show such attacks do exist. For example, the work [20] attached stickers to road signs that can severely threaten autonomous cars′ sign recognizers. These kinds of adversarial objects are more destructive for deep learning models because they can directly challenge many practical applications of DNNs, such as face recognition, autonomous vehicles, etc.

3.2.1 Exploring adversarial examples in physical world

In the work [15], the authors explore the feasibility of crafting physical adversarial objects, by checking whether the generated adversarial images (FGSM, BIM) are "robust" under natural transformations (such as changing viewpoint, lighting, etc.). Here, "robust" means the crafted images remain adversarial after the transformation. To apply the transformation, they print out the crafted images, and let test subjects use cellphones to take photos of the printouts.

3.2.2 Eykholt′s attack on road signs

The work [20] crafts physical perturbations to mislead road sign recognizers. They achieve the attack by putting stickers on the stop sign in the desired positions. The authors′ approach consists of three steps: 1) Implement an l1-norm based attack (those attacks that constrain ||x′ − x||₁) on digital images of road signs to roughly find the region to perturb (l1 attacks render sparse perturbations, which helps to find the attack location). These regions will later be the location of the stickers. 2) Concentrating on the regions found in step 1, use an l2-norm based attack to generate the color for the stickers. 3) Print out the perturbation found in steps 1 and 2, and stick them on the road sign. The perturbed stop sign can confuse an autonomous vehicle from any distance and viewpoint.

Fig. 5 Attacker puts some stickers on a road sign to confuse an autonomous vehicle′s road sign recognizer from any viewpoint (Image credit: Eykholt et al.[20])

3.2.3 Athalye′s 3D adversarial object

In the work [47], the authors report the first work which successfully crafted physical 3D adversarial objects. As shown in Fig. 6, the authors use 3D-printing to manufacture an "adversarial" turtle. To achieve their goal, they implement a 3D rendering technique. Given a textured 3D object, they first optimize the object′s texture such that the rendering images are adversarial from any viewpoint. In this process, they also ensure that the perturbation remains adversarial under different environments: camera distance, lighting conditions, rotation and background. After finding the perturbation on the 3D rendering, they print an instance of the 3D object.

3.3 Black-box attacks

3.3.1 Substitute model

The work [48] was the first to introduce an effective algorithm to attack DNN classifiers, under the condition that the adversary has no access to the classifier′s parameters or training set (black-box). An adversary can only feed input x to obtain the output label y from the classifier. Additionally, the adversary may have only partial knowledge about: 1) the classifier′s data domain (e.g., handwritten digits, photographs, human faces) and 2) the architecture of the classifier (e.g., CNN, RNN).

The authors in the work [48] exploit the "transferability" (Section 5.3) property of adversarial examples: if a sample x′ can attack F₁, it is also likely to attack F₂, which has a similar structure to F₁. Thus, the authors introduce a method to train a substitute model F′ to imitate the target victim classifier F, and then craft the adversarial examples by attacking the substitute model F′. The main steps are below:
1) Synthesize substitute training dataset
Make a "replica" training set. For example, to attack a victim classifier for a hand-written digits recognition task, make an initial substitute training set by: a) requiring samples from the test set; or b) handcrafting samples.

2) Training the substitute model
Feed the substitute training dataset X into the victim classifier to obtain their labels Y. Choose one substitute DNN model to train on (X, Y) to get F′. Based on the attacker′s knowledge, the chosen DNN should have a similar structure to the victim model.

3) Dataset augmentation
Augment the dataset (X, Y) and retrain the substitute model F′ iteratively. This procedure helps to increase the diversity of the replica training set and improve the accuracy of the substitute model F′.

4) Attacking the substitute model
Utilize the previously introduced attack methods, such as FGSM, to attack the model F′. The generated adversarial examples are also very likely to mislead the target model F, by the property of "transferability".

What kind of attack algorithm should we choose to attack the substitute model? The success of the substitute model black-box attack is based on the "transferability" property of adversarial examples. Thus, during black-box attacks, we choose attacks that have high transferability, like FGSM, PGD and momentum-based iterative attacks[49].
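A condensed sketch of this substitute-model pipeline (illustrative; query_victim, the substitute architecture and the training loop are assumptions, the Jacobian-based dataset augmentation of step 3 is omitted, and fgsm refers to the helper sketched in Section 3.1.3):

```python
import torch
import torch.nn as nn

def substitute_black_box_attack(query_victim, x_sub, x_victim, y_victim, epochs=10, eps=0.1):
    """query_victim(x) -> labels is the only access to the black-box classifier.
    Steps 1-2: label a replica set and train a substitute F'.
    Step 4: attack F' in a white-box fashion and transfer the examples."""
    y_sub = query_victim(x_sub)                       # victim labels the replica set
    substitute = nn.Sequential(nn.Flatten(), nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, 10))
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(epochs):                           # train the substitute model F'
        opt.zero_grad()
        loss_fn(substitute(x_sub), y_sub).backward()
        opt.step()
    x_adv = fgsm(substitute, loss_fn, x_victim, y_victim, eps)   # white-box attack on F'
    fooled = (query_victim(x_adv) != y_victim)                   # transfer to the victim model
    return x_adv, fooled
```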
3.3.2 ZOO: Zeroth order optimization based black-box attack

Different from the work in Section 3.3.1 where an adversary can only obtain the label information from the classifier, the work [50] assumes the attacker has access to the prediction confidence (score) from the victim classifier′s output. In this case, there is no need to build the substitute training set and substitute model. Chen et al. give an algorithm to "scrape" the gradient information around the victim sample x by observing the changes in the prediction confidence F(x) as the pixel values of x are tuned.

Equation (4) shows that for each index i of sample x, we add (or subtract) h to x_i. If h is small enough, we can scrape the gradient information from the output of F(·) by

∂F(x)/∂x_i ≈ (F(x + h·e_i) − F(x − h·e_i)) / (2h).    (4)

Utilizing the approximate gradient, we can apply the attack formulations introduced in Sections 3.1.3 and 3.1.7. The attack success rate of ZOO is higher than the substitute model attack (Section 3.3.1) because it can utilize the information of prediction confidence, instead of solely the predicted labels.
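A minimal sketch of the coordinate-wise finite-difference estimate in (4) (illustrative; a full ZOO attack adds coordinate sampling and an optimizer on top of this):

```python
import numpy as np

def estimate_gradient(f, x, h=1e-4, indices=None):
    """Approximate dF/dx_i by (F(x + h e_i) - F(x - h e_i)) / (2h) using only
    queries to the black-box score function f; two queries per coordinate."""
    x = x.astype(np.float64)
    flat = x.ravel()
    indices = range(flat.size) if indices is None else indices   # optionally a sampled subset of pixels
    grad = np.zeros_like(flat)
    for i in indices:
        e = np.zeros_like(flat)
        e[i] = h
        grad[i] = (f((flat + e).reshape(x.shape)) - f((flat - e).reshape(x.shape))) / (2 * h)
    return grad.reshape(x.shape)
```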
3.3.3 Query-efficient black-box attack

Previously introduced black-box attacks require lots of input queries to the classifier, which may be prohibitive in practical applications. There are some studies on improving the efficiency of generating black-box adversarial examples via a limited number of queries. For example, the authors in work [51] introduced a more efficient way to estimate the gradient information from model outputs. They use natural evolution strategies[52], which sample the model′s output based on queries around x, and estimate the expectation of the gradient of F at x. This procedure requires fewer queries to the model. Moreover, the authors in work [53] apply a genetic algorithm to search the neighbors of a benign image for adversarial examples.

3.4 Semi-white (grey) box attack

3.4.1 Using generative adversarial network (GAN) to generate adversarial examples

The work [54] devised a semi-white box attack framework. It first trains a GAN[55], targeting the model of interest. The attacker can then craft adversarial examples directly from the generative network.

The authors believe the advantage of the GAN-based attack is that it accelerates the process of producing adversarial examples, and makes more natural and more undetectable samples. Later, Deb′s grey box attack[56] uses a GAN to generate adversarial faces to evade face recognition software. Their crafted face images appear to be more natural and have barely distinguishable differences from the target face images.

3.5 Poisoning attacks

The attacks we have discussed so far are evasion attacks, which are launched after the classification model is trained. Some works instead craft adversarial examples before training. These adversarial examples are inserted into the training set in order to undermine the overall accuracy of the learned classifier, or influence its prediction on certain test examples. This process is called a poisoning attack.

Usually, the adversary in a poisoning attack setting has knowledge about the architecture of the model which is later trained on the poisoned dataset. Poisoning attacks are frequently applied to attack graph neural networks, because of the GNN′s specific transductive learning procedure. Here, we introduce studies that craft image poisoning attacks.

3.5.1 Biggio′s poisoning attack on SVM

The work [19] introduced a method to poison the training set in order to reduce the SVM model′s accuracy. In their setting, they try to figure out a poison sample x_c which, when inserted into the training set, will result in the learned SVM model F_{x_c} having a large total loss on the whole validation set. They achieve this by using the incremental learning technique for SVMs[57], which can model the influence of a training sample on the learned SVM model.
4 Countermeasures against adversarial examples

4.1 Gradient masking/obfuscation

4.1.2 Shattered gradients

Some defenses apply a preprocessor to discretize an image′s pixel value x_i into an l-dimensional vector τ(x_i) (e.g., when l = 10, τ(0.66) = 1111110000). The vector τ(x_i) acts as a "thermometer" to record the pixel x_i′s value. A DNN model is later trained on these vectors. Another work [62] studies a number of image processing tools, such as image cropping, compressing, total-variance minimization and super-resolution[63], to determine whether these techniques help to protect the model against adversarial examples. All these approaches block up the smooth connection between the model′s output and the original input samples, so the attacker cannot easily find the gradient ∂F(x)/∂x for attacking.
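A minimal sketch of this thermometer-style discretization (illustrative; l = 10 as in the example above):

```python
import numpy as np

def thermometer_encode(pixels, l=10):
    """Discretize pixel values in [0, 1] into l-dimensional 0/1 'thermometer' vectors:
    the first floor(p * l) entries are 1, the rest 0 (so 0.66 -> 1111110000 for l = 10)."""
    pixels = np.asarray(pixels, dtype=np.float64)
    levels = np.arange(1, l + 1) / l            # thresholds 0.1, 0.2, ..., 1.0
    return (pixels[..., None] >= levels).astype(np.uint8)

# Example: thermometer_encode(0.66) -> array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
```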
4.1.3 Stochastic/Randomized gradients

Some defense strategies try to randomize the DNN model in order to confound the adversary. For instance, we train a set of classifiers s = {F_t : t = 1, 2, ···, k}. During evaluation on data x, we randomly select one classifier from the set s and predict the label y. Because the adversary has no idea which classifier is used by the prediction model, the attack success rate will be reduced. Some examples of this strategy include the work [64], which randomly drops some neurons of each layer of the DNN model, and the work [65], which resizes the input images to a random size and pads zeros around the input image.
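A sketch of the random resizing-and-padding idea of [65] (illustrative; the output size and interpolation mode are assumptions, and the input is assumed smaller than the output size):

```python
import torch
import torch.nn.functional as F

def random_resize_and_pad(x, out_size=331):
    """Randomly resize the image, then zero-pad it to a fixed size at a random
    offset, so the exact input transformation is unknown to the attacker."""
    b, c, h, w = x.shape
    new_size = torch.randint(h, out_size, (1,)).item()          # random target resolution
    x = F.interpolate(x, size=(new_size, new_size), mode='nearest')
    pad_total = out_size - new_size
    left = torch.randint(0, pad_total + 1, (1,)).item()
    top = torch.randint(0, pad_total + 1, (1,)).item()
    return F.pad(x, (left, pad_total - left, top, pad_total - top))  # zero padding

# prediction: logits = model(random_resize_and_pad(x))
```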
4.1.4 Exploding & vanishing gradients attack[44]), studying the security against lp attack is fun-
Both PixelDefend[66] and Defense-GAN[67] suggest us- damental and can be generalized to other attacks.
ing generative models to project a potential adversarial In this section, we concentrate on defense approaches
example onto the benign data manifold before classifying using robustness optimization against lp attacks. We cat-
them. While PixelDefend uses PixelCNN generative mod- egorize the related works into three groups: 1) regulariza-
el[68], Defense-GAN uses a GAN architecture[5]. The gen- tion methods, 2) adversarial (re)training and 3) certified
erative models can be viewed as a purifier that trans- defenses.
forms adversarial examples into benign examples. 4.2.1 Regularization methods
Both of these methods consider adding a generative Some early studies on defending against adversarial
network before the classifier DNN, which will cause the examples focus on exploiting certain properties that a ro-
final classification model be an extremely deep neural net- bust DNN should have in order to resist adversarial ex-
work. The underlying reason that these defenses succeed amples. For example, Szegedy et al.[8] suggest that a ro-
is because: The cumulative product of partial derivatives bust model should be stable when its inputs are distorted,
∂L (x) so they turn to constrain the Lipschitz constant to im-
from each layer will cause the gradient to be ex-
∂x pose this “stability” of model output. Training on these
tremely small or irregularly large, which prevents the at- regularizations can sometimes heuristically help the mod-
tacker accurately estimating the location of adversarial el be more robust.
examples. 1) Penalize layer's Lipschitz constant
4.1.5 Gradient masking/Obfuscation methods are
When Szegydy et al.[8] first claimed the vulnerability
not safe
of DNN models to adversarial examples, they suggested
In the work Carlini and Wagner′s attack[34], they show
adding regularization terms on the parameters during
the method of “Defensive Distillation” (Section 4.1.1) is
training, to force the trained model be stable. It sugges-
still vulnerable to their adversarial examples. In the study
ted constraining the Lipschitz constant Lk between any
[13], the authors devised different attacking algorithms to
two layers:
break gradient masking/obfuscation defending strategies
(Sections 4.1.2 – 4.1.4). ∀x, δ, ||hk (x; Wk ) − hk (x + δ; Wk )|| ≤ Lk ||δ||
The main weakness of the gradient masking strategy
is that: It can only “confound” the adversaries; it cannot so that the outcome of each layer will not be easily
eliminate the existence of adversarial examples. influenced by the small distortion of its input. The work
Parseval networks[69] formalized this idea, by claiming that the model′s adversarial risk (5) is directly dependent on this instability L_k:

E_{x∼D} L_adv(x) ≤ E_{x∼D} L(x) + E_{x∼D}[ max_{||x′−x||≤ϵ} |L(F(x′), y) − L(F(x), y)| ] ≤ E_{x∼D} L(x) + λ_p ∏_{k=1}^{K} L_k

where λ_p is the Lipschitz constant of the loss function. This formula states that during the training process, penalizing the large instability of each hidden layer can help to decrease the adversarial risk of the model, and consequently increase the robustness of the model. The idea of constraining instability also appears in the study [70] for semi-supervised and unsupervised defenses.

2) Penalize layer′s partial derivative
The study [71] introduced a deep contractive network algorithm to regularize the training. It was inspired by the contractive autoencoder[72], which was introduced to denoise the encoded representation learning. The deep contractive network suggests adding a penalty on the partial derivatives at each layer into the standard back-propagation framework.

4.2.2 Adversarial (re)training

1) Adversarial training with FGSM
Early adversarial training methods[9, 15] augment the training data with adversarial examples generated by FGSM, so that the classifier learns to resist them (Algorithm 1). However, training on single-step adversarial examples alone will cause gradient obfuscation (Section 4.1), where there is an extreme non-smoothness of the trained classifier F near the test sample x. Refer to Fig. 7 for an illustration of the non-smooth property of the FGSM trained classifier.

Algorithm 1. Adversarial training with FGSM by batches
Randomly initialize network F
Repeat
  1) Read minibatch B = {x^1, ···, x^m} from the training set
  2) Generate k adversarial examples {x^1_adv, ···, x^k_adv} for the corresponding benign examples using the current state of the network F
  3) Update B′ = {x^1_adv, ···, x^k_adv, x^{k+1}, ···, x^m}
  Do one training step of network F using minibatch B′
until training converged

2) Adversarial training with PGD
The PGD adversarial training[14] suggests using the projected gradient descent attack (Section 3.1.6) for adversarial training, instead of using single-step attacks like FGSM. The PGD attack (Section 3.1.6) can be seen as a heuristic method to find the "most adversarial" example:

x_adv = argmax_{x′∈Bϵ(x)} L(x′, F).    (7)
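A compact sketch of this adversarial training loop (illustrative; it follows the batch scheme of Algorithm 1 but uses the PGD helper sketched in Section 3.1.6 as the inner maximization, and the data loader, optimizer and k are assumptions):

```python
import torch

def adversarial_training(model, loss_fn, train_loader, epochs=10, eps=8/255, k=64):
    """Each minibatch is partially replaced by adversarial examples crafted
    against the current model state, then used for a normal training step."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            x_adv = pgd(model, loss_fn, x[:k], y[:k], eps=eps)   # inner max: most-adversarial examples
            x_mix = torch.cat([x_adv, x[k:]], dim=0)             # B' = adversarial + remaining benign
            y_mix = torch.cat([y[:k], y[k:]], dim=0)
            opt.zero_grad()
            loss_fn(model(x_mix), y_mix).backward()              # outer min over model parameters
            opt.step()
    return model
```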
4.2.3 Certified defenses

Certified defenses try to provide a provable guarantee that the classifier is robust within a neighborhood of each input. In this way, we gain confidence and reduce the expected risk when building DNN models.

The method of Reluplex seeks to find the exact value of r(x; F) that can verify the model F′s robustness on x. Alternately, works such as [77−79] try to find trainable "certificates" C(x; F) to verify the model robustness. For example, in the work [79], the authors calculate a certificate C(x, F) for model F on x, which is a lower bound of the minimal perturbation distance: C(x, F) ≤ r(x, F). As shown in Fig. 8, the model must be safe against any perturbation with norm limited by C(x, F). Moreover, these certificates are trainable. Training to optimize these certificates will grant good robustness to the classifier. In this section, we shall briefly introduce some methods to design these certificates.

Recall that we introduced in Section 2.2.2 that the function max_{i≠y} Z_i(x′) − Z_y(x′) is a type of loss function called the margin loss. The certificate U(x, F) acts in this way: If U(x, F) < 0, then the adversarial loss L(x, F) < 0. Thus, the classifier always gives the largest score to the true label y in the region Bϵ(x), and the model is safe in this region. To increase the model′s robustness, we should learn parameters that have the smallest U values, so that more and more data samples will have negative U values.

The work proposed by Raghunathan et al.[77] uses integration inequalities to derive the certificate and uses semi-definite programming (SDP)[80] to solve the certificate. In contrast, the work of Wong and Kolter[78] transforms the problem (8) into a linear programming problem and solves the problem via training an alternative neural network.
4.3 Adversarial example detection

Another main approach to protect DNN classifiers is to detect whether an input is adversarial before feeding it to the model.

4.3.1 An auxiliary model to classify adversarial examples

Some works train an auxiliary detection model that learns to distinguish adversarial examples from benign ones by the hidden layers.

4.3.2 Using statistics to distinguish adversarial examples

Some early works heuristically study the differences in the statistical properties of adversarial examples and benign examples. For example, in the study [87], the authors found adversarial examples place a higher weight on the larger (later) principal components, where the natural images have larger weight on the early principal components. Thus, they can split them by principal component analysis (PCA).

In the work [84], the authors use a statistical test: the maximum mean discrepancy (MMD) test[88], which is used to test whether two datasets are drawn from the same distribution. They use this testing tool to test whether a group of data points are benign or adversarial.

4.3.3 Checking the prediction consistency

Other studies focus on checking the consistency of the sample x′s prediction outcome. They usually manipulate the model parameters or the input examples themselves, to check whether the outputs of the classifier have significant changes. These are based on the belief that the classifier will have stable predictions on natural examples under these manipulations.

The work [89] randomizes the classifier using Dropout[90]. If these classifiers give very different prediction outcomes on x after randomization, this sample x is very likely to be an adversarial one.

The work [17] manipulates the input sample itself to check the consistency. For each input sample x, the authors reduce the color depth of the image (e.g., an 8-bit grayscale image with 256 possible values for each pixel becomes a 7-bit one with 128 possible values), as shown in Fig. 9. The authors hypothesize that for natural images, reducing the color depth will not change the prediction result, but the prediction on adversarial examples will change. In this way, they can detect adversarial examples. Similar to reducing the color depth, the work [89] also introduced other feature squeezing methods, such as spatial smoothing.
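A sketch of this consistency check based on color-depth reduction (illustrative; the detection threshold is an assumption):

```python
import torch

def reduce_color_depth(x, bits=7):
    """Squeeze pixel values in [0, 1] from 8-bit to `bits`-bit depth
    (e.g., 256 -> 128 possible values per pixel)."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def looks_adversarial(model, x, bits=7, threshold=1.0):
    """Flag x as adversarial if the prediction on the squeezed input
    differs too much from the prediction on the original input."""
    p_orig = torch.softmax(model(x), dim=1)
    p_squeezed = torch.softmax(model(reduce_color_depth(x, bits)), dim=1)
    score = (p_orig - p_squeezed).abs().sum(dim=1)   # l1 distance between score vectors
    return score > threshold
```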
4.3.4 Some attacks which evade adversarial detections

The study [16] bypassed 10 of the detection methods which fall into the three categories above. The feature squeezing methods were broken by Sharma and Chen[91], who introduced a "stronger" adversarial attack.

The authors in work [16] claim that the properties which are intrinsic to adversarial examples are not very easy to find. They also gave several suggestions for future detection works:
1) Randomization can increase the required attacking distortion.
2) Defenses that directly manipulate raw pixel values are ineffective.
3) Evaluation should be done on multiple datasets besides MNIST.
4) Report false positive and true positive rates for detection.
5) Evaluate using a strong attack. Simply focusing on white-box attacks is risky.

5 Explanations for the existence of adversarial examples

In addition to crafting adversarial examples and defending against them, explaining the reason behind these phenomena is also important. In this section, we briefly introduce the recent works and hypotheses on the key questions of adversarial learning. We hope our introduction will give our audience a basic view of the existing ideas and solutions for these questions.

5.1 Why do adversarial examples exist?

Some original works, such as Szegedy′s L-BFGS attack[8], state that the existence of adversarial examples is due to the fact that DNN models do not generalize well in the low-probability space of the data. The generalization issue may be caused by the high complexity of DNN model structures.

However, in the work [9], even linear models are also shown to be vulnerable to adversarial attacks. Furthermore, in the work [14], they implement experiments to show that an increase in model capacity will improve the model robustness.

Some insight can be gained about the existence of adversarial examples by studying the model′s decision boundary. The adversarial examples are almost always
close to the decision boundary of a naturally trained model, which may be because the decision boundary is too flat[92], too curved[93], or inflexible[94].

Studying the reason behind the existence of adversarial examples is important because it can guide us in designing more robust models, and help us to understand existing deep learning models. However, there is still no consensus on this problem.

5.2 Can we build an optimal classifier?

Many recent works hypothesize that it might be impossible to build an optimally robust classifier. For example, the study [95] claims that adversarial examples are inevitable because the distribution of data in each class is not well-concentrated, which leaves room for adversarial examples. In this vein, the work [96] claims that to improve the robustness of a trained model, it is necessary to collect more data. Moreover, the authors in work [25] suggest that, even if we can build models with high robustness, it must come at the cost of some accuracy.

5.3 What is transferability?

Transferability is one of the key properties of adversarial examples. It means that the adversarial examples generated to target one victim model also have a high probability of misleading other models.

Some works compare the transferability between different attacking algorithms. In the work [31], the authors claim that on ImageNet, single-step attacks (FGSM) are more likely to transfer between models than iterative attacks (BIM) under the same perturbation intensity.

The property of transferability is frequently utilized in attacking techniques in the black-box setting[48]. If the model parameters are veiled to attackers, they can turn to attack other substitute models and enjoy the transferability of their generated samples. The property of transferability is also utilized by defending methods as in the work [87]: Since the adversarial examples for model A are also likely to be adversarial for model B, adversarial training using adversarial examples from B will help defend A.

6 Graph adversarial examples

Adversarial examples also exist in graph-structured data[10, 97]. Attackers usually slightly modify the graph structure and node features, in an effort to cause the graph neural networks (GNN) to give wrong predictions for node classification or graph classification tasks. These adversarial attacks therefore raise concerns on the security of applying GNN models. For example, a bank needs to build a reliable credit evaluation system whose model should not be easily attacked by malicious manipulations.

There are some distinct differences between attacking graph models and attacking traditional image classifiers:

1) Non-independence. Samples of the graph-structured data are not independent: Changing one′s feature or connection will influence the prediction on others.

2) Poisoning attacks. Graph neural networks are usually performed in a transductive learning setting: The test data are also used to train the classifier. This means that if we modify the test data, the trained classifier is also changed.

3) Discreteness. When modifying the graph structure, the search space for adversarial examples is discrete. Previous gradient methods to find adversarial examples may be invalid in this case.

Below are the methods used by some successful works to attack and defend graph neural networks.

6.1 Definitions for graphs and graph models

In this section, the notations and definitions of the graph structured data and graph neural network models are defined below. A graph can be represented as G = {V, E}, where V is a set of N nodes and E is a set of M edges. The edges describe the connections between the nodes, which can also be expressed by an adjacency matrix A ∈ {0, 1}^{N×N}. Furthermore, a graph G is called an attributed graph if each node in V is associated with a d-dimensional attribute vector x_v ∈ R^d. The attributes for all the nodes in the graph can be summarized as a matrix X ∈ R^{N×d}, the i-th row of which represents the attribute vector for node v_i.

The goal of node classification is to learn a function g : V → Y that maps each node to one class in Y, based on a group of labeled nodes in G. One of the most successful node classification models is the graph convolutional network (GCN)[7]. The GCN model keeps aggregating the information from neighboring nodes to learn representations for each node v:

H^(0) = X;  H^(l+1) = σ(Â H^(l) W^l)

where σ is a non-linear activation function, the matrix Â is defined as Â = D̃^(−1/2) Ã D̃^(−1/2), Ã = A + I_N, and D̃_ii = Σ_j Ã_ij. The last layer outputs the score vectors of each node for prediction: H_v^(m) = F(v, X).
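A minimal sketch of this GCN propagation rule (illustrative; two layers and randomly initialized weights):

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D_tilde^(-1/2) (A + I) D_tilde^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, weights):
    """H(0) = X; H(l+1) = sigma(A_hat H(l) W(l)); ReLU between layers, raw scores at the end."""
    A_hat, H = normalized_adjacency(A), X
    for l, W in enumerate(weights):
        H = A_hat @ H @ W
        if l < len(weights) - 1:
            H = np.maximum(H, 0)          # sigma = ReLU for hidden layers
    return H                              # row v holds the score vector F(v, X)

# Example with N nodes, d features and m classes:
# scores = gcn_forward(A, X, [np.random.randn(d, 16), np.random.randn(16, m)])
```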
6.2 Zugner′s greedy method

In the work of Zugner et al.[10], they consider attacking node classification models, graph convolutional networks[7], by modifying the node connections or node features (binary). In this setting, an adversary is allowed to add/remove edges between nodes, or flip the features of nodes, with a limited number of operations. The goal is to
mislead the GCN model which is trained on the per- siders adding or removing edges to modify the graph
turbed graph (transductive learning) to give wrong pre- structure.
dictions. In their work, they also specify three levels of In the work′s setting of [97], a node classifier F
adversary capabilities: they can manipulate 1) all nodes, trained on the clean graph G(0) = G is given, node classi-
2) a set of nodes A including the target victim x, and 3) fier F is unknown to the attacker, and the attacker is al-
a set of nodes A which does not include target node x. A lowed to modify m edges in total to alter F ′s prediction
sketch is shown in Fig. 10. on the victim node v0. The authors formulate this attack-
ing mission as a Q-Learning game[99], with the defined
[] [] Target node [] []
··
··
··
··
××
··
··
··
{
Similar to the objective function in Carlini and Wagn- 1, if C(v0 , G(m) ) ̸= y
r(sm , am ) =
−1, if C(v0 , G(m) ) = y.
er[34] for image data, they formulate the graph attacking
problem as a search for a perturbed graph G′ such that
4) Termination. The process stops once the agent
the learned GCN classifier Z ∗ has the largest score mar-
finishes modifying m edges.
gin:
The Q-learning algorithm helps the adversary have
knowledge about which actions to take (add/remove
max ln(Zy∗ (v0 , G′ )) − ln(Zi∗ (v0 , G′ )). (9)
i̸=y which edge) on the given state (current graph structure),
in order to get largest reward (change F ′s output).
The authors solve this objective by finding perturba-
tions on a fixed, linearized substitute GCN classifier Gsub 6.4 Graph structure poisoning via meta-
which is trained on the clean graph. They use a heuristic learning
algorithm to find the most influential operations on graph
Gsub (e.g., removing/adding the edge or flipping the fea- Previous graph attack works only focus on attacking
ture which can cause largest increase in (9)). The experi- one single victim node. Meta learning attack[100] attempt
mental results demonstrate the adversarial operations are to poison the graph so that the global node classification
also effective on the later trained classifier Z ∗ . performance of GCN can be undermined and made al-
During the attacking process, the authors also impose most useless. Their approach is based on meta
two key constraints to ensure the similarity of the per- learning[101], which is traditionally used for hyperparamet-
turbed graph to the original one: 1) the degree distribu- er optimization, few-shot image recognition, and fast rein-
tion should be maintained, and 2) two positive features forcement learning. In the work [100], they use meta
which never happen together in G should also not hap- learning technique which takes the graph structure as the
pen together in G′ . Later, some other graph attacking hyperparameter of the GCN model to optimize. Using
works (e.g., [98]) suggest the eigenvalues/eigenvectors of their algorithm to perturb 5% edges of a CITESEER
the graph Laplacian matrix should also be maintained graph dataset, they can increase the misclassification rate
during attacking, otherwise the attacks are easily detec- to over 30%.
ted. However, there is still no firm consensus on how to
formally define the similarity between graphs and gener- 6.5 Attack on node embedding
ate unnoticeable perturbation.
6.3 Dai′s RL method: RL-S2V

Different from Zügner′s greedy method, the work of Dai et al.[97] introduced a reinforcement learning method to attack graph neural networks. This work only considers adding or removing edges to modify the graph structure.

In the setting of [97], a node classifier F trained on the clean graph G^(0) = G is given; F is unknown to the attacker, and the attacker is allowed to modify m edges in total to alter F′s prediction on the victim node v_0. The authors formulate this attacking mission as a Q-learning game[99], with the following defined elements:

1) State. The current (partially modified) graph G^(t) together with the victim node v_0.

2) Action. Adding or removing a single edge of the current graph.

3) Reward. A non-zero reward is given only after the final modification, according to whether the victim node is misclassified:

r(s_m, a_m) = \begin{cases} 1, & \text{if } C(v_0, G^{(m)}) \neq y \\ -1, & \text{if } C(v_0, G^{(m)}) = y \end{cases}

4) Termination. The process stops once the agent finishes modifying m edges.

The Q-learning algorithm helps the adversary learn which actions to take (adding or removing which edge) in the given state (the current graph structure), in order to obtain the largest reward (changing F′s output).
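As a rough illustration (not the authors' implementation), the sketch below wires the reward and termination rule into one attack episode; classifier (the victim model's prediction on the target node) and policy (the action choice of a learned Q-function) are hypothetical callables, and the Q-learning update itself is omitted.

```python
def reward(classifier, adj, target, true_label):
    """+1 if the victim node is misclassified after the final modification, else -1."""
    return 1 if classifier(adj, target) != true_label else -1

def run_episode(classifier, policy, adj, target, true_label, budget):
    """Apply `budget` edge modifications chosen by `policy`, then collect the reward."""
    adj = adj.copy()
    for _ in range(budget):                      # termination: exactly m = budget steps
        u, v = policy(adj, target)               # action: which edge to add or remove
        adj[u, v] = adj[v, u] = 1 - adj[u, v]    # toggle the chosen edge
    return adj, reward(classifier, adj, target, true_label)
```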
6.4 Graph structure poisoning via meta-learning

Previous graph attack works only focus on attacking one single victim node. The meta learning attack[100] attempts to poison the graph so that the global node classification performance of GCN is undermined and made almost useless. The approach is based on meta learning[101], which is traditionally used for hyperparameter optimization, few-shot image recognition, and fast reinforcement learning. In the work [100], the meta learning technique treats the graph structure as the hyperparameter of the GCN model to optimize. Using their algorithm to perturb 5% of the edges of the CITESEER graph dataset, they can increase the misclassification rate to over 30%.
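The following is a compact, heavily simplified sketch of the meta-gradient idea (dense adjacency matrix, a one-layer linear GCN surrogate, and a short unrolled inner training loop); meta_gradient is an illustrative helper rather than the authors' code. Entries of the returned gradient with large magnitude indicate promising edge flips: adding an edge where the gradient is large and positive, or removing one where it is large and negative.

```python
import torch
import torch.nn.functional as F

def normalize(adj):
    """Degree-normalize the adjacency matrix with self-loops."""
    a = adj + torch.eye(adj.size(0))
    d_inv = a.sum(1).pow(-1.0)
    return d_inv.unsqueeze(1) * a

def meta_gradient(adj, x, labels, train_mask, n_class, inner_steps=10, lr=0.1):
    """Gradient of the (post-training) loss with respect to the graph structure."""
    adj = adj.clone().requires_grad_(True)
    w = torch.zeros(x.size(1), n_class, requires_grad=True)
    for _ in range(inner_steps):                       # unrolled, differentiable training
        logits = normalize(adj) @ x @ w
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        (grad_w,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - lr * grad_w                            # differentiable SGD step
    # Simplified attack objective: make training itself fail (maximize final loss).
    final_logits = normalize(adj) @ x @ w
    attack_loss = F.cross_entropy(final_logits[train_mask], labels[train_mask])
    (grad_adj,) = torch.autograd.grad(attack_loss, adj)
    return grad_adj
```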
6.5 Attack on node embedding

The node embedding attack[102] studies how to perturb the graph structure in order to corrupt the quality of node embeddings, and consequently hinder subsequent learning tasks such as node classification or link prediction. Specifically, they study DeepWalk[103] as a random-walk based node embedding learning approach and approximately find the graph which has the largest loss of the learned node embedding.

6.6 ReWatt: Attacking graph classifier via rewiring

The ReWatt method[98] attempts to attack graph classification models, where each input of the model is a whole graph. The proposed algorithm can mislead the model by making unnoticeable perturbations on the graph.

In their attacking scheme, they utilize reinforcement learning to find a rewiring operation a = (v_1, v_2, v_3) at each step, which is a set of 3 nodes. The first two nodes are connected in the original graph, and the edge between them is removed in the first step of the rewiring process. The second step of the rewiring process adds an edge between the nodes v_1 and v_3, where v_3 is constrained to be within 2 hops away from v_1. Analysis in [98] shows that the rewiring operation tends to preserve the eigenvalues of the graph′s Laplacian matrix, which makes the attack difficult to detect.
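A single rewiring step might look like the sketch below (using networkx); here the three nodes are chosen at random purely for illustration, whereas ReWatt chooses them with a learned reinforcement learning policy.

```python
import random
import networkx as nx

def rewire_once(graph):
    """Remove one existing edge (v1, v2) and add (v1, v3) with v3 within 2 hops of v1."""
    g = graph.copy()
    v1, v2 = random.choice(list(g.edges()))
    # candidates: nodes within 2 hops of v1, excluding v1, v2 and current neighbors
    two_hop = nx.single_source_shortest_path_length(g, v1, cutoff=2)
    candidates = [v for v, d in two_hop.items()
                  if d > 0 and v != v2 and not g.has_edge(v1, v)]
    if not candidates:
        return g
    v3 = random.choice(candidates)
    g.remove_edge(v1, v2)
    g.add_edge(v1, v3)
    return g
```

Keeping v_3 in the 2-hop neighborhood is what makes the operation a local "rewiring" rather than an arbitrary edge edit, which is also why the spectrum of the Laplacian changes little.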
6.7 Defending graph neural networks

Many works have shown that graph neural networks are vulnerable to adversarial examples, even though there is still no consensus on how to define an unnoticeable perturbation. Some defense works have already appeared. Many of them are inspired by the popular defense methodology in image classification: using adversarial training to protect GNN models[104, 105], which provides moderate robustness.
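As an illustration, the sketch below shows the overall shape of such an adversarial training loop, with random edge flips standing in for a real structure attack and a user-supplied train_step performing one optimization step; it is a schematic, not a specific published defense.

```python
import random

def perturb_edges(adj, budget):
    """Randomly flip `budget` edges as a cheap stand-in for an actual attack."""
    adj = adj.copy()
    n = adj.shape[0]
    for _ in range(budget):
        u, v = random.randrange(n), random.randrange(n)
        if u != v:
            adj[u, v] = adj[v, u] = 1 - adj[u, v]
    return adj

def adversarial_training(model, adj, features, labels, train_step, epochs, budget):
    """`train_step(model, adj, features, labels)` performs one optimization step."""
    for _ in range(epochs):
        train_step(model, adj, features, labels)                         # clean graph
        train_step(model, perturb_edges(adj, budget), features, labels)  # perturbed graph
    return model
```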
7 Adversarial examples in audio and text data

Adversarial examples also exist in DNN′s applications in the audio and text domains. An adversary can craft fake speech or fake sentences that mislead machine language processors. Meanwhile, deep learning models on audio/text data have already been widely used in many tasks, such as Apple Siri and Amazon Echo. Therefore, the studies on adversarial examples in the audio/text data domain also deserve our attention.

As for text data, the discrete nature of the inputs makes the gradient-based attacks used on images no longer applicable and forces people to craft discrete perturbations.

7.1 Speech recognition attacks

The work [106] studies how to attack state-of-art speech-to-text transcription networks, such as DeepSpeech[107]. In their setting, when given any speech waveform x, they can add an inaudible sound perturbation δ that makes the synthesized speech x + δ be recognized as any targeted desired phrase.

In their attacking work, they limit the maximum decibels (dB) of the added perturbation noise at any time step, so that the audio distortion is unnoticeable. Moreover, they inherit the C&W attack method[34] in their audio attack setting.
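The loudness constraint can be made concrete with a decibel measure that compares the perturbation's peak amplitude to that of the original waveform; the snippet below is an illustrative version (assuming raw 16-bit audio samples), in the spirit of the constraint described above rather than the authors' exact metric.

```python
import numpy as np

def db(x):
    """Peak loudness of a waveform in decibels."""
    return 20.0 * np.log10(np.max(np.abs(x)) + 1e-12)

def relative_db(delta, x):
    """dB_x(delta): how much quieter the perturbation is than the original audio."""
    return db(delta) - db(x)

# Example: only keep perturbations that are, say, at least 30 dB quieter than the signal.
x = np.random.randint(-2**15, 2**15, size=16000).astype(np.float64)
delta = np.random.randn(16000) * 10.0
print(relative_db(delta, x) < -30.0)
```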
7.2 Text classification attacks

Text classification is one of the main tasks in natural language processing. In text classification, the model is devised to understand a sentence and correctly label the sentence. For example, text classification models can be applied on the IMDB dataset for characterizing users′ opinions (positive or negative) on movies, based on the provided reviews. Recent works of adversarial attacks have demonstrated that text classifiers are easily misguided by adversaries slightly modifying the texts′ spelling, words or structure.

7.2.1 Attack word embedding

The work [108] considers adding perturbation on the word embedding[109], so as to fool an LSTM[4] classifier. However, this attack only considers perturbing the word embedding, instead of the original input sentence itself.

7.2.2 Manipulate words, letters

The work HotFlip[11] considers replacing a letter in a sentence in order to mislead a character-level text classifier (each letter is encoded to a vector). For example, as shown in Fig. 11, altering a single letter in a sentence alters the model′s prediction on its topic. The attack algorithm manages to achieve this by finding the most-influential letter replacement via gradient information. These adversarial perturbations can be noticed by human readers, but they do not change the content of the text as a whole, nor do they affect human judgments.

Fig. 11  Replace one letter in a sentence to alter a text classifier′s prediction on a sentence′s topic (Image credit: Ebrahimi et al.[11]). Example: "South Africa′s historic Soweto township marks its 100th birthday on Tuesday in a mood of optimism." is labeled World (57%), while "South Africa′s historic Soweto township marks its 100th birthday on Tuesday in a mooP of optimism." is labeled Sci/Tech (95%).
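The gradient-guided selection can be sketched as follows: if grad is the gradient of the loss with respect to the one-hot character encoding of the sentence, a first-order estimate of the loss change from flipping position i to character b is grad[i, b] - grad[i, a], where a is the current character at position i. The helper below (an illustrative sketch, not the authors' implementation) returns the single most loss-increasing flip.

```python
import numpy as np

def best_char_flip(onehot, grad):
    """onehot, grad: arrays of shape (seq_len, alphabet_size).
    Returns (position, new_char_id, estimated_loss_increase)."""
    current = onehot.argmax(axis=1)                       # current character ids
    # estimated loss increase of flipping position i to character b
    scores = grad - grad[np.arange(len(current)), current][:, None]
    scores[np.arange(len(current)), current] = -np.inf    # ignore "no change"
    i, b = np.unravel_index(np.argmax(scores), scores.shape)
    return i, b, scores[i, b]
```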
… the latter follows the same idea, and improves it by introducing new scoring functions.

The works of Samanta and Mehta[113] and Iyyer et al.[114] start to craft adversarial sentences that are grammatically correct and maintain the syntactic structure of the original sentence. Samanta and Mehta[113] achieve this by using synonyms to replace original words, or adding some words which have different meanings in different contexts. On the other hand, Iyyer et al.[114] manage to fool the text classifier by paraphrasing the structure of sentences.

Witbrock[115] conducts sentence and word paraphrasing on input texts to craft adversarial examples. In this work, they first build a paraphrasing corpus that contains a large number of word and sentence paraphrases. To find an optimal paraphrase of an input text, a greedy method is adopted to search valid paraphrases for each word or sentence from the corpus. Moreover, they propose a gradient-guided method to improve the efficiency of the greedy search. This work also has significant theoretical contributions: they formally define the task of discrete adversarial attack as an optimization problem on a set function, and they prove that the greedy algorithm ensures a 1 - 1/e approximation factor for CNN and RNN text classifiers.

7.3 Adversarial examples in other NLP tasks

7.3.1 Attack on reading comprehension systems

In the work [116], the authors study whether reading comprehension models are vulnerable to adversarial attacks. In reading comprehension tasks, the machine learning model is asked to answer a given question based on the model′s "understanding" of a paragraph of an article. For example, the work [116] concentrates on the Stanford Question Answering Dataset (SQuAD), in which systems answer questions about paragraphs from Wikipedia.

The authors successfully degrade the intelligence of the state-of-art reading comprehension models on SQuAD by inserting adversarial sentences. As shown in Fig. 12, the inserted sentence (blue) looks similar to the question, but does not contradict the correct answer. This inserted sentence is understandable for a human reader but confuses the machine a lot. As a result, the proposed attacking algorithm reduced the performance of 16 state-of-art reading comprehension models from an average 75% F1 score (accuracy) to 36%.

Their proposed algorithm AddSent shows a four-step operation to find an adversarial sentence (a sketch of how such a sentence is checked against a model follows the list):

1) Fake question: What is the name of the quarterback whose jersey number is 37 in Champ Bowl XXXIV?

2) Fake answer: Jeff Dean.

3) Question to declarative form: Quarterback Jeff Dean is jersey number 37 in Champ Bowl XXXIV.

4) Fix grammatical errors: Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV.
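As referenced above, a sketch of how a candidate distractor sentence can be checked against a reading comprehension model is given below; qa_model is any callable mapping (question, paragraph) to an answer string, and the toy stand-in used in the example is purely illustrative.

```python
def addsent_check(qa_model, question, paragraph, distractor, gold_answer):
    """Append the distractor to the paragraph and report whether the model's
    answer moves away from the gold answer (i.e., the attack succeeds)."""
    perturbed = paragraph.rstrip() + " " + distractor
    answer = qa_model(question, perturbed)
    return answer.strip().lower() != gold_answer.strip().lower(), answer

# Toy usage with a hypothetical, easily distracted stand-in "model":
toy_model = lambda q, p: "Jeff Dean" if "Jeff Dean" in p else "John Elway"
success, answer = addsent_check(
    toy_model,
    "What is the name of the quarterback whose jersey number is 37 in Champ Bowl XXXIV?",
    "John Elway was the quarterback in Champ Bowl XXXIV.",
    "Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV.",
    "John Elway",
)
print(success, answer)   # the distractor fools the toy model
```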
7.3.2 Attack on neural machine translation

The work [117] studies the stability of machine learning translation tools when their input sentences are perturbed by natural errors (typos, misspellings, etc.) and manually crafted distortions (letter replacement, letter reordering). The experimental results show that state-of-art translation models are vulnerable to both types of errors, and suggest adversarial training to improve the models′ robustness.
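The kinds of natural and synthetic noise used in such robustness tests are easy to reproduce; the sketch below applies a random adjacent-letter swap or a random character typo to a fraction of the words in a sentence (illustrative, not the exact procedure of [117]).

```python
import random
import string

def swap_letters(word):
    """Swap two adjacent letters, simulating a letter-reordering error."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def add_typo(word):
    """Replace one character with a random letter, simulating a typo."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + random.choice(string.ascii_lowercase) + word[i + 1:]

def perturb_sentence(sentence, p=0.2):
    """Perturb each word with probability p using one of the two noise types."""
    return " ".join(
        random.choice([swap_letters, add_typo])(w) if random.random() < p else w
        for w in sentence.split()
    )

print(perturb_sentence("the agreement on the european economic area was signed"))
```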
Seq2Sick[118] tries to attack seq2seq models used in neural machine translation and text summarization. In their setting, two goals of attacking are set: to mislead the model to generate an output which has no overlap with the ground truth, and to lead the model to produce an output with targeted keywords. The model is treated as a white-box, and the authors formulate the attacking problem as an optimization problem where they seek a discrete perturbation by minimizing a hinge-like loss function.
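For the non-overlapping goal, such a hinge-like loss on the decoder logits can be sketched as below: minimizing it pushes every decoding step to prefer some token other than the originally generated one by at least a margin eps. This is an illustrative simplification, not Seq2Sick's exact objective.

```python
import torch

def non_overlap_hinge(logits, orig_ids, eps=1.0):
    """logits: (steps, vocab) decoder scores; orig_ids: LongTensor of the
    tokens in the unperturbed output."""
    steps = torch.arange(len(orig_ids))
    orig_scores = logits[steps, orig_ids]
    masked = logits.clone()
    masked[steps, orig_ids] = float("-inf")      # exclude the original token
    best_other = masked.max(dim=1).values
    # hinge term max(-eps, z_orig - z_best_other) at every decoding step
    return torch.clamp(orig_scores - best_other, min=-eps).sum()
```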
7.4 Dialogue generation

Unlike the tasks above where success and failure are clearly defined, in the task of dialogue there is no unique appropriate response for a given context. Thus, instead of misleading a well-trained model to produce incorrect outputs, works about attacking dialogue models seek to explore the property of neural dialogue models of being interfered with by perturbations on the inputs, or to lead a model to output targeted responses.

In the study [119], the authors explore the over-sensitivity and over-stability of neural dialogue models by using some heuristic techniques to modify the original inputs and observe the corresponding outputs. They evaluate the robustness of dialogue models by checking whether the outputs change accordingly when the inputs are modified.

… Beyond digital-level adversarial faces, they also succeed in misleading face recognition models at the physical level. They achieve this by asking subjects to wear 3D-printed sunglasses frames. The authors optimize the color of these glasses by attacking the model at the digital level: by considering various adversarial glasses, the most effective adversarial glasses are selected for the attack. As shown in Fig. 13, an adversary wearing the adversarial glasses successfully fools the detection of the victim face recognition system.
Appendix

A. Dichotomy of attacks

Table A  Dichotomy of attacks

B. Dichotomy of defenses

Fig. B  Dichotomy of defenses (regularization-based defenses include [17], [69], [71], [72])
[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012. DOI: 10.1109/MSP.2012.2205597.
[4] S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. DOI: 10.1162/neco.1997.9.8.1735.
[5] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, et al. Mastering the game of Go with deep neural networks and tree search. Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[8] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus. Intriguing properties of neural networks. ArXiv: 1312.6199, 2013.
[9] I. J. Goodfellow, J. Shlens, C. Szegedy. Explaining and harnessing adversarial examples. ArXiv: 1412.6572, 2014.
[10] D. Zügner, A. Akbarnejad, S. Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, London, UK, pp. 2847–2856, 2018.
… Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of IEEE Symposium on Security and Privacy, IEEE, San Jose, USA, pp. 582–597, 2016. DOI: 10.1109/SP.2016.41.
… Towards deep learning models resistant to adversarial attacks. ArXiv: 1706.06083, 2017.
[15] A. Kurakin, I. Goodfellow, S. Bengio. Adversarial examples in the physical world. ArXiv: 1607.02533, 2016.
[16] N. Carlini, D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, Dallas, USA, pp. 3–14, 2017. DOI: 10.1145/3128572.3140444.
[17] W. L. Xu, D. Evans, Y. J. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. ArXiv: 1704.01155, 2017.
… C. W. Xiao, A. Prakash, T. Kohno, D. Song. Robust physical-world attacks on deep learning models. ArXiv: 1707.08945, 2017.
[21] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel. Ensemble adversarial training: Attacks and defenses. ArXiv: 1705.07204, 2017.
… P. Laskov, G. Giacinto, F. Roli. Evasion attacks against machine learning at test time. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Prague, Czech Republic, pp. 387–402, 2013. DOI: 10.1007/978-3-642-40994-3_25.
[23] M. Barreno, B. Nelson, A. D. Joseph, J. D. Tygar. The security of machine learning. Machine Learning, vol. 81, no. 2, pp. 121–148, 2010.
[24] N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma. Adversarial classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Seattle, USA, pp. 99–108, 2004.
… robustness and generalization. In Proceedings of the 32nd IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Piscataway, USA, pp. 6976–6987, 2019.
[28] H. Y. Zhang, Y. D. Yu, J. T. Jiao, E. P. Xing, L. El Ghaoui, M. I. Jordan. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, 2019.
… ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, USA, pp. 248–255, 2009. DOI: 10.1109/CVPR.2009.5206848.
[30] D. C. Liu, J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, vol. 45, no. 1–3, pp. 503–528, 1989. DOI: 10.1007/BF01589116.
[31] A. Kurakin, I. Goodfellow, S. Bengio. Adversarial machine learning at scale. ArXiv: 1611.01236, 2016.
[32] S. M. Moosavi-Dezfooli, A. Fawzi, P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 2574–2582, 2016.
[34] N. Carlini, D. Wagner. Towards evaluating the robustness of neural networks. In Proceedings of IEEE Symposium on Security and Privacy, IEEE, San Jose, USA, pp. 39–57, 2017. DOI: 10.1109/SP.2017.49.
[35] N. Carlini, G. Katz, C. Barrett, D. L. Dill. Provably minimally-distorted adversarial examples. ArXiv: 1709.10207, 2017.
[36] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proceedings of the 29th International Conference on Computer Aided Verification, Springer, Heidelberg, Germany, pp. 97–117, 2017. DOI: 10.1007/978-3-319-63387-9_5.
[37] V. Tjeng, K. Xiao, R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. ArXiv: 1711.07356, 2017.
[38] K. Y. Xiao, V. Tjeng, N. M. Shafiullah, A. Madry. Training for faster adversarial robustness verification via inducing ReLU stability. ArXiv: 1809.03008, 2018.
[39] J. W. Su, D. V. Vargas, K. Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828–841, 2019.
… EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
… Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 86–94, 2017. DOI: 10.1109/CVPR.2017.17.
[43] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. A. Ma, Z. H. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, F. F. Li. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015. DOI: 10.1007/s11263-015-0816-y.
[44] C. W. Xiao, J. Y. Zhu, B. Li, W. He, M. Y. Liu, D. Song. Spatially transformed adversarial examples. ArXiv: 1801.02612, 2018.
… face synthesis. ArXiv: 1908.05008, 2019.
[46] A. Odena, C. Olah, J. Shlens. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017.
[47] A. Athalye, L. Engstrom, A. Ilyas, K. Kwok. Synthesizing robust adversarial examples. ArXiv: 1707.07397, 2017.
[48] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami. Practical black-box attacks against machine learning. In Proceedings of the ACM Asia Conference on Computer and Communications Security, ACM, Abu Dhabi, United Arab Emirates, pp. 506–519, 2017. DOI: 10.1145/3052973.3053009.
[49] Y. P. Dong, F. Z. Liao, T. Y. Pang, H. Su, J. Zhu, X. L. Hu, J. G. Li. Boosting adversarial attacks with momentum. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, 2018.
[50] P. Y. Chen, H. Zhang, Y. Sharma, J. F. Yi, C. J. Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, Dallas, USA, pp. 15–26, 2017. DOI: 10.1145/3128572.3140448.
[51] A. Ilyas, L. Engstrom, A. Athalye, J. Lin. Black-box adversarial attacks with limited queries and information. ArXiv: 1804.08598, 2018.
… J. Peters, J. Schmidhuber. Natural evolution strategies.
… GenAttack: Practical black-box attacks with gradient-free optimization. ArXiv: 1805.11090, 2018.
[54] C. W. Xiao, B. Li, J. Y. Zhu, W. He, M. Y. Liu, D. Song. Generating adversarial examples with adversarial networks. ArXiv: 1801.02610, 2018.
[55] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, MIT Press, Montreal, Canada, pp. 2672–2680, 2014.
… Incremental and decremental support vector machine learning. In Proceedings of the 13th International Conference on Neural Information Processing Systems, MIT Press, Denver, USA, pp. 388–394, 2000.
[58] P. W. Koh, P. Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 1885–1894, 2017.
[59] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, T. Goldstein. Poison frogs! Targeted clean-label poisoning attacks on neural networks. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 2018.
[62] C. Guo, M. Rana, M. Cisse, L. van der Maaten. Countering adversarial images using input transformations. ArXiv: 1711.00117, 2017.
[63] V. K. Ha, J. C. Ren, X. Y. Xu, S. Zhao, G. Xie, V. M. Vargas. Deep learning based single image super-resolution: A survey. In Proceedings of the 9th International Conference on Brain Inspired Cognitive Systems, Springer, Xi′an, China, pp. 106–119, 2018. DOI: 10.1007/978-3-030-00563-4_11.
[64] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, et al. Stochastic activation pruning for robust adversarial defense. ArXiv: 1803.01442, 2018.
… PixelDefend: Leveraging generative models to understand and defend against adversarial examples. ArXiv: 1710.10766, 2017.
[68] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, K. Kavukcuoglu. Conditional image generation with PixelCNN decoders. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 4790–4798, 2016.
[69] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, N. Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 854–863, 2017.
[70] T. Miyato, S. I. Maeda, M. Koyama, K. Nakae, S. Ishii. Distributional smoothing with virtual adversarial training. ArXiv: 1507.00677, 2015.
… increasing adversary strength. ArXiv: 1803.09868, 2018.
… deep network training by reducing internal covariate shift. ArXiv: 1502.03167, 2015.
… You only propagate once: Accelerating adversarial training via maximal principle. ArXiv: 1905.00877, 2019.
[76] L. S. Pontryagin. Mathematical Theory of Optimal Processes.
[78] E. Wong, J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ArXiv: 1711.00851, 2017.
[81] A. Raghunathan, J. Steinhardt, P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 10877–10887, 2018.
[82] E. Wong, F. Schmidt, J. H. Metzen, J. Z. Kolter. Scaling provable adversarial defenses. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, pp. 8400–8409, 2018.
[83] A. Sinha, H. Namkoong, J. Duchi. Certifying some distributional robustness with principled adversarial training. ArXiv: 1710.10571, 2017.
[86] J. H. Metzen, T. Genewein, V. Fischer, B. Bischoff. On detecting adversarial perturbations. ArXiv: 1702.04267, 2017.
[87] D. Hendrycks, K. Gimpel. Early methods for detecting adversarial images. ArXiv: 1608.00530, 2016.
[88] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola. A kernel two-sample test. Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012.
[92] A. Fawzi, S. M. Moosavi-Dezfooli, P. Frossard. Robustness of classifiers: From adversarial to random noise. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016.
[93] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, S. Soatto. Analysis of universal adversarial perturbations. ArXiv: 1705.09554, 2017.
[94] A. Fawzi, O. Fawzi, P. Frossard. Analysis of classifiers′ robustness to adversarial perturbations. Machine Learning, vol. 107, no. 3, pp. 481–508, 2018.
[95] A. Shafahi, W. R. Huang, C. Studer, S. Feizi, T. Goldstein. Are adversarial examples inevitable? ArXiv: 1809.02104, 2018.
[96] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, A. Madry. Adversarially robust generalization requires more data. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 2018.
[98] Y. Ma, S. Wang, T. Derr, L. Wu, J. Tang. Attacking graph convolutional networks via rewiring. ArXiv: 1906.03750, 2019.
[99] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with deep reinforcement learning. ArXiv: 1312.5602, 2013.
[100] D. Zügner, S. Günnemann. Adversarial attacks on graph neural networks via meta learning. ArXiv: 1902.08412, 2019.
[101] C. Finn, P. Abbeel, S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 1126–1135, 2017.
[103] B. Perozzi, R. Al-Rfou, S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, USA, pp. 701–710, 2014. DOI: 10.1145/2623330.2623732.
[104] F. L. Feng, X. N. He, J. Tang, T. S. Chua. Graph adversarial training: Dynamically regularizing based on graph structure. ArXiv: 1902.08226, 2019.
[105] K. Xu, H. Chen, S. Liu, P. Y. Chen, T. W. Weng, M. Y. Hong, X. Lin. Topology attack and defense for graph neural networks: An optimization perspective. ArXiv: 1906.04214, 2019.
[106] N. Carlini, D. Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In Proceedings of IEEE Security and Privacy Workshops, IEEE, San Francisco, USA, pp. 1–7, 2018. DOI: 10.1109/SPW.2018.00009.
[107] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, A. Y. Ng. Deep speech: Scaling up end-to-end speech recognition. ArXiv: 1412.5567, 2014.
[108] T. Miyato, A. M. Dai, I. Goodfellow. Adversarial training methods for semi-supervised text classification. ArXiv: 1605.07725, 2016.
[109] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2013.
… Deep text classification can be fooled. ArXiv: 1704.08006, 2017.
[112] J. F. Li, S. L. Ji, T. Y. Du, B. Li, T. Wang. TextBugger: Generating adversarial text against real-world applications. ArXiv: 1812.05271, 2018.
[113] S. Samanta, S. Mehta. Towards crafting text adversarial samples. ArXiv: 1707.02812, 2017.
[114] M. Iyyer, J. Wieting, K. Gimpel, L. Zettlemoyer. Adversarial example generation with syntactically controlled paraphrase networks. ArXiv: 1804.06059, 2018.
… tection: Generalization and efficiency. ArXiv: 1812.11574, 2018.
[118] M. Cheng, J. Yi, P. Y. Chen, H. Zhang, C. J. Hsieh. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. ArXiv: 1803.01128, 2018.
[119] T. Niu, M. Bansal. Adversarial over-sensitivity and over-stability strategies for dialogue models. ArXiv: 1809.02079, 2018.
[121] H. C. Liu, T. Derr, Z. T. Liu, J. L. Tang. Say what I want: Towards the dark side of neural dialogue models. ArXiv: 1909.06044, 2019.
… Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, ACM, Vienna, Austria, pp. 1528–1540, 2016. DOI: 10.1145/2976749.2978392.
… nition. Machine Learning, 2015.
[124] C. H. Xie, J. Y. Wang, Z. S. Zhang, Y. Y. Zhou, L. X. Xie, A. Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 1378–1387, 2017. DOI: 10.1109/ICCV.2017.153.
[125] J. H. Metzen, M. C. Kumar, T. Brox, V. Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, 2017.
[127] J. Kos, I. Fischer, D. Song. Adversarial examples for generative models. In Proceedings of IEEE Security and Privacy Workshops, IEEE, San Francisco, USA, pp. 50–56, 2018. DOI: 10.1109/SPW.2018.00016.
[128] D. P. Kingma, M. Welling. Auto-encoding variational Bayes. ArXiv: 1312.6114, 2013.
[129] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the 33rd International Conference on Machine Learning, New York, USA, 2016.
… P. McDaniel. Adversarial perturbations against deep neural networks for malware classification. ArXiv: 1606.04435, 2016.
[135] T. Chugh, K. Cao, A. K. Jain. Fingerprint spoof buster: Use of minutiae-centered patches. IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2190–2202, 2018.
… P. Lillicrap, D. Silver, K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, PMLR, New York, USA, pp. 1928–1937, 2016.
Han Xu is a second year Ph.D. student of computer science in the DSE Lab, Michigan State University, USA. He is under the supervision of Dr. Ji-Liang Tang. His research interests include deep learning safety and robustness, especially studying the problems related to adversarial examples.
E-mail: [email protected] (Corresponding author)
ORCID iD: 0000-0002-4016-6748

Yao Ma received the B.Sc. degree in applied mathematics at Zhejiang University, China in 2015, and the M.Sc. degree in statistics, probabilities and operation research at Eindhoven University of Technology, the Netherlands in 2016. He is now a Ph.D. degree candidate of the Department of Computer Science and Engineering, Michigan State University, USA. His Ph.D. advisor is Dr. Jiliang Tang. His research interests include graph neural networks and their related safety issues.
E-mail: [email protected]

Hao-Chen Liu is currently a Ph.D. student at the Department of Computer Science and Engineering at Michigan State University, under the supervision of Dr. Jiliang Tang. He is a member of the Data Science and Engineering (DSE) Lab. His research interests include natural language processing problems, especially the robustness and fairness of dialogue systems.
E-mail: [email protected]

Debayan Deb is a Ph.D. degree candidate in the Biometrics Lab, Michigan State University, USA, under the supervision of Dr. Anil K. Jain. Before joining the Biometrics Lab of MSU, he graduated from Michigan State University with a Bachelor Degree of Computer Science and Engineering. His research interests include face recognition and computer vision tasks.

… supervision by Dr. Dinesh Rajen. Her research interests include signal processing, wireless communication, and deep learning related topics.
E-mail: [email protected]

Ji-Liang Tang is an assistant professor in the computer science and engineering department at Michigan State University since Fall 2016. Before that, he was a research scientist in Yahoo Research and received his Ph.D. degree from Arizona State University in 2015. He was the recipient of the 2019 NSF Career Award, the 2015 KDD Best Dissertation runner up and 6 Best Paper Awards (or runner-ups) including WSDM 2018 and KDD 2016. He serves as a conference organizer (e.g., KDD, WSDM and SDM) and a journal editor (e.g., TKDD). He has published his research in highly ranked journals and top conference proceedings, which has received thousands of citations and extensive media coverage. His research interests include social computing, data mining and machine learning, and their applications in education.
E-mail: [email protected]

Anil K. Jain (Ph.D., 1973, Ohio State University; B.Tech., IIT Kanpur) is a University Distinguished Professor at Michigan State University, where he conducts research in pattern recognition, machine learning, computer vision, and biometrics recognition. He was a member of the United States Defense Science Board and the Forensics Science Standards Board. His prizes include the Guggenheim, Humboldt, Fulbright, and King-Sun Fu Prize. For advancing pattern recognition, Jain was awarded Doctor Honoris Causa by Universidad Autónoma de Madrid. He was Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is a Fellow of ACM, IEEE, AAAS, and SPIE. Jain has been assigned 8 U.S. and Korean patents and is active in technology transfer, for which he was elected to the National Academy of Inventors. Jain is a member of the U.S. National Academy of Engineering (NAE), a foreign member of the Indian National Academy of Engineering (INAE), a member of The World Academy of Science (TWAS) and a foreign member of the Chinese Academy of Sciences (CAS). His research interests include pattern recognition, machine learning, computer vision, and biometrics recognition.
E-mail: [email protected]