Machine Learning Security and Privacy A Review of
Machine Learning Security and Privacy A Review of
Machine Learning Security and Privacy A Review of
EURASIP Journal on
EURASIP Journal on Information Security (2024) 2024:10
https://doi.org/10.1186/s13635-024-00158-3 Information Security
Abstract
Machine learning has become prevalent in transforming diverse aspects of our daily lives through intelligent digital
solutions. Advanced disease diagnosis, autonomous vehicular systems, and automated threat detection and tri-
age are some prominent use cases. Furthermore, the increasing use of machine learning in critical national infra-
structures such as smart grids, transport, and natural resources makes it an attractive target for adversaries. The
threat to machine learning systems is aggravated due to the ability of mal-actors to reverse engineer publicly avail-
able models, gaining insight into the algorithms underpinning these models. Focusing on the threat landscape
for machine learning systems, we have conducted an in-depth analysis to critically examine the security and privacy
threats to machine learning and the factors involved in developing these adversarial attacks. Our analysis highlighted
that feature engineering, model architecture, and targeted system knowledge are crucial aspects in formulating these
attacks. Furthermore, one successful attack can lead to other attacks; for instance, poisoning attacks can lead to mem-
bership inference and backdoor attacks. We have also reviewed the literature concerning methods and techniques
to mitigate these threats whilst identifying their limitations including data sanitization, adversarial training, and dif-
ferential privacy. Cleaning and sanitizing datasets may lead to other challenges, including underfitting and affecting
model performance, whereas differential privacy does not completely preserve model’s privacy. Leveraging the analy-
sis of attack surfaces and mitigation techniques, we identify potential research directions to improve the trustworthi-
ness of machine learning systems.
Keywords Adversarial attacks, Scrutiny-by-design, Poisoned dataset, Exploiting integrity, Data sanitization, Differential
privacy
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/.
functional capabilities of the model. Bhagoji, A.N. [20] • What are the open challenges and vulnerabilities
exploited the privacy of the content moderation classi- identified in ML models, existing mitigation solu-
fier hosted by Clarifai. tions, and open attack surfaces in machine learning?
Recent successful attacks on real-time machine learn-
ing systems prove the practicality of adversarial ML Correlation between our problem statement, research
attacks. Zou A. et al. [21] attacked ChatGPT, Claude, questions, and interlinked addressed sections are given
and Bard with inference accuracy of 50% on GPT-4 and in Fig. 1. Addressing the above-given research questions,
86.6% on GPT-3.5. Also, Gong X. et al. [22] attacked the major contributions that this study makes are listed
commercial Alibaba API with a 97% success rate. These as follows:
attacks highlight the urge for comprehensive research
to make ML models resilient, specifically focusing on • To the best of our knowledge, this is the pioneering
security-by-design solutions that should focus on the study to conduct an in-depth and critical analysis of
security and resilience of the development process threats to the machine learning systems by analyzing
rather than particular models. adversarial machine learning attacks, their severity,
In this literature review, we have performed a com- impact, and existing mitigation strategies and their
parative analysis of various adversarial attack types that limitations.
threaten machine learning model development: poison- • Unique analysis criterion is developed to exam-
ing, evasion, model inversion, and membership infer- ine existing research to determine the efficacy of
ence attacks. Comprehensively analyzing the severity, adversarial attack types that can be implemented on
impact, and limitations of each attack type provides machine learning. Our criterion analyzes various
valuable insights that help shape potential research attack vectors based on adversary’s capability and
directions and address the research questions given accessibility, victim model, and technical examina-
below: tion of threat on machine learning model.
• Four major adversarial ML attack types are studied,
• What are the significant ML adversarial attack types based on the modeling process of machine learning,
and attack surfaces to study and analyze? in this literature review to conduct detailed examina-
• What is the impact of integral entities, including tion of attack vectors and attack surfaces in machine
adversary and targeted domain, in devising adversar- learning.
ial attacks to exploit the victim model/system? • Through deep analysis of threat landscape for
• What are the existing and most effective strategies machine learning systems and mitigation techniques
and solutions to examine to mitigate ML adversarial proposed in existing literature, open challenges and
attacks and their limitations concerning ML adver- future research directions have been identified to
sarial attack types? motivate further research and development in this
area.
The structure of the remaining manuscript is as fol- 2.1 Adversarial attack types‑based on model processing
lows: Section 2 explains various adversarial ML attacks. and development
Section 3 addresses existing surveys on adversarial 2.1.1 Poisoning attack
machine learning followed by description of our analy- Training a machine learning model with the pre-pro-
sis criteria in Section 4. Section 5 is the state-of-the-art cessed dataset is the initial development phase, which
analysis of existing research studies considering adversar- also allows adversaries to adversaries to poison it. Poi-
ial attack types, and Section 6 explored existing mitiga- soning attacks manipulate datasets by injecting falsified
tion strategies and their limitations. Section 7 identified samples or perturbing the existing data samples to infect
potential research directions, and Section 8 concludes the training process and mislead the classification at test
this research study. time. Poisoning the dataset is possible in two formats to
disrupt the labeling strategy of the victim model known
as label poisoning attack [26]. Feature perturbation, leav-
2 Adversarial machine learning ing the integrated label as is, is known as a clean-label
Machine learning is considerably used in automating poisoning attack [27]. The attack surface for poisoning
digital systems [23, 24], which makes it a tempting tar- attacks on machine learning is highlighted in Fig. 2.
get for adversaries to attack and potentially harm the
interconnected systems. These security violations origi- 2.1.2 Evasion attack
nated in a distinctive domain associated with the secu- Attacking the machine learning model at test time is
rity of machine learning known as adversarial machine called an evasion attack. This attack intends to mislead
learning [25]. Adversarial machine learning deals with the testing data to reduce the testing accuracy of the tar-
malicious attempts to exploit vulnerabilities in machine geted model [28]. The ultimate objective of this attack
learning. Every adversarial attempt is classified within is to misconstruct the testing input to harm the test-
one of the attack types: poisoning, evasion, model inver- time integrity of machine learning. Malware generative
sion, or membership inference attacks. The development recurrent neural network (MalRNN) is a deep learning-
of an adversarial attack focuses on many other factors, based approach developed to trigger evasion attacks on
including targeting significant processing phases, attack machine learning-based malware detection systems [29].
surfaces, capability, intention, knowledge of the adver- MalRNN evades three malware detection systems that
sary, and availability of the victim model. Based on the show the expedience of evasion attacks. In addition, this
machine learning development process, significant attack attack triggers the importance of reliable security solu-
types on machine learning are described as follows. tions to mitigate vulnerabilities in machine learning
against evasion attacks. Attack surface for evasion attacks Overall, machine learning model processing is at high
on machine learning is highlighted in Fig. 3. risk of adversarial attacks. Machine learning pertains to
several security and privacy vulnerabilities that exist and
2.1.3 Model inversion attack are exploitable at various layers of the machine learning
The objective of this attack is to disrupt the privacy of modeling process that must be addressed adequately to
machine learning. Model inversion attack is the type of mitigate adversarial attacks on machine learning models.
attack in which an adversary tries to steal the developed
ML model by replicating its underlying behavior, query- 2.2 Adversarial attack types‑based on knowledge
ing it with different datasets. An adversary extracts the of adversary
baseline model representation through a model inversion Adversarial attacks rely on the adversary’s knowledge of
attack and can regenerate the training data of the model. the ML model under attack. When designing an adver-
D. Usynin et al. [30] designed a framework for a model sarial attack, the adversary can have complete to zero
inversion attack on a collaborative machine learning knowledge of the target. The design of machine learning
model, demonstrating its success. It also highlights the adversarial attacks is highly dependent on the knowledge
impact of model inversion attacks on transfer machine of the adversary. Sub-categorizing adversarial attacks
learning models. Attack surface for model inversion based on the adversary’s knowledge is given as follows:
attacks on machine learning is highlighted in Fig. 4.
2.2.1 Black box attack
2.1.4 Membership inference attack Black box attack is an adversarial attack for which the
A membership inference attack is another privacy attack adversary has zero knowledge of the victim [31–33] that
that infers the victim model and extracts its training is put under attack. The targeted system is considered a
data, privacy settings, and model parameters. In this type black box for the adversary, which is the most realistic
of attack, the adversary has access to query the victim scenario because the adversary usually does not know
model under attack and can analyze the output gathered the target system. Threat models and attack vectors are
from the queried results. The adversary can regenerate considered untargeted with the adversary’s intention to
the training dataset of the targeted adversarial machine reduce the overall accuracy of the targeted model. Tar-
learning model by analyzing the gathered queried results. geted attacks can not be the scenario with the black box
The attack surface for membership inference attacks on attack model, as the adversary does not know the victim
machine learning is highlighted in Fig. 5. model to exploit it with a specific targeted attack vector.
Fig. 4 Model inversion attack surface in ML model development process-model inversion ML attack
Fig. 5 Membership inference attack surface in ML model development process-membership inference ML attack
2.2.2 Gray box attack this case, an adversary may have some knowledge either
When an adversary has partial knowledge of the target regarding the dataset, dataset distribution, or some
system, that kind of attack is called a gray box attack. In settings of the machine learning system that is to be
• Addressing limitations and enhancements of the pro- A. Shafee and T. A. Awaad [53] studied privacy threats
posed mitigation solutions to improve the privacy of in machine learning, specifically deep learning, highlight-
machine learning models ing issues and limitations in existing countermeasures.
Cryptographic and perturbation techniques are studied
The survey [51] provides a detailed synopsis of machine in detail. Homomorphic encryption, functional encryp-
learning poisoning attacks. Questions addressed in this tion, and secure multi-party computation protocols are
survey are the analysis of poisoning attack surface that analyzed, determining their effect on enhancing the pri-
leverages dataset poisoning to contaminate the training vacy of deep learning algorithms. Differential privacy is
process of machine learning and model poisoning, which studied across various deep learning model layers to ana-
manipulates the machine learning model processing. lyze its effectiveness in preserving privacy. The limita-
Poisoning attacks studies in this literature study are the tions of applying perturbation techniques and encryption
label flipping, p-tampering, and bi-level optimization on mechanisms to secure deep learning from adversarial
centralized machine learning, and gradient-based meth- privacy attacks are explained. The researchers have
ods and generative approaches, including feature colli- highlighted several open research directions that help
sion, are explored in the context of deep learning. Passive improve the privacy and confidentiality of deep learning
and active defense approaches are studied along with and should not downgrade the performance and applica-
their complexities and limitations to mitigate poisoning bility of deep learning algorithms.
attacks. In conclusion, limitations and further research
exploration directions are identified that should be con- 4 Criteria defined for literature analysis
sidered to address poisoning attacks in machine learning, We have conducted an in-depth literature review to ana-
which is still a significant and critical task to achieve. lyze the complexity and criticality of existing research
The survey [51] provides a detailed synopsis of machine studies. The specific criteria developed for this detailed
learning poisoning attacks. Questions addressed in this literature study are given in Fig. 6. Our comprehensive
survey are the analysis of the dataset and model poison- analysis criteria are designed based on adversarial attack
ing attack surfaces. Poisoning attacks in this literature types, which are further scaled down to study literature
study are the label flipping, p-tampering, and bi-level based on machine learning algorithms, datasets used to
optimization on centralized machine learning. Gradient- develop machine learning models, and the exploited vul-
based methods and generative approaches, including nerability of machine learning algorithms. Another entity
feature collision, are explored in deep learning. Passive to analyze the adversarial attack and its severity is the
and active defense approaches are studied along with adversary based on its knowledge and goals defined for
their complexities and limitations to mitigate poisoning the targeted attack on machine learning. At last, we have
attacks. In conclusion, further research directions are examined the adversarial attack severity and its impact
identified to address poisoning attacks in machine learn- based on the existing literature. Our developed crite-
ing (Table 1). ria for literature analysis are given in Fig. 6. A detailed
Table 1 Comparison of related existing surveys which are peer-reviewed and focusing adversarial machine learning attacks
Research paper Publication year Survey type Analysis of Analysis Analysis on (domain) Solutions Limitations
all attack criteria/ examined identified
types protocol
Fig. 6 Unique analysis criteria-developed to examine attacks technically w.r.t attack types
modular overview of our analysis criteria is given as ing these adversarial attacks. It helps analyze the
follows: impact of attacks from existing studies and compares
the complexity and implication of each adversarial
• Adversarial attack types. The base dimension of our attack type.
developed criteria is the adversarial attack types. • Identifying goals of adversary. Another significant
These attack types leverage us to analyze the process- dimension is the detailed synopsis of the adversary’s
level vulnerabilities of the machine learning model. goals and objectives set with the devised attack. Ana-
The attack types included for analysis are poisoning, lyzing the intention and goals of the adversary leads
evasion, model inversion, and membership inference to the justification of the severity of the adversarial
attacks, which can be further used to devise several attack. This dimension also helps technically and sys-
adversarial attacks based on these types. Analyzing tematically determine security violations in the tar-
existing studies based on these attack types, we have geted model or system.
comprehensively provided a thorough summation • Attack severity and impact with respect to existing
of adaptability, implication and comparison of these literature. After analyzing adversarial attacks with
attack types. the above mentioned dimensions, we comprehen-
• Machine learning model/algorithm. The machine sively determine the attack severity and impact on
learning algorithm/model is also an essential aspect the attacked model. Analyzing the attack severity will
of our analysis as it provides the technical interpre- provide us ground to study the complexity and prac-
tation of the attack design under study. It is consid- tical implication of adversarial attack types.
ered an influential factor in identifying the design
and complexity of adversarial attacks. Also, it helps
us to highlight the impact of attack type on individual 4.1 Literature review method
machine learning algorithms. For providing clear and significant state-of-the-art anal-
• Exploited vulnerability of machine learning algo- ysis, our selection process comprises of the key con-
rithm. Exploiting machine learning vulnerability is cepts, defined in Section 2. In total, four reviewers have
another essential factor in developing the attack vec- reviewed the selected papers and further refinement is
tor to manipulate the machine learning model. This particularly based on our inclusion criteria which is given
dimension helps in the technical assessment of the as follows.
attack success against the targeted machine learning The inclusion criteria are as follows: we have selected
model. Exact annotation of the breached vulnerabil- papers that are either peer-reviewed articles or confer-
ity helps analyze security issues in machine learning ence papers and should not go beyond 2017 as their pub-
algorithms and align research directions to address lication year. Each paper should focused on individual
these issues. adversarial attack against machine learning model and
• Knowledge level of adversary in devising adversarial provide the technical insights of the attack development.
attack. Analyzing the knowledge of the adversary of Also, for selecting mitigation solutions papers, we have
the targeted model helps us better understand the focused on the papers that provide the technical details
attack development. Knowledge of adversary scaled of the developed solution and their experimental results
from zero knowledge to completed knowledge of the when implemented against adversarial attack.
targeted model or system. The adversary’s knowledge The exclusion criteria are as follows: for further refine-
is considered an important benchmark when design- ment of the selected papers, we have excluded all the
papers that comprises of the comparative analysis of research studies based on the adversarial attack types,
various adversarial attacks on machine learning model or studying the victim model, adversary goals, capability,
does not provide the experimental results and insights of attack vector, threat model, exploited features of the
their developed attack and its impact on the targeted ML targeted model, and its impact. Concluding our exami-
model. nation process, we have provided detailed insights into
Based on the above defined inclusion and exclusion cri- the studied attack vector from a critical standpoint and
teria, we have developed our state-of-the-art dataset to in-depth forensics of the complete attack development
conduct our literature analysis. For capturing inter-rater process, highlighting and comparing the most threaten-
reliability of the reviewers, the Cohen’s Kappa scores for ing attack vectors exploiting various attack surfaces of
each of the reviewers are 0.90, 0.93, 0.80, and 0.84, sub- machine learning.
sequently. For the detailed adversarial machine learning This literature review examined considerable research
landscape analysis, keyword popularity is visible in Fig. 7 studies based on the above-developed criteria focusing
which highlight the impact of adversarial machine learn- adversarial attack types.
ing on various domains such as deep learning is highly Attacks examination, based on our analysis criteria,
inter-linked with adversarial attacks which is further allows us to interpret the complete development life
affected with membership inference and model inversion cycle of the adversarial attack. Studying attacks with
attacks. Also, poisoning attacks have impacted cyberse- the described dimensions reverse engineer the attack
curity, intrusion detection, and networks related applica- development, answering how different knowledge lev-
tions, whereas geographical distribution of the selected els help in exploiting targeting system and, also, what
papers is shown in Fig. 8 which highlight significant con- features can be exploited with various attack types. At
tributions of different countries in this domain. last, concluding the analysis of all the concerned enti-
ties, we have provided the impact and practicality of
4.2 Process of examining research studies various adversarial machine learning attacks. Based on
The process to examine existing literature is given in the criteria explained in Section 4, the filtered studies
Fig. 9. This literature study has extensively examined from literature are mentioned in Tables 2, 3, 4, and 5
for detailed analysis as part of this research study.
Fig. 8 Geographical distribution-an analysis of collaborative research landscape in adversarial machine learning
F. A. Yerlikaya et al. SVM, SGD, logistic Random label Poisoning dataset White box attack Reduce performance KNN and random No Model accuracy
[16], 2022 regression, random and distance-based by changing class (accuracy) of the sys- forest algorithms
forest, Gaussian NB, label flipping attacks labels with two effec- tem not much affected
K-NN tive strategies with label poisoning
Paracha et al. EURASIP Journal on Information Security
attacks
M. Jagielski et al. [47], Convolution neural Subpopulation Poisoned cluster Gray box attack Misclassification Subpopulation Yes Test time prediction
2020 networks attack is integrated as sub- targeted attack attacks are difficult
proportion of train- to detect and miti-
ing dataset gate specifically
in non-linear models
(2024) 2024:10
A. Demontis et al. SVM classifier, logis- Training time poison- Reduced gradient White box, black box Violate model’s Poisoning attacks Yes Model availability
[58], 2019 tic, ridge, SVM-RBF ing attack loss with poisoned attacks integrity and avail- are more effective
data points in trans- ability on models with large
ferable setting gradient space
and high complexity
C. Zhu et al. [14], Deep neural net- Feature collision Feature space Gray box attack Over fit target clas- Turning dropout Yes Test time misclassifi-
2019 works attack, convex poly- with perturbed train- sifier with poisoned during training cation
tope attack ing samples dataset with poisoned data
enhance transfer-
ability of poisoning
attack in deep neural
networks
M. Jagielski et al. [59], Linear regression Statistically based Distinguishing legiti- Mean and co- Misclassification Residual filtrating Yes Model accuracy
2018 regression points mate and poisoned variance dependent of the system mitigates poison-
poisoning gen- regression points gray box attack ing attack on linear
eration with flipped with minimal gradi- regression
labels ent loss
D. Gibert et al. [60], Generative adver- Query-free feature- Perturbed features Black box attack Evade ML detec- ML detectors are No Victim detection
2023 sarial networks based attack in executable tor with malicious vulnerable to be decision
executable evaded with query-
Paracha et al. EURASIP Journal on Information Security
free attacks
H. Yan et al. [61], 2023 Logistic regression, Label-based evasion Poisoned labeled Black box attack Transfer adversari- Transfer-based eva- No Test time precision
SVM, NB, decision attack samples ally crafted samples sion attack is a seri-
tree, RF, xgBoost, to evade ous threat to ML
ANN, ensemble and DL
model
(2024) 2024:10
H. Bostani et al. [62], ML-based malware n-gram based attack Transform malware Black box attack Misclassification DNN are more Yes Test time prediction
2022 detector on malware classifier samples into benign with model query of android malware affected by evading
with n-gram based access detector surrogate models
incremental strategy comparing to linear
SVM based classifier
Md. A. Ayub et al. Multi-layer percep- Jacobian-based sali- Iterative approach White box attack Misclassify malicious Multi-layer No Test time prediction
[28], 2020 tron network ency map attack to insert perturba- sample as benign perceptron can
tion near sensitive in IDS be exploited
feature of benign with evasion attack
samples with minimal
model’s knowledge
Y. Shi et al. [63], 2017 Naïve Bayes classifier Evasion attack Feed poisoned sam- Exploratory black Misclassify test data Controlled pertur- Yes Model availability
with feed-forward ples with DL score box attack samples bations to labels
neural networks under computed and classification
attack region boundary may limit
adversarial impact
on DL
T. Titcombe et al. [64], Split neural networks Model inversion Steal intermediate/ Black box attack Invert intermediate Model inversion Yes Model interception
2021 attack on distributed distributed data stolen data into input attacks are effective
ML from nodes in transfer format and dependent
learning on input dataset
Paracha et al. EURASIP Journal on Information Security
M. Khosravy et al. [65], Deep neural net- Images reconstruc- Regenerate model Gray box attack Inverted model ML is under serious No Model privacy
2021 works tion with MIA by intercepting and developed threat of MIA attack
private data of victim duplicate with partial knowl-
model by gathering edge of system
output
Q. Zhang et al. [66], Deep neural net- Stealing victim’s Sample regeneration White box attack Developed sur- ML model can Yes Model privacy
(2024) 2024:10
2020 works model classes helps to determine rogate model similar be inverted even
private data of vic- to the target if secured with differ-
tim’s model classes ential privacy
Z. He et al. [67], 2019 Deep neural net- Inverse-network Used un-trusted par- Black box, white box, Extract inference data Privacy preserva- Yes Model privacy
works attack strategy ticipant in collabora- and query-free inver- with an un-trusted tion is challenging
tive system sion attacks adversarial partici- to achieve in split
pant in collaborative DNN
network
S. Basu et al. [68], Deep neural net- Generative adversar- Extracted output White box attack Extract model class/ Machine learning can No Model accuracy
2019 works ial network approach from targeted net- inference details be inverted with gen-
work with generative by replicating erative samples
inference details generative adversarial
network
U. Aïvodji et al. [69], Deep neural net- Query-based gen- Extract model details Black box attack Breach privacy Differential privacy No Model privacy
2019 works erative adversarial by interpreting que- of Convolutional neu- is not much effective
network ried outputs ral networks (CNN) to mitigate MIA
on machine learning
Z. Zhu et al. [70], 2023 Multi-layer percep- MIA on sequential Surrogate Black box attack Infer user recommen- Inferring sequential Yes Dataset inference
tron recommendation and shadow models dations recommendations
system are designed leads to provide per-
to extract recommen- sonalized details
dations
J. Chen et al. [71], Lasso regression, CNN MIA with shadow Shadow model White box attack Retrieve confiden- Differential privacy No Model inference
(2024) 2024:10
2021 model is used to mimic tial details of target mitigates MIA com-
ground truth model promising accuracy
of model
M. Zhang et al. [72], Neural networks- Inference attack Adversarial model Black box attack Retrieve private Popularity randomi- Yes Model privacy
2021 based recommenda- to extract user-level is developed details of victim zation is effective
tion system details with theft users’ model against MIA in recom-
private data mender system
Y. Zou et al. [73], 2020 Deep neural net- Transfer learning- No privacy-preserved Black box attack Infer training model Transfer machine Yes Model inference
works based black box in transfer learning details with three learning is at serious
attack model formulated attacks threat of MIA
J. Jia et al. [74], 2019 Neural network MIA against binary Interpret output Black box attack Retrieve private train- Existing solutions are No Dataset inference
classifier confidence score ing data of classifier subjective to dataset
to manipulate model used in classifier
details
highlights the threat of transferring poison from the sur- with the target dataset. Concluding this research has for-
rogate to the victim model. A gradient-based optimiza- mulated improvements in the transferability of poisoning
tion framework is developed to transfer poison that attacks by turning on the dropout rate and implement-
manipulates the gradient of input samples of training and ing convex polytope objectives in multiple layers of neu-
testing datasets. This study has analyzed and highlighted ral networks. This research enforces the need to secure
the security vulnerabilities in transfer machine learning, machine learning, specifically neural networks, from poi-
proving empirically. This research study identifies major soning attacks in various adversarial settings.
factors that breach integrity, making the poisoning and The research study [59], particularly forensic security
evasion attack successful in transfer machine learning: vulnerabilities and defense solutions of linear regression,
the attacker’s optimization objectives, gradient alignment focuses on poisoning attacks on linear regression models
of surrogate and target model, and model complexity. with gradient-based optimization and statistical attack
C. Zhu et al. [14] have also demonstrated the trans- strategies. This study has proposed a new optimiza-
ferability of poisoning attacks in machine learning by tion framework to poison linear regression in a gray box
implementing polytope attacks in deep neural networks. attack setting by evaluating the limitations of existing
The impact of the poisoning attack is explained in this attacks. Another statistical-based poisoning attack is
clean-label poisoning attack in which the adversary has also introduced in this study, which maximizes loss by
poisoned only 1% of the training dataset, disrupting the introducing poisonous points at the very corner of the
results by 50%. Convex polytope attack is implemented boundary by exploiting the security of noise-resilient and
on various deep neural networks as case studies in this adversarially-resilient regression. However, TRIM has
research showed the sustainability and consequence of been proposed, proving to be more effective in mitigat-
poisoning attack in transfer machine learning. This study ing poisoning attacks in the linear regression model but
confirmed the reliability and effectiveness of a convex ineffective against subpopulation attack, thus proving the
polytope attack, comparing it with a feature collision severity of the poisoning attack in adversarial settings.
attack. Also, it demonstrated the success and sustain-
ability of transferability of convex polytope attack even 5.2 Evasion attack
in black box setting where the adversary does not know Malware classifiers are also affected by adversarial
the dataset of the victim model and still achieves almost attacks. In study [62], the researchers have developed
the same results as when the adversary has a 50% overlap a test time attack on an Android malware classifier to
disrupt its classification outcome. The attack devel- confidential tuning parameters are extracted, specifi-
oped in this research is a black box, which extracts the cally with a regularized maximum likelihood estimation
opcodes with the n-Grams strategy from the disassem- technique in which the adversary follows the Euclidean
bled Android application packages (APK) and trans- distance estimation and finds the optimal sample with
forms benign samples into malicious ones with a random the least variation. In conclusion, this research high-
search technique. This attack is experimented with five lighted the potential of inference attacks that demand
different malware detectors. It proves the effectiveness of attention to mitigate and ensure privacy preservation of
a test time attack that evades the machine learning model deep learning. S. Basu et al. [68] demonstrated the pri-
and misclassifies the test time classification results. As vacy issues in machine learning algorithms by inverting
a result, machine learning-based malware detectors, a deep neural network (DNN) with a model inversion
including Drebin, detection malware in android (MaM- attack. This research study implemented the model inver-
aDriod), with an accuracy of 81% and 75%, respectively, sion attack on the facial recognition system and extracted
and some others failed to detect malicious Android the class representation of the model. The attack devel-
applications. oped in this research has baseline knowledge of the tar-
Similarly, the stealthiness of the evasion attack is get system. A generative adversarial network is integrated
explained by another attack named the Jacobian-based to generate input samples and invert the victim model,
saliency map attack (JSMA). JSMA is developed against highlighting the effectiveness of generative AI in invert-
IDS and is designed on a multi-layer perceptron algo- ing the model. Another framework, named generative
rithm. Targeted misclassification is intended to be adversarial model inversion (GAMIN), by U. Aivodji and
achieved when the adversary has intended to classify others [69] is also based on generative adversarial net-
malware traffic in network intrusion detection sys- works to craft adversarial images to query the targeted
tems (NIDS) as benign. The experimental analysis uses model and extract its details by comparative output
the white box setting to devise this evasion attack and resemblance. The major threat disclosed with adversar-
achieved maximum accuracy drop to 29% with the ial networks is that even without prior knowledge of the
TRabID 2017 dataset. Hence, it proved the malignant system under attack, the adversary can extract its con-
approach to threat machine learning-based applications fidential settings parameters and invert it. M. Khosravy
in cybersecurity, subsequently highlighting the test time et al. [75] also developed a model inversion attack on a
security vulnerabilities in neural networks. deep neural network-based face recognition system. It is
Based on our devised criteria, we have also exam- a gray box attack as the adversary has partial knowledge
ined the sensitivity of evasion and causative attacks [63] of the system under attack, including the model structure
against deep learning to technically shed light on the and its parameters. This attack extracts the model con-
existing security vulnerabilities in deep learning that figurations by reconstructing images based on the confi-
can be exploited by adversaries to harm the system. This dence achieved from the targeted model, hence inverting
research devised an adversarial perturbation approach the targeted CNN model. Concluding all the mentioned
and tested it with text and image datasets. At first, an attacks, the emphasis is on the privacy preservation of
evasion attack is performed, followed by the exploratory machine learning, which is a primary consideration in
attack intended to infer the trained classification model constructing trustworthy and resilient AI/ML that over-
and extract its private tuning parameters. The explora- come adversarial attacks.
tory attack is a black box query-based attack replicating
the victim model based on the obtained query outputs. 5.4 Membership inference attack
With the replicated model, this attack is further intended The membership inference attack (MIA) is another pri-
to poison labels of testing samples and fool the deep vacy risk to machine learning and deep learning. Yang
learning model with an evasion attack. Zou et al. [73] have comprehensively studied mem-
bership inference attacks on deep learning models in
5.3 Model inversion attack transfer learning mode. 95% accuracy, area under curve
Adversarial attacks also threaten the privacy of machine (AUC), is achieved with the membership inference attack
learning. Research study [67] experimentally revealed the performed to determine if the input instance is part of
privacy attack during inference in collaborative machine the training dataset of the targeted model. Three attacks
learning and argued that a single malicious participant originated in three different transfer learning modes as
could infer the target system and steal the confidential part of this research. When the adversary has access to
information of the targeted system. This attack is suc- the teacher model, the adversary targets the trained stu-
cessful in all three settings with complete knowledge, dent model, and the adversary infers the teacher model
zero knowledge, and query-free attack setting. The dataset with access to the student model. This study
implemented a surrogate model based on ResNet20 solutions are subjective and attack-focused, which can-
convolutional neural networks with derived and student not guard targeted models when attacked with new tech-
datasets and determined the membership inference of niques. Also, proposed security solutions have several
the victim model. This attack vector is quite adequate in limitations that should be considered to keep the integ-
demonstrating the capability of the inference attack on rity of machine learning intact, making AI/ML secure
machine learning to exploit its privacy even with lim- and trustworthy. A hierarchical description of analyzed
ited access or information of the victim model. Another mitigation techniques, based on adversarial attack types,
potential privacy attack is mentioned in [72], where the is given in Fig. 11. A detailed analysis of existing security
attacker acquired an automated recommender system solutions based on adversarial attack types is given as
membership inference. The attack is declared zero- follows:
knowledge. However, this study interrogates a serious
privacy threat on the recommender system’s sensitive 6.1 Mitigating poisoning attack
user data, which adversaries can reveal with the deter- 6.1.1 Data sanitization
mined query-based attack. Here, the inference attack is Pre-processing training datasets and removing erroneous
defined by three recommender algorithms: item-based or poisoned data points is known as data sanitization.
collaborative filtering, latent factor model, and neural However, this reduction may lead to a lessened dataset,
collaborative filtering by implementing a shadow model increasing underfitting issues in model development. S.
to mimic the training dataset of the victim, which ulti- Venkatesan et al. [76] proposed a solution to overcome
mately jeopardizes its privacy. the limitations of data sanitization by creating random
training data subsets to train ten ensemble classifiers to
6 Mitigation strategies and limitations balance the poisoning effect. This mechanism reduces
Various mitigation techniques are also developed to poisoning effects while training NIDS to 30%. Similarly,
secure machine learning models alongside the above- another data sanitization derivative is applied to mal-
mentioned adversarial attacks. However, the existing ware detection systems to mitigate clean label poisoning
attacks [77]. This approach is an enhancement, provided to 80%, but it is still only effective against label-flipping
in [76]. Furthermore, the study [78] has proposed another backdoor attacks.
approach to label sanitization to reduce the impact of
overfitting and underfitting issues, whereas P. W. Koh and 6.2 Mitigating evasion attack
others [79] have introduced three sophisticated poison- 6.2.1 Adversarial training
ing attacks by introducing cluster-based poisoning that Adversarial training is a prominent mechanism to miti-
breached the sanitization solutions, highlighted above. gate evasion attacks in machine learning. A particular
RONI is also a derivation of data sanitization proposed dataset part is intentionally poisoned to lessen the test
by Patrick P. K. Chan and others [80], which removes time evasion and make the model adversarially robust
poisoned data samples by analyzing the negative impact [85]. It allows the victim to be aware of adversarial sam-
of each data sample that reduces classification accuracy. ples if injected at test time to detect and defend itself if
However, it also leads to underfitting issues that lessen attacked by an adversary. U. Ahmed et al. [86] proposed
the flexibility and increase true negatives at test time. adversarial training by classifying adversarial and nor-
mal data samples followed by centroid-based cluster-
6.1.2 Adding adversarial noise to data samples ing of features and calculating the cosine similarity and
Training the ML model with an adversarially developed centroid of the image vector. The research [87] to train
dataset allows the trained model to identify poisoned independent models to reduce fabricated classification
samples at test time. T. Y. Liu et al. [81] have boosted attacks and [88] guards against Carlini and Wagner and
the immunity of the model by adding specifically crafted FGSM attacks.
noise samples in the dataset during training, which is
effective against bulls-eye polytope, gradient masking 6.2.2 Model hardening
and sleeper agent attacks. Another study [82] has intro- The hardening machine learning model also applies to
duced adversarial noise into the intermediate layer of developing a wall of security in machine learning against
CNN to mitigate FGSM attacks. adversarial attacks at test time. Evasion attacks are also
mitigated with the help of a training model until they
reach the state of hardening, which activates the model
6.1.3 Adversarial training to evade adversaries and mitigate attack impact. Adver-
Training an ML model with adversarial data samples sarially crafted samples are injected intentionally during
allows it to be resilient against poisoning attacks. TRIM the machine learning model training to evade the system
is one of the techniques used to adversarially train mod- until it reaches the state of hardening, making the victim
els with a residual subset of a dataset with a minimum model resilient and robust. These poisoned input data
error rate. M. Jagielski and others [59] have designed and samples evade the system and are then marked as poi-
experimented with this TRIM algorithm against adver- soned in the system to identify similar patterns if injected
sarial poisoning attacks against linear regression algo- by the adversary at test time. G. Apruzzese et al. [89] has
rithm to solve optimization problems. This approach introduced a similar strategy to mitigate evasion attacks
has reduced the error rate to approximately 6%. It per- in botnet detection systems by deep reinforcement learn-
forms robustly compared to random sample consensus ing. They have developed an agent based on deep rein-
(RANSAC), a data sanitization derivative, whereas TRIM forcement learning capable of generating adversarial
and RONI security techniques failed against the subpop- samples to evade the targeted botnet and then including
ulation attack developed in [47]. these adversarially generated samples into the targeted
system marked as malicious to make the model under-
6.1.4 Model hardening stand the pattern of adversarial samples if attacked dur-
Another innovative technique to mitigate poisoning ing test time, whereas research study [90] used model
attacks is model hardening, in which the model is trained hardening to secure ML-based IoT system. A thresh-
until it leads to large class distances where it should not old is specified that trains the model properly with the
accept outliers. This technique makes it challenging for legitimate and illegitimate dataset that makes the botnet
an adversary to poison the model. G. Tao et al. [83] pro- detector robust against evasion attacks.
posed a model hardening mechanism with additional
training to increase the class distances and challenge the 6.2.3 Region‑based classification
label-flipping attack. The study [84] hardens random for- X. Cao et al. [91] have designed a classification mecha-
est algorithm to mitigate poisoning impact on an IDS. nism based on region rather than individual sample
Moreover, it also leads to mitigate backdoor attacks points. The researchers provided this technique based
against neural networks. It reduces misclassification up on the assumption that the adversarial points lie near the
classification boundary. A hypercube-centered classifi- is only tested for securing neural networks under the
cation approach is determined by omitting single-point- black box attack settings.
based classification at test time to reduce the impact of
adversarial points.
6.3.3 Pre‑training
Z. Chen et al. [96] have proposed a model-preserv-
6.3 Mitigating model inversion and membership inference
ing framework to preserve the security of deep learn-
attacks
ing models while training models by combining model
6.3.1 Differential privacy and sparsity
parameters and training data. Z. Chen et al. [97] have
To preserve the privacy of machine learning models, one
introduced a new framework to pre-train an ML-based
of the profound solutions is differential privacy. It makes
model to preserve privacy by enforcing less confidence in
it difficult for the adversary to analyze the output and
the queried results between members and non-members.
extract the victim’s confidential information. J. Chen et al.
Z. Yang and others [98] have introduced another model
[71] have used differential privacy applied with stochastic
to statistically in-distinguish the confidence scores of
gradient descent on Lasso and CNN neural to preserve
members and non-members.
genomic data privacy. H. Phan et al. [92] improves DNN
robustness by implementing differential privacy with the
logarithmic relation between the privacy budget and the 7 Potential research directions‑AML attack types
accuracy of the targeted model. They have empirically Machine learning is at the edge of adversarial attacks that
analyzed genomic data for phenotype prediction with a threaten the security and privacy of machine learning.
white box attack, whereas Q. Zhang et al. [66] improves In this literature review, we have analyzed the existing
differential privacy by implementing it at class and sub- adversarial attacks on machine learning, their mitigation
class level, proving the minimal probability of model strategies, and limitations based on adversarial machine
inversion attack at dataset only. Class and sub-class level learning attack types. Based on these attack types, the
differential privacy is more effective and robust than following are the potential research directions that can be
simple record-level differential privacy, providing more extended as future research.
Euclidean distance between original and inverted data
samples. However, it is tested with neural networks only • To make machine learning safe and resilient against
with Face24 and MNIST datasets. Also, this type of dif- security attacks that disrupt its integrity, we need
ferential privacy requires high computational resources, to improve mitigation solutions and develop solu-
whereas the study [93] highlights trade-offs of data pri- tions that make machine learning secure by design
vacy and assuring its trustworthiness. K. Pan et al. [94] and robustify its model development process. Some
implemented differential privacy to mitigate privacy prominent solutions are highlighted in existing litera-
attacks and data leaks against generative adversarial net- ture but are subjective to mitigate vulnerabilities of a
works, whereas the floating-point attack mentioned in particular system to secure that specific domain or
[95] has invalidated differential privacy implemented to system environment but can not fight any new attack
preserve privacy of machine learning models. if implemented.
• Another important perspective towards trustworthy
6.3.2 Probability randomization machine learning is preserving its privacy against
Adversarial privacy attacks, specifically membership model inversion attacks [30, 75, 99, 100] and mem-
inference attacks, target machine learning classifiers and bership inference attacks [101–103] that violates its
infer input datasets by interpreting the confidence score secrecy. Both of these attacks need to be addressed to
and probability of the queried output. Adding noise to preserve the privacy of machine learning and make
the output or intentionally interrupting the confidence machine learning explainable and reliable to use.
probability score leads to the privacy preservation of • Identifying the reliability, integrity, and usability of
machine learning, preventing adversaries from infer- machine learning in security-sensitive applications
ring confidential details of the victim model. Member- such as financial recognition systems [104–106],
ship inference guard (MemGuard) [74] is one of the medical diagnostic applications [107, 108], medical
solutions designed to preserve the privacy of machine imaging systems [43, 109, 110], and cyber defenses
learning models against membership inference attacks [111–113] is potentially a critical and open research
by adding randomized noise to each of the score vectors challenge. The significance and prevalence of secure
with a specified probability of accuracy loss and makes and trustworthy machine learning are prominently
machine learning-based binary classifier resilient to miti- highlighted by its use in these domains. At the same
gate membership inference attack. However, the solution time, persisting threats and existing vulnerabilities
should be of greater concern to be resolved signifi- important to consider the security of the machine learn-
cantly to ensure the reliable use of machine learning. ing development process while developing mitigation
• Another important research direction is the practi- solutions to counter adversarial attacks.
cal implications of theoretical adversarial attacks on
machine learning. Many research studies and sur-
Authors’ contributions
veys claimed that most of the adversarial attacks are Anum Paracha-she is responsible for the development of the idea, conducting
highlighted in a theoretical manner [114] and maybe the literature analysis and writing the manuscript. Junaid Arshad-he is respon-
just implemented as white box attacks, which are less sible for the refinement of the idea and supervision of this literature analysis.
Also, he has reviewed and upgrade the writing of this manuscript. Mohamed
credible practically. The practical implication of these Ben Farah-he has contributed in reviewing the article. Khalid Ismail-he has
attacks and defenses is an open research challenge contributed in reviewing the article.
that should be particularly considered to highlight
Funding
the impact of adversarial attacks in reality. This work was supported by the U.K. West Midlands Innovation Accelerator
WMHTIA Grant 10056871.
Overall, many security and privacy-preserving solutions
Availability of data and materials
are provided in the literature. Still, to the best of our The compiled and refined data and results of this manuscript should be avail-
knowledge, security solutions and strategies given in the able upon request by the authors.
literature are very subjective in nature and target specific
attack vectors with limited datasets in particular domains Declarations
or systems to be implemented. Context-aware solutions
Ethics approval and consent to participate
against these adversarial attacks on machine learning are This literature analysis is solely based on the gathered data from the Scopus
a potential research challenge that should be focused on. for which no additional permissions are required.
Competing interests
8 Conclusion The authors declare no competing interests.
We have conducted a comprehensive study to analyze
different types of adversarial attacks, their development Received: 19 November 2023 Accepted: 2 April 2024
process, and their impact alongside defenses and limita-
tions. For the in-depth analysis, various aspects of mali-
cious attempts are studied, including the adversary’s
knowledge and accessibility, adaptations to algorithms,
vulnerability, and feature exploitation. Existing defense References
mechanisms are also studied to mitigate adversarial 1. R. Rosati, L. Romeo, G. Cecchini, F. Tonetto, P. Viti, A. Mancini, E. Frontoni,
From knowledge-based to big data analytic model: a novel iot and
attacks, including data sanitization, outlier detection, machine learning based decision support system for predictive mainte-
adversarial training, and differential privacy and sparsity. nance in industry 4.0. J. Intell. Manuf. 34(1), 107–121 (2023)
Moreover, their limitations and successful attacks that 2. B. Jothi, M. Pushpalatha, Wils-trs-a novel optimized deep learning based
intrusion detection framework for iot networks. Pers. Ubiquit. Comput.
breached these security techniques are highlighted to 27(3), 1285–1301 (2023)
provide a structured ground and deeper insights for fur- 3. A. Singh, S. Bhatt, V. Nayak, M. Shah, Automation of surveillance systems
ther investigations. Our study provides a detailed com- using deep learning and facial recognition. Int. J. Syst. Assur. Eng.
Manag. 14(Suppl 1), 236–245 (2023)
parative analysis of adversarial attack types, investigating 4. S. Gupta, P. Kumar, R.K. Tekchandani, Facial emotion recognition based
the significance of various technical aspects and provid- real-time learner engagement detection system in online learning
ing deeper insights into their development process. context using deep learning models. Multimedia Tools Appl. 82(8),
11365–11394 (2023)
Our analysis highlights the ability of adversaries to develop adversarial attacks that breach machine learning security and privacy. Poisoning attacks are identified as a major threat to machine learning, whereas the practical implications of inference attacks are demonstrated by attacks such as the one developed in [21] against large language models. We conclude that the public availability of datasets and models gives adversaries a foothold to exploit ML models even with zero knowledge of the targeted model. Furthermore, adversarial attacks are transferable, allowing adversaries to penetrate the targeted model with the help of surrogate models; for example, the attack developed in [14] is transferable.
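To make the transferability observation concrete, the following sketch crafts FGSM-style perturbations against a locally trained surrogate and then measures their effect on a separately trained target model. It is a generic illustration of transfer-based attacks under assumed settings (scikit-learn models, synthetic data, eps = 0.5), not a reimplementation of the attack in [14].

```python
# Hedged sketch: adversarial examples crafted on a surrogate model and
# transferred to a different, separately trained target model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# In a real black-box setting the attacker would label a surrogate dataset
# by querying the target; here both models simply share synthetic data.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

surrogate = LogisticRegression(max_iter=1000).fit(X, y)      # attacker's local model
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                       random_state=0).fit(X, y)             # "black-box" victim

# FGSM on the surrogate: gradient of its logistic loss w.r.t. the input
p = surrogate.predict_proba(X)[:, 1]
grad_x = (p - y)[:, None] * surrogate.coef_[0]
X_adv = X + 0.5 * np.sign(grad_x)                            # assumed budget eps = 0.5

print("target accuracy on clean inputs      :", target.score(X, y))
print("target accuracy on transferred inputs:", target.score(X_adv, y))
```

Perturbations computed only from the surrogate typically degrade the target's accuracy as well, which is the transfer effect that lets black-box adversaries attack models they cannot inspect directly.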
able upon request by the authors.

Declarations

Ethics approval and consent to participate
This literature analysis is solely based on data gathered from Scopus, for which no additional permissions are required.

Competing interests
The authors declare no competing interests.

Received: 19 November 2023   Accepted: 2 April 2024

References
1. R. Rosati, L. Romeo, G. Cecchini, F. Tonetto, P. Viti, A. Mancini, E. Frontoni, From knowledge-based to big data analytic model: a novel iot and machine learning based decision support system for predictive maintenance in industry 4.0. J. Intell. Manuf. 34(1), 107–121 (2023)
2. B. Jothi, M. Pushpalatha, Wils-trs - a novel optimized deep learning based intrusion detection framework for iot networks. Pers. Ubiquit. Comput. 27(3), 1285–1301 (2023)
3. A. Singh, S. Bhatt, V. Nayak, M. Shah, Automation of surveillance systems using deep learning and facial recognition. Int. J. Syst. Assur. Eng. Manag. 14(Suppl 1), 236–245 (2023)
4. S. Gupta, P. Kumar, R.K. Tekchandani, Facial emotion recognition based real-time learner engagement detection system in online learning context using deep learning models. Multimedia Tools Appl. 82(8), 11365–11394 (2023)
5. D. Komarasamy, O. Duraisamy, M.S. S, S. Krishnamoorthy, S. Rajendran, D.M. K, Spam email filtering using machine learning algorithm, in 2023 7th International Conference on Computing Methodologies and Communication (ICCMC) (IEEE, 2023), pp. 1–5
6. W.M. Salama, M.H. Aly, Y. Abouelseoud, Deep learning-based spam image filtering. Alex. Eng. J. 68, 461–468 (2023)
7. C. Chen, C. Wang, B. Liu, C. He, L. Cong, S. Wan, Edge intelligence empowered vehicle detection and image segmentation for autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 24, 13023–13034 (2023)
8. S. Feng, H. Sun, X. Yan, H. Zhu, Z. Zou, S. Shen, H.X. Liu, Dense reinforcement learning for safety validation of autonomous vehicles. Nature 615(7953), 620–627 (2023)
9. S. Menon, D. Anand, Kavita, S. Verma, M. Kaur, N. Jhanjhi, R.M. Ghoniem, S.K. Ray, Blockchain and machine learning inspired secure smart home communication network. Sensors 23(13), 6132 (2023)
10. M.H. Rahman, T. Islam, M.M. Rana, R. Tasnim, T.R. Mona, M.M. Sakib, Machine learning approach on multiclass classification of internet firewall log files, in 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES) (IEEE, 2023), pp. 358–364
11. K. Surendhar, B.K. Pandey, G. Geetha, H. Gohel, Detection of payload injection in firewall using machine learning, in 2023 IEEE 12th International Conference on Communication Systems and Network Technologies (CSNT) (IEEE, 2023), pp. 186–190
12. O. Oyebode, J. Fowles, D. Steeves, R. Orji, Machine learning techniques in adaptive and personalized systems for health and wellness. Int. J. Hum. Comput. Interact. 39(9), 1938–1962 (2023)
13. A. Shafahi, W.R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, T. Goldstein, Poison frogs! Targeted clean-label poisoning attacks on neural networks. Adv. Neural Inf. Process. Syst. 31, 6103–6113 (2018)
14. C. Zhu, W.R. Huang, A. Shafahi, H. Li, G. Taylor, C. Studer, T. Goldstein, Transferable clean-label poisoning attacks on deep neural nets, in International Conference on Machine Learning (ICML) (PMLR, 2019), pp. 7614–7623
15. M.A. Ramirez, S. Yoon, E. Damiani, H.A. Hamadi, C.A. Ardagna, N. Bena, Y.J. Byon, T.Y. Kim, C.S. Cho, C.Y. Yeun, New data poison attacks on machine learning classifiers for mobile exfiltration. arXiv preprint arXiv:2210.11592 (2022)
16. F.A. Yerlikaya, Ş. Bahtiyar, Data poisoning attacks against machine learning algorithms. Expert Syst. Appl. 208, 118101 (2022)
17. B. Pal, D. Gupta, M. Rashed-Al-Mahfuz, S.A. Alyami, M.A. Moni, Vulnerability in deep transfer learning models to adversarial fast gradient sign attack for covid-19 prediction from chest radiography images. Appl. Sci. 11(9), 4233 (2021)
18. T. Combey, A. Loison, M. Faucher, H. Hajri, Probabilistic jacobian-based saliency maps attacks. Mach. Learn. Knowl. Extraction 2(4), 558–578 (2020)
19. R. Wiyatno, A. Xu, Maximal jacobian-based saliency map attack. arXiv preprint arXiv:1808.07945 (2018)
20. A.N. Bhagoji, W. He, B. Li, D. Song, Exploring the space of black-box attacks on deep neural networks. arXiv preprint arXiv:1712.09491 (2017)
21. A. Zou, Z. Wang, N. Carlini, M. Nasr, J.Z. Kolter, M. Fredrikson, Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043 (2023)
22. X. Gong, Y. Chen, W. Yang, H. Huang, Q. Wang, B3: Backdoor attacks against black-box machine learning models. ACM Trans. Priv. Secur. 26, 1–24 (2023)
23. A. Awajan, A novel deep learning-based intrusion detection system for iot networks. Computers 12(2), 34 (2023)
24. H. Shah, D. Shah, N.K. Jadav, R. Gupta, S. Tanwar, O. Alfarraj, A. Tolba, M.S. Raboaca, V. Marina, Deep learning-based malicious smart contract and intrusion detection system for iot environment. Mathematics 11(2), 418 (2023)
25. D. Rios Insua, R. Naveiro, V. Gallego, J. Poulos, Adversarial machine learning: Bayesian perspectives. J. Am. Stat. Assoc. 118, 1–12 (2023)
26. P. Gupta, K. Yadav, B.B. Gupta, M. Alazab, T.R. Gadekallu, A novel data poisoning attack in federated learning based on inverted loss function. Comput. Secur. 130, 103270 (2023)
27. B. Zhao, Y. Lao, Clpa: Clean-label poisoning availability attacks using generative adversarial nets, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36 (AAAI Press, 2022), pp. 9162–9170
28. M.A. Ayub, W.A. Johnson, D.A. Talbert, A. Siraj, Model evasion attack on intrusion detection systems using adversarial machine learning, in 2020 54th Annual Conference on Information Sciences and Systems (CISS) (IEEE, 2020), pp. 1–6
29. M. Ebrahimi, N. Zhang, J. Hu, M.T. Raza, H. Chen, Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model, in 2021 AAAI Workshop on Robust, Secure and Efficient Machine Learning (RSEML) (AAAI Press, 2021)
30. D. Usynin, D. Rueckert, G. Kaissis, Beyond gradients: Exploiting adversarial priors in model inversion attacks. ACM Trans. Priv. Secur. 26(3), 1–30 (2023)
31. Y. Bai, Y. Wang, Y. Zeng, Y. Jiang, S.T. Xia, Query efficient black-box adversarial attack on deep neural networks. Pattern Recog. 133, 109037 (2023)
32. M. Yu, S. Sun, Natural black-box adversarial examples against deep reinforcement learning, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36 (AAAI Press, 2022), pp. 8936–8944
33. C. Sun, Y. Zhang, W. Chaoqun, Q. Wang, Y. Li, T. Liu, B. Han, X. Tian, Towards lightweight black-box attack against deep neural networks. Adv. Neural Inf. Process. Syst. 35, 19319–19331 (2022)
34. H. Wang, S. Wang, Z. Jin, Y. Wang, C. Chen, M. Tistarelli, Similarity-based gray-box adversarial attack against deep face recognition, in 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (IEEE, 2021), pp. 1–8
35. N. Aafaq, N. Akhtar, W. Liu, M. Shah, A. Mian, Language model agnostic gray-box adversarial attack on image captioning. IEEE Trans. Inf. Forensic Secur. 18, 626–638 (2022)
36. R. Lapid, M. Sipper, I See Dead People: Gray-box adversarial attack on image-to-text models. arXiv preprint arXiv:2306.07591 (2023)
37. W. Patterson, I. Fernandez, S. Neupane, M. Parmar, S. Mittal, S. Rahimi, A white-box adversarial attack against a digital twin. arXiv preprint arXiv:2210.14018 (2022)
38. S. Agnihotri, S. Jung, M. Keuper, CosPGD: A unified white-box adversarial attack for pixel-wise prediction tasks. arXiv preprint arXiv:2302.02213 (2023)
39. D. Wu, S. Qi, Y. Qi, Q. Li, B. Cai, Q. Guo, J. Cheng, Understanding and defending against white-box membership inference attack in deep learning. Knowl. Based Syst. 259, 110014 (2023)
40. A. Guesmi, K.N. Khasawneh, N. Abu-Ghazaleh, I. Alouani, Room: Adversarial machine learning attacks under real-time constraints, in 2022 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2022), pp. 1–10
41. E. Abdukhamidov, M. Abuhamad, G.K. Thiruvathukal, H. Kim, T. Abuhmed, Single-class target-specific attack against interpretable deep learning systems. arXiv preprint arXiv:2307.06484 (2023)
42. W. Feng, N. Xu, T. Zhang, Y. Zhang, Dynamic generative targeted attacks with pattern injection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2023), pp. 16404–16414
43. M.K. Puttagunta, S. Ravi, C. Nelson Kennedy Babu, Adversarial examples: attacks and defences on medical deep learning systems. Multimedia Tools Appl. 82, 1–37 (2023)
44. A. Zafar, et al., Untargeted white-box adversarial attack to break into deep learning based covid-19 monitoring face mask detection system. Multimedia Tools Appl. 83, 1–27 (2023)
45. B. Chen, Y. Feng, T. Dai, J. Bai, Y. Jiang, S.T. Xia, X. Wang, Adversarial examples generation for deep product quantization networks on image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1388–1404 (2023)
46. Y. Li, Z. Li, L. Zeng, S. Long, F. Huang, K. Ren, Compound adversarial examples in deep neural networks. Inf. Sci. 613, 50–68 (2022)
47. M. Jagielski, G. Severi, N.P. Harger, A. Oprea, Subpopulation data poisoning attacks, in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, 2021), pp. 3104–3122
48. I. Rosenberg, A. Shabtai, Y. Elovici, L. Rokach, Adversarial machine learning attacks and defense methods in the cyber security domain. ACM Comput. Surv. 54(5), 1–36 (2021)
49. M. Goldblum, D. Tsipras, C. Xie, X. Chen, A. Schwarzschild, D. Song, A. Madry, B. Li, T. Goldstein, Dataset security for machine learning: Data poisoning, backdoor attacks, and defenses. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1563–1580 (2022)
50. M. Rigaki, S. Garcia, A survey of privacy attacks in machine learning. ACM Comput. Surv. 56(4), 1–34 (2023)
51. Z. Wang, J. Ma, X. Wang, J. Hu, Z. Qin, K. Ren, Threats to training: A survey of poisoning attacks and defenses on machine learning systems. ACM Comput. Surv. 55(7), 1–36 (2022)
52. N. Pitropakis, E. Panaousis, T. Giannetsos, E. Anastasiadis, G. Loukas, A taxonomy and survey of attacks against machine learning. Comput. Sci. Rev. 34, 100199 (2019)
53. A. Shafee, T.A. Awaad, Privacy attacks against deep learning models and their countermeasures. J. Syst. Archit. 114, 101940 (2021)
54. P. Bountakas, A. Zarras, A. Lekidis, C. Xenakis, Defense strategies for adversarial machine learning: A survey. Comput. Sci. Rev. 49, 100573 (2023)
55. N. Martins, J.M. Cruz, T. Cruz, P. Henriques Abreu, Adversarial machine learning applied to intrusion and malware scenarios: A systematic review. IEEE Access 8, 35403–35419 (2020)
56. G.R. Machado, E. Silva, R.R. Goldschmidt, Adversarial machine learning in image classification: A survey toward the defender's perspective. ACM Comput. Surv. 55(1), 1–38 (2021)
57. A. Alotaibi, M.A. Rassam, Adversarial machine learning attacks against intrusion detection systems: A survey on strategies and defense. Fut. Internet 15(2), 62 (2023)
58. A. Demontis, M. Melis, M. Pintor, M. Jagielski, B. Biggio, A. Oprea, C. Nita-Rotaru, F. Roli, On the intriguing connections of regularization, input gradients and transferability of evasion and poisoning attacks. arXiv preprint arXiv:1809.02861 (2018)
59. M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, B. Li, Manipulating machine learning: Poisoning attacks and countermeasures for regression learning, in 2018 IEEE Symposium on Security and Privacy (SP) (IEEE, 2018), pp. 19–35
60. D. Gibert, J. Planes, Q. Le, G. Zizzo, Query-free evasion attacks against machine learning-based malware detectors with generative adversarial networks (2023)
61. H. Yan, X. Li, W. Zhang, R. Wang, H. Li, X. Zhao, F. Li, X. Lin, A wolf in sheep's clothing: Query-free evasion attacks against machine learning-based malware detectors with generative adversarial networks, in 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (IEEE, 2023), pp. 415–426
62. H. Bostani, V. Moonsamy, Evadedroid: A practical evasion attack on machine learning for blackbox android malware detection. Comput. Secur. 139, 103676–103693 (2024)
63. Y. Shi, Y.E. Sagduyu, Evasion and causative attacks with adversarial deep learning, in MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM) (IEEE, 2017), pp. 243–248
64. T. Titcombe, A.J. Hall, P. Papadopoulos, D. Romanini, Practical defences against model inversion attacks for split neural networks. arXiv preprint arXiv:2104.05743 (2021)
65. M. Khosravy, K. Nakamura, Y. Hirose, N. Nitta, N. Babaguchi, Model inversion attack: Analysis under gray-box scenario on deep learning based face recognition system. KSII Trans. Internet Inf. Syst. 15, 1100–1119 (2021)
66. Q. Zhang, J. Ma, Y. Xiao, J. Lou, L. Xiong, Broadening differential privacy for deep learning against model inversion attacks, in 2020 IEEE International Conference on Big Data (Big Data) (IEEE, 2020), pp. 1061–1070
67. Z. He, T. Zhang, R.B. Lee, Model inversion attacks against collaborative inference, in Proceedings of the 35th Annual Computer Security Applications Conference (Association for Computing Machinery, New York, 2019), pp. 148–162
68. S. Basu, R. Izmailov, C. Mesterharm, Membership model inversion attacks for deep networks. arXiv preprint arXiv:1910.04257 (2019)
69. U. Aïvodji, S. Gambs, T. Ther, Gamin: An adversarial approach to black-box model inversion. arXiv preprint arXiv:1909.11835 (2019)
70. Z. Zhu, C. Wu, R. Fan, D. Lian, E. Chen, Membership inference attacks against sequential recommender systems, in Proceedings of the ACM Web Conference 2023 (Association for Computing Machinery, 2023), pp. 1208–1219
71. J. Chen, W.H. Wang, X. Shi, Membership inference attacks against sequential recommender systems, in Proceedings of the ACM Web Conference (Association for Computing Machinery, New York, 2023), pp. 1208–1219
72. J. Chen, W.H. Wang, X. Shi, Differential privacy protection against membership inference attack on machine learning for genomic data, in BIOCOMPUTING 2021: Proceedings of the Pacific Symposium (World Scientific Publishing Company, 2020), pp. 26–37
73. M. Zhang, Z. Ren, Z. Wang, P. Ren, Z. Chen, P. Hu, Y. Zhang, Membership inference attacks against recommender systems, in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, 2021)
74. Y. Zou, Z. Zhang, M. Backes, Y. Zhang, Privacy analysis of deep learning in the wild: Membership inference attacks against transfer learning. arXiv preprint arXiv:2009.04872 (2020)
75. M. Khosravy, K. Nakamura, Y. Hirose, N. Nitta, N. Babaguchi, Model inversion attack by integration of deep generative models: Privacy-sensitive face generation from a face recognition system. IEEE Trans. Inf. Forensic Secur. 357–372 (2022)
76. S. Venkatesan, H. Sikka, R. Izmailov, R. Chadha, A. Oprea, M.J. de Lucia, Poisoning attacks and data sanitization mitigations for machine learning models in network intrusion detection systems, in MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM) (IEEE, 2021), pp. 874–879
77. S. Ho, A. Reddy, S. Venkatesan, R. Izmailov, R. Chadha, A. Oprea, Data sanitization approach to mitigate clean-label attacks against malware detection systems, in MILCOM 2022 - 2022 IEEE Military Communications Conference (MILCOM) (IEEE, 2022), pp. 993–998
78. A. Paudice, L. Muñoz-González, E.C. Lupu, Label sanitization against label flipping poisoning attacks, in ECML PKDD 2018 Workshops: Nemesis 2018, UrbReas 2018, SoGood 2018, IWAISe 2018, and Green Data Mining 2018, Dublin, Ireland, September 10-14, 2018, Proceedings 18 (Springer, 2018), pp. 5–15
79. P.W. Koh, J. Steinhardt, P. Liang, Stronger data poisoning attacks break data sanitization defenses. Mach. Learn. 111, 1–47 (2022)
80. P.P. Chan, Z.M. He, H. Li, C.C. Hsu, Data sanitization against adversarial label contamination based on data complexity. Int. J. Mach. Learn. Cybern. 9, 1039–1052 (2018)
81. T.Y. Liu, Y. Yang, B. Mirzasoleiman, Friendly noise against adversarial noise: a powerful defense against data poisoning attack. Adv. Neural Inf. Process. Syst. 35, 11947–11959 (2022)
82. Z. You, J. Ye, K. Li, Z. Xu, P. Wang, Adversarial noise layer: Regularize neural network by adding noise, in 2019 IEEE International Conference on Image Processing (ICIP) (IEEE, 2019), pp. 909–913
83. G. Tao, Y. Liu, G. Shen, Q. Xu, S. An, Z. Zhang, X. Zhang, Model orthogonalization: Class distance hardening in neural networks for better security, in 2022 IEEE Symposium on Security and Privacy (SP) (IEEE, 2022), pp. 1372–1389
84. G. Apruzzese, M. Andreolini, M. Colajanni, M. Marchetti, Hardening random forest cyber detectors against adversarial attacks. IEEE Trans. Emerg. Top. Comput. Intell. 4(4), 427–439 (2020)
85. M. Pawlicki, M. Choraś, R. Kozik, Defending network intrusion detection systems against adversarial evasion attacks. Futur. Gener. Comput. Syst. 110, 148–154 (2020)
86. U. Ahmed, J.C.W. Lin, G. Srivastava, Mitigating adversarial evasion attacks by deep active learning for medical image classification. Multimed. Tools Appl. 81(29), 41899–41910 (2022)
87. H. Rafiq, N. Aslam, U. Ahmed, J.C.W. Lin, Mitigating malicious adversaries evasion attacks in industrial internet of things. IEEE Trans. Ind. Inform. 19(1), 960–968 (2023)
88. J. Lin, L.L. Njilla, K. Xiong, Secure machine learning against adversarial samples at test time. EURASIP J. Inf. Secur. 2022(1), 1 (2022)
89. G. Apruzzese, M. Andreolini, M. Marchetti, A. Venturi, M. Colajanni, Deep reinforcement adversarial learning against botnet evasion attacks. IEEE Trans. Netw. Serv. Manag. 17(4), 1975–1987 (2020)
90. E. Anthi, L. Williams, A. Javed, P. Burnap, Hardening machine learning denial of service (dos) defences against adversarial attacks in iot smart home networks. Comput. Secur. 108, 102352 (2021)
91. X. Cao, N.Z. Gong, Mitigating evasion attacks to deep neural networks via region-based classification, in Proceedings of the 33rd Annual Computer Security Applications Conference (2017), pp. 278–287
92. H. Phan, M.T. Thai, H. Hu, R. Jin, T. Sun, D. Dou, Scalable differential privacy with certified robustness in adversarial learning, in Proceedings of the 37th International Conference on Machine Learning, vol. 119 (PMLR, 2020), pp. 7683–7694
93. M. Strobel, R. Shokri, Data privacy and trustworthy machine learning. IEEE Secur. Priv. 20(5), 44–49 (2022)
94. K. Pan, M. Gong, Y. Gao, Privacy-enhanced generative adversarial network with adaptive noise allocation. Knowl. Based Syst. 272, 110576 (2023)
95. J. Jin, E. McMurtry, B.I.P. Rubinstein, O. Ohrimenko, Are we there yet? Timing and floating-point attacks on differential privacy systems, in 2022 IEEE Symposium on Security and Privacy (SP) (IEEE, 2022), pp. 473–488
96. Z. Chen, J. Wu, A. Fu, M. Su, R.H. Deng, Mp-clf: An effective model-preserving collaborative deep learning framework for mitigating data leakage under the gan. Knowl. Based Syst. 270, 110527 (2023)
97. Z. Chen, K. Pattabiraman, Overconfidence is a dangerous thing: Mitigating membership inference attacks by enforcing less confident prediction. arXiv preprint arXiv:2307.01610 (2023)
98. Z. Yang, L. Wang, D. Yang, J. Wan, Z. Zhao, E.C. Chang, F. Zhang, K. Ren, Purifier: Defending data inference attacks via transforming confidence scores, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37 (AAAI Press, 2023), pp. 10871–10879
99. Z. Zhang, Q. Liu, Z. Huang, H. Wang, C.K. Lee, E. Chen, Model inversion attacks against graph neural networks. IEEE Trans. Knowl. Data Eng. 35(9), 8729–8741 (2023)
100. T. Zhu, D. Ye, S. Zhou, B. Liu, W. Zhou, Label-only model inversion attacks: Attack with the least information. IEEE Trans. Inf. Forensic Secur. 18, 991–1005 (2023)
101. Y. Liu, Z. Zhao, M. Backes, Y. Zhang, Membership inference attacks by exploiting loss trajectory, in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, 2022), pp. 2085–2098
102. L. Liu, Y. Wang, G. Liu, K. Peng, C. Wang, Membership inference attacks against machine learning models via prediction sensitivity. IEEE Trans. Dependable Secure Comput. 20(3), 2341–2347 (2023)
103. N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, F. Tramèr, Membership inference attacks from first principles, in 2022 IEEE Symposium on Security and Privacy (SP) (IEEE, 2022), pp. 1897–1914
104. R.S. Siva Kumar, M. Nyström, J. Lambert, A. Marshall, M. Goertzel, A. Comissoneru, M. Swann, S. Xia, Adversarial machine learning-industry perspectives, in 2020 IEEE Security and Privacy Workshops (SPW) (IEEE, 2020), pp. 69–75
105. M. Schreyer, T. Sattarov, B. Reimer, D. Borth, Adversarial learning of deepfakes in accounting. CoRR abs/1910.03810 (2019)
106. I. Fursov, M. Morozov, N. Kaploukhaya, E. Kovtun, R. Rivera-Castro, G. Gusev, D. Babaev, I. Kireev, A. Zaytsev, E. Burnaev, Adversarial attacks on deep models for financial transaction records, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (Association for Computing Machinery, New York, 2021), pp. 2868–2878
107. A. Rahman, M.S. Hossain, N.A. Alrajeh, F. Alsolami, Adversarial examples-security threats to covid-19 deep learning systems in medical iot devices. IEEE Internet Things J. 8(12), 9603–9610 (2021)
108. X. Han, Y. Hu, L. Foschini, L. Chinitz, L. Jankelson, R. Ranganath, Deep learning models for electrocardiograms are susceptible to adversarial attack. Nat. Med. 26(3), 360–363 (2020)
109. X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, F. Lu, Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recogn. 110, 107332 (2021)
110. H. Kim, D.C. Jung, B.W. Choi, Exploiting the vulnerability of deep learning-based artificial intelligence models in medical imaging: adversarial attacks. J. Korean Soc. Radiol. 80(2), 259–273 (2019)
111. G. Apruzzese, M. Colajanni, L. Ferretti, M. Marchetti, Addressing adversarial attacks against security systems based on machine learning, in 2019 11th International Conference on Cyber Conflict (CyCon), vol. 900 (IEEE, 2019), pp. 1–18
112. A. Piplai, S.S.L. Chukkapalli, A. Joshi, Nattack! Adversarial attacks to bypass a GAN based classifier trained to detect network intrusion, in 2020 IEEE 6th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS) (IEEE, 2020), pp. 49–54
113. A. Kuppa, N.A. Le-Khac, Black box attacks on explainable artificial intelligence (XAI) methods in cyber security, in 2020 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2020), pp. 1–8
114. E. Raff, M. Benaroch, A.L. Farris, You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks. arXiv preprint arXiv:2306.09951 (2023)

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.