ABSTRACT Intelligent applications supported by Machine Learning have achieved remarkable performance
rates for a wide range of tasks in many domains. However, understanding why a trained algorithm makes
a particular decision remains problematic. Given the growing interest in the application of learning-based
models, concerns arise when dealing with sensitive environments, where decisions may impact users’ lives. The
complex nature of those models’ decision mechanisms makes them the so-called ‘‘black boxes,’’ in which the
understanding of the logic behind automated decision-making processes by humans is not trivial. Furthermore,
the reasoning that leads a model to provide a specific prediction can be more important than performance
metrics, which introduces a trade-off between interpretability and model accuracy. Explaining intelligent
computer decisions can be regarded as a way to justify their reliability and establish trust. In this sense,
explanations are critical tools that verify predictions to discover errors and biases previously hidden within
the models’ complex structures, opening up vast possibilities for more responsible applications. In this review,
we provide theoretical foundations of Explainable Artificial Intelligence (XAI), clarifying diffuse definitions
and identifying research objectives, challenges, and future research lines related to turning opaque machine
learning outputs into more transparent decisions. We also present a careful overview of the state-of-the-art
explainability approaches, with a particular analysis of methods based on feature importance, such as the
well-known LIME and SHAP. As a result, we highlight practical applications of the successful use of XAI.
INDEX TERMS Black-box models, explainability, explainable machine learning, interpretability, interpretable
machine learning.
computationally. Machine Learning has attracted considerable attention from the research community because of its ability to accurately predict a wide range of complex phenomena [5]. Many learning-based algorithms have emerged in the last decade, especially after 2012, owing to the significant reduction in data storage costs, thus increasing the amount of information available through large datasets [6], improvements in hardware, especially Graphics Processing Units (GPUs) with high computational power, enabling the processing of large datasets in reasonable times, and new programming languages and high-quality open-source libraries, which have enabled programmers worldwide to create prototypes, run and test models, and develop new optimized algorithms [7].

The sophistication of learning-based algorithms has increased to the point that they have achieved human-level (and above) performance [6], [8], even surpassing human abilities in several computer tasks, including computer vision, image classification, language processing, and pattern recognition [9], [10], [11]. Some intelligent systems require almost no human intervention for tuning or training [2]; consequently, the application of Machine Learning has been transformative, with intelligent models employed in the most diverse contexts, from products, text documents, music, movies, and friend recommendations on social networks to decision-making in critical fields such as medicine, financial markets, autonomous cars, government strategic planning, bioinformatics, and criminal systems [2], [12], [13], [14], [15], [16]. On the other hand, such complex models also draw attention to trust-related problems, particularly when the outputs involve sensitive contexts, such as in medical diagnosis. In this case, the reasons behind a decision must be known [4].

A. THE TRANSPARENCY CHALLENGE
As stated by Breiman [17] when defining the Random Forests algorithm, ‘‘a forest of trees is impenetrable as far as simple interpretations of its mechanism go.’’ Despite the high levels of accuracy, the complex nature in which learning models operate reduces the transparency of their decision processes, turning them into so-called ‘‘black boxes’’ [18]. In other words, modern learning algorithms suffer from opacity – describing the degree of impact of each part of the information provided as an input with respect to the corresponding output can be challenging [19]. Delegating critical decisions to systems that cannot be interpreted or do not provide explanations about the logical path of their outputs can be dangerous, especially in sensitive scenarios, such as healthcare, autonomous cars, public security, and counter-terrorism [4], [20]. Therefore, ‘‘interpretability’’ and ‘‘explainability’’ have emerged as new concepts brought to the surface in the Explainable Artificial Intelligence (XAI) research community.

Interpretable and explainable do not share the same meaning. The word ‘‘interpretable’’ can be defined as the ability to present something in an understandable manner [21]. Humans can justify their actions through logically consistent, describable, and understandable choices produced by their ability to ‘‘think’’ [10]. Except for the final output, the interpretation of the reasoning behind a complex machine learning model is not easy, thus preventing its results from being fairly understood [9]. However, if a decision cannot be directly interpreted, understandable elements that shed light on the opaque decision-making processes of the models can be provided, thus making them explainable. In this sense, explainability can advance toward more transparency in complex models, providing elements of explanation of the logic behind a prediction beyond the simple presentation of output data. In the Machine Learning context, explainability can be considered a counterpart to the decision-making rationalization of human thought [10].

In general, all initiatives and efforts to reduce the complexity of learning-based models and improve both transparency and understanding of their actions can be considered XAI approaches [2], [22]. XAI is a research area that leverages ideas from the social sciences. It also considers the psychology of explanation to create techniques that make the outputs of machine learning applications more understandable while maintaining a high level of predictive performance, enabling humans to interpret, trust, and manage the next generations exposed to Artificial Intelligence [2], [23]. Interpretability and explainability are closely related in supporting humans in understanding the reasoning behind a model’s predictions. Although they are often used interchangeably in the literature, they are not monolithic concepts, and their precise and formal definitions remain subjective in the specialized literature. No consensual specification of what an interpretable algorithm would be, or of a proper way to generate and evaluate explanations, has been reached [13].

The need to explain the behavior of non-interpretable learning algorithms that can affect people’s lives is not only a desirable property, but also a legal demand in some places. As an example, the European Union introduced the right to explanation in its General Data Protection Regulation (GDPR), including algorithmic decision-making guidelines, to mitigate the social impact of computational systems [24]. Among other requirements, GDPR defines the right to information, i.e., the need for ‘‘meaningful explanations of the logic involved’’ in automated decisions, requiring that ‘‘the controller must ensure the right for individuals to obtain further information about the decision of any automated system’’ [25], [26].

GDPR started institutional discussions about further requirements for compliance in Artificial Intelligence use. The U.S. Food and Drug Administration (FDA) proposed a regulatory framework for medical devices supported by Artificial Intelligence/Machine Learning [27]. The framework defines the need for submission to FDA evaluation when continuous learning algorithms introduce changes that significantly affect a medical device’s performance. However, implementing those requirements for product development is still an open problem [28].
Similarly, the World Health Organization (WHO) released a long guidance report on the Ethics & Governance of Artificial Intelligence for Health [29]. The WHO document identifies ethical, trust, and transparency challenges in designing or deploying intelligent-based models applied in healthcare. Specifically, the WHO guidance requires that health ‘‘technologies should be intelligible or understandable to developers, medical professionals, patients, users, and regulators,’’ with explainability as the approach to improve transparency and provide an understanding of why an intelligent system made a particular decision.

In addition, the recently introduced California Consumer Privacy Act (CCPA) [30] defines rights regarding the use and protection of personal information, which has influenced privacy legislation in the United States. Therefore, explaining black-box decisions is now both a desirable and, in some places, a legally mandated subject, motivating the recent explosion in XAI research interest and technique development [4], [31].

Explanations of the reasons that lead an intelligent model to its discovered patterns, i.e., the reasoning behind predictions, can be even more important than the predictive performance itself [14]. In this sense, Explainable Artificial Intelligence can add a new layer to the undeniable success of Machine Learning, going beyond the usual performance metrics and aiming to provide a direct understanding of the behavior of learning models.

B. CONTRIBUTIONS AND ORGANIZATION
In recent years, XAI has become one of the most popular subjects in Artificial Intelligence and Data Science communities, and explaining machine learning is essential, since complex learning-based models are now part of our lives, making decisions that may influence people’s interactions. However, XAI is not yet a mature research domain, often lacking formality in definitions and objectives [32]. In this paper, we investigate the multiple aspects of XAI, showing beginners and experts how the concepts of explainability translate into practical applications for understanding machine-learning decisions. With a comprehensive study of the XAI literature, we identified gaps and organized a detailed review of the theoretical foundations and objectives related to explainability research.

In contrast to previous studies that presented a large number of techniques, the present one discusses the latest and main applications devoted to opening black-box problems from different perspectives (e.g., locality or model dependence) and using different mechanisms (e.g., feature importance, inspection, or counterfactuals), highlighting their operational aspects, advantages, and limitations. We also carefully reviewed feature importance explainability methods due to their leading position among XAI approaches [33], with a detailed analysis of LIME and SHAP.

Our focus is on demonstrating the importance of XAI tools for providing an additional layer of trust to automated decision systems by detecting hidden biases and noises that can lead to unfair decisions. We also address the limitations of current XAI approaches and future research directions toward helping researchers design comprehensive explanations. As a result of our investigations, we report an overview of practical applications where XAI has been successfully applied to turn opaque decisions into more transparent information.

In summary, the main contributions of this research are:
• A comprehensive discussion on XAI theory, including motivations, terminology clarification, and objectives of explainability in Machine Learning.
• A concise review and taxonomic categorization of recent and widely used XAI methodologies.
• A presentation of the challenges, limitations, and promising paths toward explainability evolution.
• An in-depth review of feature attribution/importance methods, including an analysis of the problems related to relying on Shapley-based explanations.
• A high-level discussion of cases from various domains where explainability has been successfully applied.

The remainder of the paper is organized as follows: Section II defines the research methodology and basic terminology used; Section III presents some previous research; Section IV briefly overviews the evolution of Machine Learning; Section V is devoted to an algorithmic complexity discussion; Section VI addresses the objectives of explainability; Section VII clarifies the theoretical foundations of XAI and presents the needs, challenges, and a taxonomy; Section VIII reviews recent approaches in the XAI domain and Section IX provides examples of successful implementations of explainability; open problems and future research directions are discussed in Section X; finally, Section XI presents our final remarks.

II. BACKGROUND STATEMENTS
We conducted a content investigation of the published literature to understand the evolution of XAI over the last few years. Such research systematically evaluated the available scientific communication, clarified terminologies, described objectives, identified fundamental contributions and applications, and indicated future research opportunities. XAI is a research area that emerged not long ago and still lacks some definitions and further discussions, which are addressed in this review.

A. METHOD OF THE SYSTEMATIC REVIEW
We combined four databases, namely, the Association for Computing Machinery (ACM) Digital Library, IEEE Xplore Digital Library, Citeseer Library, and Elsevier’s Scopus, for a comprehensive search of XAI theory and applications, and search engines such as Google Scholar, Elsevier’s ScienceDirect, and Thomson Reuters’ Web of Science were used in association with them.
Queries on the terms ‘‘explainable,’’ ‘‘interpretability,’’ ‘‘explainability,’’ ‘‘black box,’’ ‘‘understandable,’’ and ‘‘transparency,’’ merged with ‘‘artificial intelligence’’ or ‘‘machine learning,’’ mainly restricted to (but not only) the 2010–2024 period and based on publications’ titles, abstracts, and keywords, were performed. The following two criteria were employed in the search results for selecting publications for further revision:
• Papers published in relevant peer-reviewed scientific journals as articles available online in English. We extended the scope to conference proceedings, arXiv e-prints, theses, and books.
• Previous studies explicitly employing explainable artificial intelligence or explainable machine learning. We excluded papers that only listed XAI in keywords, alluded to XAI, or applied some XAI method with no discussion of or reference to the XAI methodology employed.

The queries returned 439 papers. We removed duplicates and, after fine-grained filtering based on abstract and introduction readings, carefully reviewed the remaining ones in full; 296 references are reported here.

B. BASIC TERMINOLOGY
A term often used in this research is ‘‘model,’’ which can denote diverse meanings in different areas. Some misunderstandings may still occur, even when the scope is limited to Machine Learning. Although machine learning algorithms such as Artificial Neural Networks, SVMs, or Random Forests are not models, they generate models after training procedures. Therefore, the meaning of ‘‘model’’ must be defined in this research. Whenever used here, it is employed as a simplified reference to some machine learning algorithm or its generated (trained) model, which follows the usual terminology in the XAI literature. In addition, when discussing any aspect of XAI approaches applied to explain learning models, we refer to a trained model.

The models addressed here are usually trained on multidimensional datasets (e.g., tabular data, time series, 2D images, 3D point clouds, videos, or semantic segmentation), which contain m individual instances composed of a collection of characteristics formally expressed as

X = {x1, x2, . . . , xm}    (1)

where each vector x = (x1, . . . , xn) ∈ X is a data instance in R^n. By convention, the xi elements that characterize the instances in such a dataset are called attributes, features, or variables. On the other hand, the multiple elements that compose machine learning models and are updated during learning procedures, i.e., those that vary when the model is trained, are called model parameters or simply parameters.
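To make the distinction concrete, the short sketch below is a generic scikit-learn illustration on synthetic data (the dataset, labels, and choice of Logistic Regression are assumptions for illustration only, not tied to any model reviewed here): the columns of X are the features that describe each instance, while coef_ and intercept_ are the parameters adjusted during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # m = 200 instances, each with n = 4 features
y = (X[:, 0] - X[:, 2] > 0).astype(int)    # labels for a toy binary task

model = LogisticRegression().fit(X, y)     # training adjusts the model parameters

print("features per instance:", X.shape[1])                           # a property of the data
print("learned parameters:", model.coef_.ravel(), model.intercept_)   # a property of the trained model
```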
III. PREVIOUS WORK
Research on XAI has introduced a wide variety of approaches and methods, so that several researchers have committed to discussing the XAI environment for defining multiple theoretical and practical particularities of techniques [34], [35] and metrics related to XAI [36], [37], [38], [39].

Lipton [13] was one of the pioneers in organizing the main definitions of interpretability in Machine Learning. Although the final publication dated 2018, its first version was available in 2016, compiling a discussion on the needs and motivations of interpretability according to the literature at the moment. Doshi-Velez and Kim [21], Chakraborty et al. [10], and Došilović et al. [7] introduced the concepts and taxonomy. Despite their valuable overviews introducing early advances in XAI, the studies were seminal in terms of concepts, lacking clear definitions of interpretability and explainability. Zhang and Zhu [9] reviewed XAI, restricting the research to visualization strategies applied to Convolutional Neural Networks (CNNs).

Miller [22] conducted a broad survey on XAI, approaching theories from human sciences such as cognitive and social psychology, which was foundational in relating Artificial Intelligence and how humans explain decisions. Guidotti et al. [26] analyzed several interpretable and explainability methods; they categorized them according to the problem type for which each XAI method was indicated and described a detailed taxonomy from Data Mining and Machine Learning viewpoints. Despite formalizing important concepts of XAI and focusing on interpretability processes, the authors only highlighted the need for evaluation metrics and did not discuss the elements of evaluation comprehensively. Similarly, Murdoch et al. [5] presented an updated conceptual overview and Molnar [40] offered an extensive review of both conceptual elements and characteristics of the main XAI approaches.

Adadi and Berrada [4] and Arrieta et al. [2] conducted comprehensive research in the literature and presented detailed views of the XAI scenario from the fundamentals, contributions, and toward solutions for dealing with the different needs for explainability. Both publications address ethical concepts such as fairness (in the sense of impartiality) and compliance in Machine Learning, differing in a critical aspect, with Adadi and Berrada [4] introducing questions on (the lack of) explainability evaluation and Arrieta et al. [2] distinguishing transparent and post-hoc methods and suggesting guidelines for the development of socially responsible intelligent systems. The two articles provided enriched discussions on XAI, but described the leading strategies in general terms. Linardatos et al. [32] defined a taxonomy of interpretability methods, concluding most XAI methods were proposed for tasks on Neural Network models. The authors also included links to code repositories with XAI implementations. Speith [41] critically reviewed several commonly adopted taxonomies of explainability methods, highlighting their similarities, differences, and inconsistencies.

Tjoa and Guan [42] and Amann et al. [16] discussed the concepts and applications of XAI; however, they concentrated their research on the explainability of black-box systems used in medicine. The two latter studies addressed important matters on the risks of omitting clear explanations within medical applications, with Amann et al. [16] highlighting the need to unify the multiple XAI terminologies and properly validate explanations.
Regarding specialized publications on explainability, several recent studies have reviewed XAI for applications in different domains where machine learning has been used, highlighting medicine, specifically cardiology [43], breast cancer diagnosis and surgery [44], medical image analysis [45], radiology [46], and healthcare [28], [47], [48]. XAI reviews dedicated to other areas such as genetics [49], anomaly detection [50], the automotive industry [51], [52], automation and smart industry [53], materials science [54], and language processing [55] can also be found, proving the interdisciplinary relevance of XAI research. Although those specialized surveys provide valuable insights into trends, advances, promises, and limitations of XAI under the view of domain experts, they focus on applications for specific contexts, which can limit their coverage of the theory behind XAI methods.

In this section, we have presented valuable research reviewing various elements of XAI. Despite the rich literature, there is room for improvement in the latest studies. The literature on explainability is no longer in its early days; however, XAI is not yet a mature research field [32]. Machine Learning applications have evolved quickly in the last few years, pushing the need for more transparency. The early publications reviewing XAI have aged, since XAI has also grown fast.

Therefore, we carefully reviewed the current literature to identify gaps in the definitions of leading approaches and to fill them. In light of almost a decade of research on XAI, proposals of more concrete terminology are required. This study provides a theoretical foundation that differentiates the main concepts of XAI and supports the reader with a clear set of XAI definitions, challenges, goals, categorizations, evaluations, and limitations. We do not perform a quantitative review, since previous studies have already done so. In addition to conceptual discussions, we propose a deep and highly detailed analysis of the most recent and relevant approaches, especially feature importance/attribution methods such as LIME and SHAP, providing a comprehensive review of the theory and practical applications of explainability to beginners and experts researching XAI.

IV. A BRIEF OVERVIEW OF MACHINE LEARNING
Artificial Intelligence and Machine Learning are two closely related research fields often associated with the development of intelligent computing systems. Despite a significant symbiosis between technologies and methods, to the point that the terms are sometimes used as synonyms, there are significant conceptual differences between them.

Artificial Intelligence is a multidisciplinary area of science with applications in many theoretical and practical domains. It focuses mainly on the development of systems that process data or information from the environment to which they are applied and, based on such environmental perception, execute a set of automated actions that best fit the previous knowledge toward achieving desired results [56]. In this sense, the so-called ‘‘intelligent systems’’ are those that can make decisions based on their judgment, similarly to the rationalization process for decision-making in human thought.

Artificial Intelligence techniques are traditionally divided into symbolic and connectionist. The symbolic (or classical) paradigm, prevalent until the 1980s, incorporates predicate logic based on symbols and rules representing human knowledge about a given problem. Symbols enable the algorithm to establish a series of logical reasoning processes similar to language. Symbolic representations have a propositional nature and define the existence of relationships between objects, whereas ‘‘reasoning’’ develops new logical relationships supported by a set of inference rules [57]. Note the similarity with the human reasoning process, which relates objects and abstract concepts and, from the knowledge acquired, creates association rules for generalizing when exposed to new settings.

An advantage of symbolic Artificial Intelligence is self-explainability, i.e., it is interpretable, enabling the extraction of explanation elements about the rational process leading to the model’s decisions [4]. A serious limitation of this paradigm is the need to define all necessary knowledge explicitly. Furthermore, representational elements must be formalized manually instead of being acquired from data [58]. Such a limitation makes the development of symbolic models a costly process, generally resulting in domain-specific systems, i.e., with low generalizability, which makes symbolic Artificial Intelligence currently considered obsolete.

In contrast, the connectionist paradigm emerged in 1959 with the concept of Machine Learning, with Arthur Samuel defining it as ‘‘a field of study that gives computers the ability to learn without being explicitly programmed’’ [59]. Machine Learning focuses on computational methods that can acquire new knowledge, new skills, and ways of organizing existing knowledge [60]. It is formally defined as a collection of techniques that enables computers to automate the construction and programming of modeling by discovering and generalizing statistically significant patterns in available data [61]. In other words, machines learn tasks based on training models generated through data or previous experience and adapt themselves to new inputs to make predictions in human-like tasks. This is one of the main reasons why machine learning is widely employed across different domains.

According to the aforementioned definitions, every Machine Learning model is Artificial Intelligence – however, the latter covers a broader scope of techniques, i.e., not every Artificial Intelligence application belongs to the set of Machine Learning models. Although research on Machine Learning algorithms started several decades ago, many of its most impactful contributions are relatively recent, owing to the intense development of new algorithms, especially after 2012. One of the reasons for the recent boom in Machine Learning development is the advent of high-performance computing technologies such as GPUs, which enable modern models to learn on large datasets in reasonable times.

This study focuses on the application of XAI to supervised learning, a training paradigm in which the dataset comprises a set of inputs and a known mapping of each input to a desired output. More specifically, in supervised learning, the
model parameters are adjusted to produce outputs based on a training process that uses input patterns coupled with their desired outputs [11]. The dataset is a pair (X, Y), where X follows the same definition provided in Equation 1 and Y = {y1, y2, . . . , ym} is a set defining the respective mappings of each input xk ∈ X, where yk ∈ R^c for each k.

Supervised learning currently concentrates most of the advances in machine learning [7] and, hence, in XAI approaches [32], and can be understood as the training of a generalized mapping based on previous data. The supervised learning process aims to determine a mathematical model that minimizes a loss function applied to the difference (or divergence) between all the model’s predicted values and the real values [61]. Therefore, the loss function quantifies the divergence between a prediction and the real value for a given data instance, i.e., the more accurate the model predictions, the smaller the loss function results.

FIGURE 1. Flowchart of multiple tasks tied to the development and implementation of a Machine Learning modeling.

Figure 1 illustrates the steps of learning-based modeling, from the definition of the problem to be addressed to the validation and implementation of the model. Supervised learning depends on the historical data available and known (previously processed and labeled) in terms of quantity and quality. Indeed, any Machine Learning application depends on how real-world problems are defined, with data collection and pre-processing steps representing factors that significantly affect both the accuracy and efficiency of models in capturing data patterns. However, the generated model does not represent the real world, but rather, only the reality of the data [61].

In general, the data on which the model was trained are not the same used during the evaluation step, since the set of rules learned in training might not be the same when the model processes new data. Moreover, the model can memorize the entire training set to increase accuracy, thus leading to an unfair performance rate. Since a good predictive performance is expected for new data, i.e., the ability to generalize, a portion of the data is often excluded and reserved for assessments of the model’s performance (the test set).
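As a minimal illustration of that held-out evaluation (a synthetic sketch with assumed data and an unpruned decision tree, not an experiment from this survey), the gap between the training and test scores below is precisely the generalization signal that a training-set metric alone would hide:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)   # noisy labeling rule

# Reserve 25% of the data as the test set, untouched during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # unpruned tree: can memorize
print("train accuracy:", model.score(X_train, y_train))  # 1.0, the tree memorized the training data
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower on unseen data
```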
Regarding the nature of the labels for each input in the dataset, a machine-learning task can be categorized as Regression or Classification. The former trains a function whose output is a real value (or a real-valued vector). In contrast, for classification problems, the model’s task is to learn a function from the inputs to a finite set, i.e., each input is mapped to one of a finite set of possibilities. Currently, there are several learning algorithms with different specificities and performance abilities, ranging from simpler and more interpretable ones to those more complex and not directly interpretable (black boxes).

Despite our brief introduction to the supervised learning development pipeline, we assume the reader is experienced in Machine Learning. Therefore, no specific model will be deeply reviewed here, and interested readers can consult LeCun et al. [62], Asimov [8], Chen and Guestrin [63], [64], and Ghojogh and Crowley [65] for a detailed overview of some learning models. Due to the remarkable results of current machine learning models and their growing use over the past few years, what elements make such sophisticated models non-interpretable black boxes? The answer is tied to complexity; the next section clarifies what complexity means and why it is a critical element in reducing the transparency of machine learning applications, thus demanding explainability.
V. WHAT DOES COMPLEXITY MEAN?
Several metrics define and evaluate complexity in computer science. Computational Complexity Theory is a research field of theoretical computing with comprehensive literature on the characterization and classification of computational problems. In short, simple and tractable problems can be optimized and solved more efficiently than existing solutions. However, complex problems require an in-depth investigation of the application domain toward the creation of new tools for solving (or approximating) tasks that require significant resources. Moreover, some problems lack solutions and others are so complex that, although they can be theoretically solved, the solution is unfeasible with current resources, i.e., they are intractable.

Algorithmic complexity is a well-known research area related to the definition of the computational time at which a particular algorithm solves a certain problem. However, this study does not focus on time as a measure of complexity. Therefore, for a more detailed discussion of the fundamentals of algorithmic complexity and analysis, we refer to [66] and [67].

The notion of complexity has been used several times so far and will be addressed in several other subjects throughout this reading, thus raising the need for some clarifications. Specifically in the context of machine learning algorithms and XAI methods, which criteria specify what is considered simple and what is complex? The answer depends on the modeling context under analysis.

A. BIAS AND VARIANCE
Bias and variance are two related topics of extensive debate in Machine Learning; therefore, complexity in learning models cannot be addressed without a discussion on both subjects.

Let us consider a typical supervised learning task for predicting (estimating) a variable y ∈ Y from an n-dimensional input x ∈ X. There is a function f that captures the true relationship between both variables, i.e., y = f(x) + ϵ, with ϵ as a part of y that cannot be estimated from x. In this context, the learning task objective is to determine an estimator model f̂ that approximates the behavior of f, with the estimator describing the relationship between input (explanatory or predictive attributes) and output (dependent or objective variables). A good estimator yields a result as close as possible to the true process that generated the data, which is initially unknown. In an ideal scenario, the estimator model is trained on unlimited data until the predictive patterns can be learned so well that the estimation error tends toward zero.

However, in the real world, we work with training sets of limited size, and the generative processes of every data source involve a combination of regular (repeatable) and stochastic components [68]. The objective of a learning model is to train an estimator on the available data in order to acquire the ability to adapt and generalize, hence maximizing the accuracy of future predictions when the model handles new and unknown data. Technically, ‘‘new data’’ are those not used in training [69]. However, accuracy is not maximized by simply learning the characteristics of the training data as precisely as possible [68]. The reason for the loss in accuracy is overfitting, which is a severe problem in Machine Learning. It occurs when the learning function of a model learns (fits) the training data features overly well, which may lead to the generation of less effective (or very incorrect) predictors when applied to unknown data.

A model overfitting the training data tends to capture noise and random aspects of the sampled data (which will not be repeated) as regular elements, therefore missing broader patterns. On the other hand, when the model is trained in an overly generalistic way (underfitting), the adjustment to new data tends to consider fewer random effects, but at the cost of ignoring regular components [64], [68]. The training fit must be balanced so that the model can learn the true patterns, ignore noise, and minimize the estimation error.

According to Neal et al. [70], a possible way to measure the quality of a predictor model is to quantify its total expected error by the following expression:

Err(x) = E[(Y − f̂(x))²]    (2)

where the difference is squared (for symmetry) to calculate the mean squared error. The total expected error can then be decomposed into the following three components:

Err(x) = E_bias + E_variance + E_noise    (3)

where the E_noise term represents the intrinsic error independent of the predictor model, E_bias denotes the bias term estimated with E_x[(E[f̂(x)] − E[y|x])²], and E_variance is the expected variance of the output predictions estimated with E_x[Var(f̂(x))]. The complete proof of that decomposition is well-known and detailed in Hastie et al. [71], Goodfellow et al. [72], and Ghojogh and Crowley [64]. Note that the total approximation error of a predictor is a function of bias and variance, in addition to an intangible component tied to the noise of the true relationship among the predictive variables (ϵ) that cannot be fundamentally reduced by any model [70], [73]. Therefore, the estimator should simultaneously have low bias and variance (usually hard to achieve) so that the total error is as small as possible.
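To make the decomposition tangible, the following minimal simulation, assuming a synthetic quadratic ground truth f and repeated resampling of small training sets (an illustrative setup, not taken from the cited works), approximates E_bias and E_variance for two estimators of different capacity:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = lambda x: x ** 2                       # assumed true function f
x_test = np.linspace(-1, 1, 50)[:, None]   # fixed evaluation points

def bias_variance(make_model, runs=200, n=30, noise=0.3):
    """Monte Carlo estimate of the bias and variance terms over resampled training sets."""
    preds = []
    for _ in range(runs):
        x = rng.uniform(-1, 1, (n, 1))
        y = f(x).ravel() + rng.normal(0, noise, n)    # y = f(x) + eps
        preds.append(make_model().fit(x, y).predict(x_test))
    preds = np.array(preds)                           # shape: (runs, 50)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test).ravel()) ** 2)  # E[(E[f_hat] - f)^2]
    variance = np.mean(preds.var(axis=0))                           # E[Var(f_hat)]
    return bias2, variance

print(bias_variance(LinearRegression))                   # high bias, low variance
print(bias_variance(lambda: DecisionTreeRegressor()))    # low bias, high variance
```

In such a setup, the linear model typically shows a larger bias term and a smaller variance term than the unconstrained tree, anticipating the trade-off discussed next.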
In other words, a new model f̂ is generated at each training process iteration and, owing to data randomness, a variety of predictions is obtained. Bias is the error inherent to the model and reflects the extent to which predictions are far from the objective class. An error due to bias arises from the difference between the expected prediction of the estimator model (or mean) and the correct value of the predicted variable. Variance captures how predictions deviate from each other. The error due to variance can be considered the estimator’s sensitivity to small fluctuations as a function of an independent data sample [64], [73].

Geman et al. [74] verified the inconsistency in convergence between bias and variance, claiming the cost of reducing one of them increases the other. Therefore, a predictor must
whose amount and level of non-linearity are too high is virtually impossible [17], [63].

Algorithms derived from the symbolic paradigm are, by definition, transparent (in theory) and require no explanation – even if fully transparent, symbolic algorithms have limited scope and generalizability. Some learning models, such as linear ones, decision trees, rule sets, and Fuzzy Systems, are traditionally considered transparent and interpretable (white-box models) [13], [79], [80]. Linear models are simple, efficient (in specific applications), and interpretable approaches, since their parameters do not have non-linear relationships. However, this is a simplistic and questionable view, for the interpretation of even a linear model may be challenging. Observe the following example:

y = 0.33x1 + 2.5x2 + 18.2x3 − 4.81x4 + 7.6x5 + 1.83x6 + 43x7 − 9.1x8 + 0.15x9 + 0.01x10 − 6.4x11 + 3.6x12 + 2.4x13 + 2.6x14 − 6.3x15 − 1.9x16 + 4.8x17 + 0.25x18 + 6.7x19 − 4.2x20 + 2.1x21 + 5.3x22 + 9.01x23 − 1.8x24 + 8.5x25 + 4.4x26 + 7.6x27 − 1.1x28 − 0.99x29 + 7.8x30.    (4)

Although such a linear model with 30 parameters can be considered simple from a mathematical perspective, is it easily interpretable? Inspection complexity increases as the number of parameters increases. However, a coefficient is assigned to each feature xi that linearly describes how that feature affects the model’s output, i.e., the effect of each variable can be extracted from a linear model by statistical and graphical support methods such as the Friedman H-statistic [81], Partial Dependence plots [82], and Individual Conditional Expectation plots [83], [84]. Even among numerous parameters, only a few may significantly affect the prediction output, making the model inspection task more manageable.
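As an illustration of that kind of inspection, the sketch below fits a linear model on synthetic data with 30 features (the data and coefficient values are assumptions loosely mirroring Equation 4) and ranks the learned coefficients so that the few influential variables stand out; scikit-learn’s partial dependence utility can then be pointed at the same fitted model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 30))                      # 30 hypothetical features x1..x30
true_coef = np.zeros(30)
true_coef[[2, 6, 22]] = [18.2, 43.0, 9.01]          # only a few features really matter
y = X @ true_coef + rng.normal(scale=0.5, size=500)

model = LinearRegression().fit(X, y)

# Rank features by the absolute value of their learned coefficients.
ranking = np.argsort(np.abs(model.coef_))[::-1]
for i in ranking[:5]:
    print(f"x{i + 1}: coefficient = {model.coef_[i]:+.2f}")

# Partial dependence of the prediction on the most influential feature.
pd_result = partial_dependence(model, X, features=[int(ranking[0])])
```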
If we can generate predictions and extract information from simplified models (e.g., linear ones), why should we need to employ complex (non-linear) models, such as Neural Networks or Ensembles? The answer does not come from the developers’ desire to propose sophisticated models, but from the data – more precisely, from the need to discover hidden patterns in complex data.

The current sources of information are large multidimensional datasets with arbitrary amounts of attributes. Many of such attributes can have correlational relationships, following different non-linear relations (quadratic, exponential, among other complexities). However, a linear model cannot accurately map patterns hidden under non-linear interactions. When working on complex data with non-linear dependence relationships previously unknown by analysts, simpler models have deficiencies in their generalization abilities to ‘‘unfold’’ the intricate correlations among variables.

Non-linear algorithms, such as Neural Networks and Random Forests, can efficiently map non-linear relationships. Data with complex interactions are naturally expected to require more sophisticated solutions to uncover their patterns. Therefore, the models named here ‘‘complex’’ are developed to fit such sophisticated relationships toward solutions in contexts that would hardly be solved by less complex tools. The additional sophistication of modern learning algorithms comes at the expense of their opacity. Whereas a linear transformation can be interpreted by checking the weights associated with the input variables, multiple layers with non-linear interactions inside and among each layer produce complex structures of difficult comprehension, requiring proper tools to obtain explanations for their results [4].

Even linear models, considered fully transparent solutions, suffer from low accuracy compared with more sophisticated and accurate non-linear approaches. Arrieta et al. [2] observed some exceptional cases in which the data under modeling are ‘‘well structured.’’ In those circumstances, simple and accurate models can be trained. However, those who develop real machine-learning applications are not expected to continuously operate using controlled and high-quality data; therefore, complex models are more advantageous due to their high approximation flexibility.

Simple learning models do not compete with complex ones in terms of predictive performance and generalizability capacity in multiple domains [32]. Once the interpretability challenge posed by complexity has been understood, tools can be designed to explain the opaque outputs of black-box models. The design process starts with the understanding of the goals and requirements of explainability.

VI. XAI GOALS
According to behavioral economist and Nobel laureate in economics Daniel Kahneman, wherever human judgment exists, there will be noise [85]. In other words, humans are susceptible to noise and diverse biases when making choices – e.g., two professional financial analysts can elaborate contrary market forecasts, judges can impose different sentences for the same crime, and doctors can make distinct diagnoses for patients with the same problem. Which elements have influenced those decisions – the weather, the weekday, or the moment they were taken? According to Kahneman et al. [85], those elements are examples of noise that can lead to variability in judgments that should be similar.

However, human actions can be confronted in order to discover the reasoning process that guides a person to make a particular decision and then identify the set of variables that are essential and what is only noise. Let us imagine a decision was based on a complex machine-learning model. In many scenarios, noise can have harmful effects that should not be ignored. However, how are the processes or logical reasons of a ‘‘black-box’’ model interpreted? How can a model’s main influences be explained?

Despite the significant advances in both the definition and construction of learning algorithms, explainability remains a relatively new research topic. The Explainable Artificial Intelligence community has been active, promoting many
itself to express information [2], [91]. Similarly, a transparent model can be interpreted, i.e., it is a model with some degree of interpretability [2], [13]. In general, final users are not equipped to understand how data and code interact to make decisions affecting them individually. In this sense, the transparency concept includes various efforts to provide practitioners, especially end-users, with relevant information on how a decision model works [90]. Finally, trust is also a term with a subjective meaning, commonly associated with a psychological state of security that, in the Machine Learning context, has often been expressed through the models’ good predictive performance (evaluated by performance metrics). However, this study shows this is a simplistic perspective, since more trust criteria must be considered.

When used in isolation, traditional performance metrics can lead to misleading evaluations. We will not open a discussion analyzing the multiple performance metrics for machine learning applications available in the literature, since it is out of this study’s scope, although an in-depth understanding of metrics is recommended for any machine learning practitioner. The reader can find detailed descriptions of the metrics used for evaluating learning algorithms in Gareth et al. [92] and the drawbacks of some performance metrics in Batista et al. [93].

An accurate measurement of a model’s prediction error is essential for assessing its quality. The primary goal of machine learning modeling is to build models that make accurate predictions of the target value of new data (data not used in training). A performance metric should reflect the modeling objectives; however, instead of reporting the model error on new data, traditional metrics are often applied to a test dataset, which is a portion of the data split from the training dataset – although assessment and recurrent mechanisms consider new or residual data to check the quality of predictions and adjust the model ‘‘on the fly,’’ if necessary. Any model is naturally optimized to describe the data on which it was trained. In this sense, the information generated by the methodology typically used for error measurement in learning models can be misleading, resulting in the selection of inaccurate and inferior models [69].
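As a small, synthetic illustration of how an isolated metric can mislead (not an experiment from this survey), a classifier that always predicts the majority class reaches high accuracy on an imbalanced dataset while being useless in practice; complementary metrics expose the failure:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.05).astype(int)    # ~5% positive class (e.g., a rare condition)

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = majority.predict(X)

print("accuracy:         ", accuracy_score(y, pred))            # ~0.95, looks great
print("balanced accuracy:", balanced_accuracy_score(y, pred))   # 0.5, no better than chance
print("F1 (positives):   ", f1_score(y, pred, zero_division=0)) # 0.0, misses every positive
```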
According to Amann et al. [16], transparency is one of the main requirements for the establishment of trust in intelligent systems. Therefore, regarding applications based on complex black-box models, efforts should be made to include transparency, with explainability proposing tools for achieving it. Other frequent terms encountered in the XAI literature are derivations, in context or semantic meaning, of those addressed and clarified in this section, which are the core terms of the XAI vocabulary. The reader can find helpful definitions for other such terms in Arrieta et al. [2], Adadi and Berrada [4], and Bhatt et al. [90]. In the following subsections, we describe additional important concepts of XAI theory.

A. XAI NEEDS AND CHALLENGES
The Machine Learning literature has been ‘‘algorithm-centric,’’ assuming the approaches and models developed are intrinsically interpretable [94], [95], but with no further verification of the interpretability of the algorithms [42]. Munroe [96] addressed this subject using sarcasm. Machine-learning applications are constructed with mathematical modeling tools derived from linear algebra and calculus. From the user’s viewpoint, such complex models are black boxes in which input data enter on one side and answers are collected on the other. However, what is the solution to ‘‘incorrect’’ results, i.e., those results that do not meet performance criteria or statistical metrics? The mathematical tools may be adjusted by hyperparameter optimizations until the answers begin to appear correct.

Real research on Machine Learning goes beyond simple adjustments to models. Despite such a satirical view of the algorithm-centric learning modeling process, a matter of significant importance for the establishment of trust in machine learning applications, which has been in the background, must be considered, i.e., results that ‘‘look correct.’’

Assuming a learning algorithm is intrinsically interpretable is not always an incorrect or problematic view. In some cases, interpretability may not be necessary (e.g., when the algorithmic decisions do not lead to significant consequences that affect user safety, when there is no possibility of generating injustices, or when the task solution is generally well-known and has already been sufficiently tested) [21], [42]. However, the range of decisions made by intelligent systems based on machine learning increases daily and is no longer restricted to academic and research environments. Handling incorrect results requires understanding the source of the errors and not only their relation to model adjustments. Errors can sometimes be related to biases in the training data learned by the model. In this case, adjusting the model will improve the metrics’ results, but may hide biases. Understanding the source of errors in black-box applications goes beyond assuming interpretability is unnecessary because a model has high accuracy rates. Trained models can hide biases, requiring the exploration of their reasoning.

Deep Learning and ensemble learning models have intricate and complex internal mechanisms that are virtually impossible to interpret. Moreover, the reasons leading to a decision cannot be understood, thus obscuring verification tasks that try to assess the logic behind predictions [97]. Opaque models are black boxes in a setup where input data enter on one side and predictions are output on the other, with the processing details remaining obscure or unknown. Black-box components do not clarify their reasoning, hampering the understanding of the way they achieve a given result [4]. The top part of Figure 5 illustrates a typical supervised learning application. Each learning model has its own capacity and each data context may demand different capabilities; therefore, different models can have distinct accuracies on the same dataset. In this context, performance metrics guide data scientists in selecting the most accurate model for each application.
FIGURE 5. Explainability is positioned as a complement to Machine Learning. Complex models work as black boxes and XAI tools explain black-box decisions in interpretable terms, enabling practitioners to make decisions based on more transparent information.

After training, testing, and achieving predetermined accuracy requirements, the learning model can be deployed to classify unlabeled data. At this moment, it will be on unknown ground and apply all the knowledge acquired (learned patterns) during the training-testing procedure to the new input data. Model monitoring is challenging due to the significant diversity in the relationship modes between data and mapped spaces [14]. It is essential to verify whether an operative model classifies new data correctly or if its performance is considered satisfactory owing to some training bias or problem definition. However, in practice, when running on unlabeled data, the model can generate incorrect classifications or classifications based on incorrect reasons and, consequently, problems in machine-learning applications, such as spurious noise or bias, can remain hidden in the decision-making process.

The internal elements of machine learning models are commonly illustrated graphically, although the models are essentially mathematical functions. Such a representation is often used because the reading of a complete function of a model (e.g., a Neural Network) can be problematic, and visual representation as a network of neurons enhances readability [65], [98], [99]. Despite graphical representations supporting explanations of models’ architectural elements, understanding how learning models work is difficult, and discovering how complex non-linear functions transform data is an overwhelming task.

However, the assurance of trust in intelligent-based applications requires more than the supply of results that appear correct – they must be both correct and fair. That is when XAI can be applied, generating explanations that can check whether model predictions are correct for the correct reasons, thus providing compliance guarantees justifying black boxes’ decision-making processes. The bottom of Figure 5 illustrates such a scenario.

Although novel learning architectures are constantly developed toward better performances in the most different domains [7], [8], [100], their understanding has been primarily ignored [101]. Some studies have demonstrated the weaknesses of high-end models, proving not everything is perfect, even when Machine Learning can reach high accuracy rates. As examples, ethnic biases were detected in a model that predicted criminal recidivism, software used by Amazon excluded ethnic minorities while determining areas in the United States that would receive discount offers [26], and a model trained to predict the probability of death from pneumonia assigned lower risk to patients with asthma [102]. Szegedy et al. [103] demonstrated how a powerful deep Convolutional Neural Network (CNN) could be manipulated into misclassifying pictures of school buses as ostriches by simply introducing a visually imperceptible noise into the test images. Goodfellow et al. [104] discussed the susceptibility of Artificial Neural Networks to adversarial attacks, a type of attack used to discover minimal changes to be made to input data for ‘‘fooling’’ the network and causing wrong classifications. Su et al. [105] analyzed the extreme case of a one-pixel attack, and Haim et al. [75] showed sensitive training data encoded in the parameters of a trained classifier could be recovered in a training-data reconstruction attack.

A critical problem that may remain unnoticed by commonly used evaluation metrics is the generalization ability of learning methods. Lapuschkin et al. [106] developed interesting research demonstrating cases in which a model’s predictions were based on spurious correlations unrelated to the learning objectives, known as the Clever Hans phenomenon. If complex models provide difficult-to-interpret decisions in sensitive contexts that can affect people’s lives (e.g., credit scores, public administration, and medicine), the interactions between attributes that provide predictive accuracy must be understood [17].
Once the reasons behind those decisions have been comprehended, it is possible to verify whether the model results are reliable or based on spurious biases or noise.

Spurious biases can be incorporated into the knowledge space of learning models in the most diverse ways, leading to unfair decisions. Systematic biases in training datasets (socially constructed biases, inaccuracies, and errors in data collection [2]), problems or errors with modeling definitions and algorithms, limitations in training, or lack of evaluation are typical sources of biases. Once incorporated, hidden biases are difficult to detect and treat in complex models. However, explainability can be applied to support bias detection by providing insights into the models’ decision processes. Zhang and Zhu [9] and Ras et al. [107] investigated the adoption of XAI approaches for bias detection.

XAI can support information-based decisions. Let us consider a medical diagnosis support system based on a classification model that predicts a patient’s flu according to his/her history of symptoms, which is then verified by the hospital team. An application explaining which symptoms have influenced the decision, apart from the prediction itself, would be helpful and very informative to doctors, so that they could have a better basis for diagnosis instead of simply making a decision based on automatic results [97]. Figure 6 illustrates the situation and the importance of adequate explanations.

FIGURE 6. Explanation of the features’ influence in a flu diagnosis prediction. Symptoms that contribute to the result are highlighted in green and those that do not contribute are highlighted in red.
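To give a concrete flavor of such an explanation, the sketch below is a minimal, hypothetical version of the Figure 6 scenario using the model-agnostic LIME package of Ribeiro et al. [97]; the symptom names, synthetic records, and Random Forest model are illustrative assumptions, not the medical system described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(1)
symptoms = ["fever", "headache", "cough", "sneeze", "fatigue", "no_fatigue"]

# Synthetic patient records: 1 = symptom present, 0 = absent.
X = rng.integers(0, 2, size=(500, len(symptoms)))
y = ((X[:, 0] & X[:, 1]) | X[:, 2]).astype(int)   # 'flu' driven by fever+headache or cough

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=symptoms, class_names=["no flu", "flu"], mode="classification"
)
patient = X[0]
explanation = explainer.explain_instance(patient, model.predict_proba, num_features=4)

# Each pair is (symptom condition, signed contribution to the 'flu' prediction).
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```

Each returned pair associates a symptom condition with a signed weight, mirroring the supporting (green) and opposing (red) contributions illustrated in Figure 6.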
In addition to the need for decision-making processes supported by information, regulations such as the European Union’s implementation of GDPR (discussed in Section I-A) also require that intelligent-based applications observe ethical matters. However, the regulation is not sufficiently detailed and does not define the tools or requirements to be designed or made available for ensuring compliance with the law. In such a context, explainability can verify whether the model is sufficiently fair in its decisions, mainly when training data include biased or incomplete cuttings [108]. Correlation does not imply causality; however, causality involves correlation. In this sense, explainability can help validate predicted outcomes by revealing possible correlation relationships related to specific outcomes [2]. Explaining learning-based applications can promote the discovery of potential failures, helping data scientists identify causes of errors more efficiently and indicating what the model has really learned from the data [90].

‘‘Opening’’ the intricate black boxes that modern machine learning systems have become is not a trivial task. State-of-the-art models are black boxes difficult to understand [42]. Even linear models, which are simpler and more transparent, are fully transparent only in limited contexts, i.e., explanations of high-dimensional linear models are challenging [26]. In some cases, and contrarily to expectations, models considered transparent and of effortless understanding may decrease users’ chances of error detection owing to high amounts of information [108].

Omitting explainability can lead to challenges for model trust. However, toward properly including explainability, the generated explanations must be helpful, understandable, valid, accurate, and consistent to be useful [25], [109]. Just because an explanation ‘‘makes sense’’ does not imply it is automatically valid [110]. One of the most significant challenges within the XAI environment is to provide reliable explanations supported by robust validations [16].

B. XAI CATEGORIZATION OF APPROACHES
The first initiative for the generation of explanations for artificial intelligence models dates back to the 1980s, when researchers introduced questions about the negligence of symbolic systems regarding explanations [111]. However, the concept of XAI has been consolidated recently owing to the growing need to explain the results of complex learning models [2].

The XAI research community has been devoted to the creation of multiple new methodologies and approaches that explain specific models or even those independent of any model. XAI techniques can be classified according to their functional characteristics and objectives, and black-box problems can be ‘‘opened’’ in two ways, namely, (i) by constructing transparent systems accurate enough to replace black-box ones or to work with them in a support or redundancy mode [20], [112], or (ii) through post-hoc explainability. Figure 7 shows a diagram of the XAI approach.
FIGURE 7. Taxonomy of explainability approaches regarding the different understanding objectives over a black-box
problem. Explanations can be derived by approximating black boxes through transparent models or post-hoc
methods.
explain models that are not directly interpretable and cannot be efficiently approximated by a transparent model [2]. Post-hoc approaches are applied after a black-box model has been trained and are used to explain its predictions based on tools that improve the understanding of black-box applications. In addition, post-hoc explanations are typically not intended to unravel the way a learning model works internally, but rather to provide helpful information to users and data scientists, such as the importance of specific modeling parameters or data attributes [26].
Based on Figure 7, the following subsections categorize the XAI approaches according to the underlying learning model, data granularity, and explanation tasks.

1) ACCORDING TO THE MODEL
Post-hoc techniques can be classified according to the type of model they are designed for. XAI methods created for a specific class of models are indicated when the explainability objective is to unravel the logic of specific classifiers or when an advantage can be taken from the model's architecture. Model-independent, or model-agnostic, methods can (theoretically) be applied to any learning model, regardless of the underlying architecture or algorithm, because they are not designed to consider any characteristics of a specific model.
Model-specific methods may perform better, for they explore the functional specificities of the model's class under explanation. However, they have limited application capabilities – since they are planned to work with a singular class of learning models, they may not be flexible enough to work with any other type of model outside the class they were designed for [4]. In contrast, model-agnostic methods aim to understand the reasoning behind predictions using simplifications, relevance estimates, or visualizations [2]. They separate predictions from explanations and search for explanatory elements without entering the classifier's internal logic [4].
Ribeiro et al. [97] argued that model-agnostic techniques are more valuable because they provide explanations for any algorithm, enabling different models to be compared by the same explainability technique. Conversely, Chen et al. [113] claimed such techniques rely excessively on modeling a posteriori of arbitrary learning functions and, consequently, may suffer from sampling variability when applied to models with many input attributes, which can hamper convergence among results. In this sense, whether model-specific or model-agnostic, the more suitable choice will depend on the explainability task. A model-specific method can be employed if learning models from different classes do not require comparisons and a superior model-specific method is available for the model under explanation. Otherwise, a model-agnostic method will be more flexible.
However, a series of models is a type of architecture that has imposed significant challenges on transparency. When the outputs of a predictive model are used as inputs for another predictive model, we have a series-of-models architecture [114], i.e., a complex pipeline composed of different types of black boxes, such as linear, tree, and deep models [115], [116]. Such a design hampers explainability in comparison to a single model. As an example, different proprietary models are distributed across different institutions in consumer scoring tasks. Each pipeline branch has thousands of data segments about consumers for simulating distinct elements related to consumer scores (fraud scores, credit scores, and health risk scores, among others).
Multiple high-stakes applications use series of models, making approaches designed to explain such architectures crucial in XAI research due to the lack of transparency
of this complex structure and the need for debugging and building trust in applications based on series of models [114]. A natural solution might be to apply model-agnostic methods to explain the entire pipeline of a series of models at once. Although standard model-agnostic techniques can explain a series of models, they do not work appropriately because model-agnostic methods suffer from some shortcomings in this context, i.e., they require access to every model in the series (but institutions sometimes cannot share their proprietary models) and have a high computational cost, which may not be tractable for large pipelines [114].
Standard model-specific XAI methods are often faster than model-agnostic alternatives. However, they cannot explain series of models, since model-specific methods are designed to operate by considering one type of black box, and a series of models may comprise many types of predictive models in its pipeline. Series of models have been little explored in the XAI literature. Explainability solutions for such models demand hybrid model-specific XAI tools that can handle distributed pipelines but are, simultaneously, sufficiently generalist to manage the diversity of the models' compositions [114].

2) ACCORDING TO GRANULARITY
Another categorization in XAI is related to the granularity of explanations. According to Ribeiro et al. [97], there are two main classes of methods with respect to granularity, namely,
• Global: Methods designed for global explainability are applied when the task is to obtain an overview of the model's behavior regarding the more influential elements [117]. Global methods provide a summarized description of a model's behavior when the scope is the entire dataset (or a significant cut of it) [118]. The strategy is commonly used to compare the global relevance of a variable and understand which other variables are more relevant in population-level decisions (population-wise). As an example, estimating global behavior in settings such as climate change or drug consumption trends is more valuable than providing explanations for all possible modeling elements [4], [119].
• Local: Methods devoted to local explainability are indicated when the task is to retain a more accurate description of the details connected to a single-instance prediction (instance-wise) [120]. At a high level of knowledge, the goal is to understand the motivation for a specific prediction. Local explanations are valuable for complex models with different behaviors when exposed to different combinations of input variables [117].
The user must trust that the model will exhibit an appropriate behavior when deployed. In the modeling stage, evaluation metrics are applied to the model generated from the training data (validation process) to emulate real-world behaviors. However, data content and the real world may differ significantly. Global explainability can investigate whether a model reflects its expected behavior.
Capturing an overview of the entire learned mapping can be difficult or less informative, especially in models with high numbers of attributes, since it demands that the explanation method detect any functional dependence between all input data and targets, which can be an NP-hard problem in general [121]. Users must trust a prediction to make a decision based on it. Apart from evaluation metrics, predictions must be tested individually by local explainability to justify them, particularly when the consequences of an action can be catastrophic (e.g., incorrect medical diagnosis or counter-terrorism).
Individual explanations can justify which input features of a data instance lead to a specific decision when a global view of the model is not sufficiently descriptive [4]. However, for graph-based learning models, sometimes the goal of local explanations is not limited to explaining input features of a specific node; rather, it can be more valuable to explain which nodes in a neighborhood were most important for a decision.
According to Ribeiro et al. [97], to be meaningful, an explanation must maintain local accuracy, i.e., the correspondence between the explanation and the behavior of the model in the neighborhood of the predicted instance. On the other hand, the authors also claimed local accuracy does not imply global fidelity, since globally important characteristics may not be locally important, and vice versa. An utterly faithful global explanation cannot be obtained without a complete description of the entire model. As reported by Wojtas and Chen [121], a simple collection of instance explanations may not work as a population-level characterization because local explanations are specific to the instance level and often inconsistent with global explanations. Therefore, identifying globally faithful and interpretable explanations remains a challenge [97].

3) ACCORDING TO THE EXPLANATION TASK
Some XAI methods are designed to understand learning models' structures and internal mechanisms, i.e., model explanation methods. Such a category of methods is typically found in Neural Network applications, in which information visualization is applied to generate visual representations of the internal patterns of neural units. However, contrary to intuition, Poursabzi-Sangdeh et al. [108] indicated that exposing the internal mechanisms of a learning model reduces the users' ability to detect faulty behaviors for unusual instances. Amann et al. [16] claimed such an interpretability reduction might be related to the overhead induced by the large amount of information users are exposed to during the understanding process, even in transparent models. Note the findings of Poursabzi-Sangdeh et al. [108] do not invalidate model explanation methods, but rather alert developers to design tools that synthesize large amounts of information carefully.
Model inspection is used when the explanation task is to verify the model's sensitivity, i.e., the behavior of the learning algorithm or its predictions when the input data are varied through perturbations. On the other hand, prediction explanation methods display visual or textual elements
that provide a qualitative/quantitative understanding of the relationship between the input variables and a prediction, clarifying the factors that influence the model's final decision. According to Ribeiro et al. [97], prediction explanation methods promote trust between users and learning applications faithfully and intelligibly. The explanation of predictions does not require all the classifier's internal logic to be unraveled. Moreover, such methods should explain individual predictions of a complex model, regardless of whether each prediction is correct or not. Prediction explanation is one of the leading research areas in XAI, with multiple techniques devoted to identifying and quantifying the contribution of input elements to predictions of complex models [4], [33].
Explanations can be provided by a global method or a local attribution one that assigns some measure of importance to each input datum in both granularities, i.e., for a collection of instances or for a set of input attributes of a specific data instance. Finally, the output of a learning model can be interpreted through evidence-based (or factual) explanations. In this context, contrastive and counterfactual methods seek justifications for why a decision was not different from the one predicted and how it can be modified, respectively [122].

C. XAI AS A FAIR PLAY ELEMENT TO ARTIFICIAL INTELLIGENCE
Establishing trust is one of the main foundations of XAI. Therefore, it is unreasonable to use a black-box explainer as a black box itself. According to the GDPR, regarding the right to explanation, ''if a decision-support system provides inconsistent explanations for similar instances (or the same instance), those explanations provided to the user cannot be trusted.'' A suitable explainability method cannot provide (completely) different explanations when executed multiple times to explain the same instance or similar ones. In other words, an explainer must be consistent to be reliable [25]. Stability is a critical requirement XAI methods must satisfy to ensure the consistency of explanations [123]. The lack of consistency may lead to problems with the explanation's general trustworthiness, thus questioning the entire purpose of explainability.
It is necessary to highlight that explanations are context-dependent, i.e., it is impossible to define the questions an XAI method will answer without considering the needs of the target audience to whom explanations are addressed [90], [97]. Domain experts and developers may be interested in auditing the behavior of models to discover errors or vulnerabilities hidden within their complexity. Improving learning models or prediction understanding would lead to the correction of flaws in the application of the modeling [2]. End users and regulatory agencies may demand logical and verifiable outputs from decision-making systems that can affect them, clarify doubts, and ensure the observation of compliance criteria, which are important for indicating a learning application is trustworthy. Data scientists and corporate managers demand tools to verify whether their data are being transformed into useful information for the right reasons. Liao et al. [124] defined an ''XAI question bank'' as a set of how-, why-, and what-based questions users might ask about Machine Learning for guiding XAI developers' good design practices. Bhatt et al. [90] identified explainability needs according to the audience and developed a framework with a set of goals for explainability to facilitate end-user interaction.
Other significant challenges in XAI beyond the existing technical challenges in explaining complex models should be highlighted. Important principles that must always guide any Artificial Intelligence system development and implementation (e.g., security, privacy, and data protection guarantees) must also be included in explainability approaches. Regarding the GDPR again (see Section I-A), algorithmic decisions ''which produce legal effects concerning (a citizen) or of similar importance shall not be based on the data revealing sensitive information, for example about ethnic origins, political opinions, sexual orientation'' [25]. In other words, explaining a prediction does not mean disclosing and exposing sensitive data that should not be published [125]. However, defining what constitutes sensitive data must comply with social principles like ethics and fairness. Fairness refers to the ability of learning models to make fair decisions with no influence of hidden biases that might mistakenly affect (negatively or positively) them [126]. Intelligent computer applications must be impartial concerning social aspects such as religion, socioeconomic background, political opinions, or ethnic origins [127].
In addition to respecting those elements, XAI methods should work toward improving learning-based applications and ensuring they accomplish their tasks accurately and responsibly. Interested readers can find extensive discussions on the needs and challenges in promoting responsible Artificial Intelligence in Arrieta et al. [2] and Tjoa and Guan [42].

VIII. XAI APPROACHES AND METHODS
In this section, we provide an overview of the most recent and relevant XAI methods by analyzing their characteristics and functionalities for generating explanations for black-box problems. According to Molnar [40], the decision process of a learning-based model can be interpreted by analyzing the influence of each variable (attribute) on an instance prediction. Such individual influences (or importance) can be easily verified in a linear model f, formalized as follows for an n-dimensional data instance, x = (x1, . . . , xn) ∈ X:

f(x) = ω0 + ω1x1 + · · · + ωnxn   (5)

where xi is the value of attribute i of instance x, with ωi being its associated weight and ωixi describing the effect of each variable (weight multiplied by attribute value). The influence φi that the i-th variable has on the prediction f(x) is calculated as

φi(f) = ωixi − E[ωiXi] = ωixi − ωiE[Xi]   (6)

where E[Xi] is the expected value of variable i. The contribution of each attribute to a prediction can be inferred according to the difference between its effect and the expected
value. Adding all variable influences of the instance leads to

Σ_{i=1}^{n} φi(f) = Σ_{i=1}^{n} (ωixi − E[ωiXi])
             = (ω0 + Σ_{i=1}^{n} ωixi) − (ω0 + Σ_{i=1}^{n} E[ωiXi])
             = f(x) − E[f(X)]   (7)

which is the difference between the prediction for instance x and the expected prediction value.
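As a minimal illustration of Equations (5)–(7), the following Python sketch (with hypothetical, randomly generated data) fits a linear model and computes each attribute's influence φi = ωixi − ωiE[Xi], whose sum matches f(x) − E[f(X)].

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical tabular data: 200 instances, 3 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
x = X[0]                                    # instance under explanation

# Equation (6): phi_i = w_i * x_i - w_i * E[X_i]
phi = model.coef_ * x - model.coef_ * X.mean(axis=0)

# Equation (7): the influences are additive, summing to f(x) - E[f(X)].
print(phi)
print(phi.sum(), model.predict(x.reshape(1, -1))[0] - model.predict(X).mean())
```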
Explaining other classes of machine-learning models in such a simplified way would be interesting; however, the concept of ''effect'' works directly only in this case because of the model's linearity. Even a moderate amount of non-linearity in the relationships among attributes may increase the complexity of the linear modeling formulation, thus reducing its interpretability. In other words, even linear models can become complex enough to not be interpretable [113]. Feature effect/impact, variable contribution, and feature-level interpretation are terms often used in the XAI literature to describe how or to what extent each input feature contributes to the model's prediction, i.e., feature importance [33], [84].
Breiman [17] proposed one of the first solutions for identifying important features. His approach permutes each feature (randomly shuffling the feature values) to assess its individual contribution. More specifically, permuting feature values breaks the connection between the feature and the target variable, resulting in a significant loss of prediction performance if the feature is important. Therefore, the amount of performance deterioration indicates the extent to which the model depends on that feature [84]. On the other hand, Breiman's method is model-specific for trained Random Forests.
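Although Breiman's original formulation targets Random Forests, the permutation idea itself can be sketched in a model-agnostic way. The snippet below is a sketch assuming a scikit-learn classifier and the library's bundled breast cancer dataset; it measures the accuracy drop caused by shuffling each feature (scikit-learn offers a comparable utility in sklearn.inspection.permutation_importance).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)

rng = np.random.default_rng(0)
importance = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the feature/target link
    importance.append(baseline - model.score(X_perm, y_te))

# Larger accuracy drops indicate features the model depends on more heavily.
ranking = np.argsort(importance)[::-1]
```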
Despite the significant assortment of XAI methodologies, explaining predictions through feature-level interpretations is a common goal of XAI approaches [33], and several authors (including the authors of the present research) classify XAI methods according to other elements or mechanisms applied to accomplish the task of feature importance. Feature attribution is at the core of feature importance. However, as demonstrated here, it is not the only way to verify importance. The main terminology related to feature importance problems is defined as follows:
• Feature Attribution: Measures the contributions of individual input features to the performance of a supervised learning model, fairly distributing the predicted values among the input variables to quantify each variable's relevance [84], [121].
• Additive Importance: An explanation modeling according to which the summation of all feature importances should approximate the original predicted value [109].
• Sensitivity: Measures how the predictive performance of a learning model varies (increases or decreases) by perturbing each input feature [128]. From the perspective of sensitivity analysis, the more important the variables, the more significant their contribution to the predictive performance.
• Gradient-based: A particular case of the sensitivity approach that assesses the behavior of the machine learning model through infinitesimal-size perturbations [90].
• Feature Selection: Identifies a combination or a subset of p important or most contributing features (from an original dataset holding n features) that train a model with the minimum loss of accuracy. In practice, p ≪ n for most feature selection tasks [129]. A minimal selection example is sketched after this list.
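As a brief illustration of the feature selection category, the following sketch (assuming scikit-learn and its bundled breast cancer dataset; k = 5 is an arbitrary choice) retains only the most informative features according to a mutual-information criterion.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Keep the p most informative features (p << n).
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
X_reduced = selector.transform(X)
selected = selector.get_support(indices=True)   # indices of the retained features
```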
We distinguish feature selection from feature extraction. Both methodologies aim to improve the performance of data-driven models by reducing the original feature space. However, feature extraction methods are more closely tied to dimensionality reduction tasks, creating a set of new features from the original data through linear or non-linear transformations that map a significant low-dimensional representation from high-dimensional data while preserving previously defined information [130], [131]. In contrast, despite also aiming to reduce dimensionality, feature selection is performed by dropping data axes based on canonical projections instead of learning mappings.

A. XAI BASED ON APPROXIMATIONS
When data science tasks demand the application of sophisticated and accurate models, their strong non-linearity results in a lack of transparency, requiring explainability approaches. Explainability can be achieved by intrinsically interpretable algorithms that approximate the predictive performance of the original black-box model. More specifically, a black-box model can be used as a ''trainer'' to transfer knowledge to a more transparent and interpretable model that approximates and explains the original predictor's outputs, which is also known as model distillation [132].
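A minimal sketch of this idea, under the assumption that a random forest plays the role of the black box, distills its predictions into a shallow decision tree whose rule sequence can be read directly; the fidelity score measures how well the surrogate mimics the original predictor.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train a shallow, interpretable surrogate on the black box's own predictions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

fidelity = surrogate.score(X, black_box.predict(X))  # agreement with the black box
print(fidelity)
print(export_text(surrogate))                        # human-readable rule sequence
```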
A well-known interpretability standard derives from decision trees because the logical sequence of a decision tree can be intuitively interpreted by a human analyst [16]. Other interpretable strategies include logistic regression and rule-based learning [26], which have significant limitations. As an example, decision trees tend to have low generalizability in addition to being prone to overfitting, and logistic regression assumes input data are linearly separable, which rarely occurs in real-world situations. According to Tan et al. [33], explanations based on decision trees have low accuracy and, in some cases, can be less accurate than explanations based on linear models.
Guidotti et al. [133] developed a solution through rule-based classifiers using a genetic algorithm to sample the neighborhood of a given instance, train a decision tree, and then generate an explanation. Although considered transparent, rule-based methods have scalability limitations, similarly to linear models. In some cases, generating a massive set of rules is necessary to obtain good classification levels, rendering the analysis unfeasible. Rule-based models are best suited to approximations in reduced domains [134], and
simple and transparent models are not sufficiently efficient in handling high-dimensional data with complex relationships.
Lou et al. [135] presented an approach based on GAMs (Generalized Additive Models) as an interpretable alternative to complex regression models. GAMs are linear smoothing models that decompose a predictive function into an aggregation of one-dimensional components defined for each predictive variable [136]. They can thus capture the individual non-linear relationships of the variables under modeling. Lou et al. [112] improved the previous version of Lou et al. [135] using a new and optimized mathematical formulation, and Caruana et al. [20] chose a GAM-based approach because it is a ''gold standard'' in interpretability. However, GAMs are limited by their low performance in generating explanations for more complex modeling [2].
Interpretable models can perform similarly to their non-interpretable counterparts. Tan et al. [33] presented a comparative study indicating explanations generated by distilled transparent models achieve accurate results in contexts of additive explanations. However, those models build different representations of the latent space, which can affect the predictive performance [137]. Unfortunately, training an interpretable model as a best-fit approximation that mimics complex black boxes is sometimes unfeasible. According to Linardatos et al. [32], the construction of a competitive and transparent model in some domains, such as language processing and computer vision, is very difficult because the gap in performance compared to deep-based models is unbridgeable. Furthermore, knowledge transfer between different domains is another limitation imposed on transparent models.
Post-hoc explainability must be considered when approximation through transparent methods is not feasible [40]. The universe of post-hoc XAI is extensive in the literature. The following subsections present the main classes of approaches created to convert black-box problems into explainable information, and the proposed classification extends the taxonomic diagram shown in Figure 7.

B. XAI BASED ON INFORMATION VISUALIZATION
Information visualization maps data into graphical formats, simplifying their representation and, hence, helping analysts visually discover trends, patterns, and characteristics [138]. Visualization techniques have long been adopted in multiple application domains, including XAI, where the visualization community has made considerable efforts to use graphics to provide interpretation, from Neural Networks [139] to the recently introduced Deep Learning architectures based on Transformers [140]. The use of visualization to explain the training process enables analysts to inspect model learning, thus monitoring its performance [141].
Marcílio-Jr et al. [142] designed a model-agnostic tool based on coordinated views to visualize similarity between classes. It measures the importance of features using optimization to induce perturbations in individual features that simultaneously minimize the model's performance and the perturbation. Chan et al. [118] developed a graphical interface for the inspection of predictions from different density levels. The system provides a summarized overview so that users can ''browse'' from global to local explanations. The interface shows the importance of an instance from different contexts generated through groupings of similar instances into topic vectors of different granularities.
Partial Dependence plots [71], [82], [143] are graphical methods used to understand supervised learning models by visualizing the average marginal effect (partial dependence values) between input variables and predictions [4]. Partial dependence can capture monotonic relationships, but it can also obscure heterogeneous effects and complex relationships resulting from feature interactions [40]. Individual Conditional Expectation (ICE) [83] curves handle this problem by disaggregating the partial dependence output and visualizing the extent to which the prediction of an individual instance changes when the value of a selected feature changes [84]. Heatmaps have been applied to highlight and explore the most relevant elements of neuronal units in image classification problems [9], [106], [134]; however, they are difficult to aggregate, making the visual detection of false positives at scale challenging [90].
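Partial dependence and ICE curves can be computed by brute force: fix a feature at a grid of values, predict, and average. The sketch below (assuming scikit-learn and its bundled diabetes dataset) makes the construction explicit; sklearn.inspection provides equivalent, optimized utilities.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

feature = 0
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 30)

ice_curves = []                        # one curve per instance (ICE)
for value in grid:
    X_mod = X.copy()
    X_mod[:, feature] = value          # force the feature to a fixed value
    ice_curves.append(model.predict(X_mod))
ice_curves = np.array(ice_curves)      # shape: (len(grid), n_instances)

pd_curve = ice_curves.mean(axis=1)     # average marginal effect (partial dependence)
```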
Xenopoulos et al. [144] developed GALE (Globally Assessing Local Explanations), a TDA-based [145] methodology for extracting simplified representations from sets of local explanations. The method generates a topological signature of the relationships between the explanation space and the model predictions. Based on a visual inspection of the representations, the parameters of the underlying local XAI method can be assessed and tuned, or the similarities among different explainability methods can be quantitatively compared. GALE acts more as a visual assessment tool for well-known local XAI methods than as an explainability method itself.
Cabrera et al. [146] demonstrated how to detect learning biases and assess fairness with an interactive visualization interface that enables investigations of similar subgroups and impacting features. However, it supports only binary classification and tabular data and suffers from scalability issues.
Multiple visualization-based XAI methods actively use dimensionality reduction techniques or multidimensional projections [131], [147] to generate interpretable representations of the feature spaces of learning models, such as relationships between neurons and their influence on data [141]. Cantareira et al. [148] developed a method that describes the activation data flow in the hidden layers of Neural Networks. The information enables verification of the network's evolution during the training process, and representations are created when one layer transmits information to the next. Rauber et al. [98] introduced a similar method that visually explores the way artificial neurons transform input data while they pass through the hidden layers of deep networks.
SUBPLEX [120], [149] is an interactive visual analytics tool that connects multidimensional projections with local explanations and aggregates large sets of local explanations
at the subpopulation level to reduce visual complexity. From the aggregated explanations, users can interactively explore explanation groups in detail using feature or instance selections to identify patterns and compare local patterns in multiple subpopulations. SUBPLEX did not introduce a new XAI approach or technique itself; instead, it applied well-known XAI techniques and projection methods to generate aggregated explanations in a graphical user interface. However, the tool requires considerable computation for real-time processing of large amounts of data.
UMAP [150] and t-SNE [130] are two robust multidimensional projection techniques frequently used in visualization-based XAI methods [42]. However, dimensionality reduction techniques have usability limits in terms of the number of points visualized simultaneously [98], [141]. Explainability through information visualization faces scalability challenges related to dealing with large numbers of elements in addition to adequately describing their relationships [147].

C. XAI BASED ON DECISION BOUNDARIES
Explaining the behavior of machine-learning models by investigating their decision boundaries is a little-explored approach in the literature. Karimi et al. [101] developed DeepDIG, a method based on the generation of adversarial samples sufficiently close to the decision boundary, i.e., synthetic instances between two different classes. More specifically, DeepDIG works on a previously trained Deep Neural Network (DNN) and generates border instance samples with classification probabilities for two distinct classes as close as possible, resulting in classification uncertainty. Those synthetic border instances are then used to measure the complexity, or non-convexity, of the decision boundary learned by the trained DNN.
One of the main properties of DNNs is their remarkable generalization power, achieved through sophisticated combinations of non-linear transformations. As a result, DNNs can map data with complicated and high-dimensional relationships, which introduces the question of whether the complexity of data in the input space is reflected in the transformed (learned) space of the network.
Toward addressing that issue, Karimi et al. [101] designed two metrics to characterize the complexity of decision boundaries – one for the original space (input data) and another for the transformed space. The authors then verified the hypothesis presented by Li et al. [151] concerning the decision boundary resulting from the last layer of a DNN trained with backpropagation converging to the solution of a linear SVM trained on the transformed data. In a similar context, Guan and Loew [152] developed a metric to assess the complexity of decision boundaries, arguing that models with simpler borders have optimal generalization ability.
Englhardt et al. [153] proposed a technique to discover the minimum sampling with (almost) uniform density containing border points using original data. The technique is based on optimization for retaining the necessary samples and ensuring the correct delimitation of the decision boundaries. Sohns et al. [154] designed an interactive interface to visualize the topology of decision boundaries and other graphic tools to explore partial dependence and feature importance. However, as a visualization tool in complex domains, the approach has scalability limitations.
Yousefzadeh and O'Leary [99] calculated flip points, which are points close enough to the decision boundaries of trained models such that they can be classified into both classes (considering Neural Networks with two outputs). The study introduced the following issues: flip points can be used to determine the smallest change in input data sufficient to modify a prediction; incorrectly classified data instances tend to have smaller distances to a flip point than correctly classified ones; points close to their flip points are more influential than distant ones in determining decision boundaries between classes; and using flip points as synthetic data during model training can improve accuracy when the model is biased. Flip points exist for any model, not only for Neural Networks, and, if appropriately confirmed, the aforementioned issues can turn flip points into key elements in providing interpretation for checking trust in predictions [99].

D. XAI BASED ON CONTRASTIVE AND COUNTERFACTUAL EXAMPLES
The right-to-information regulations demand meaningful information about the logic behind automatic decisions [155]. Although they do not define ''meaningful,'' it is reasonable to expect explanations will go further than technical aspects and translate machine-learning decisions into human-descriptive language. The goal of introducing a human-centric nature to XAI has motivated researchers to pay more attention to social-based considerations on the requirements of explanations, with notions of contrastive and counterfactual argumentation arising as natural reasoning paths humans use to explain why or how some decision was made or could be made.
The concept of contrastivity is acquired from the social sciences, which establish that human explanations derive essentially from contrastive processes [156]. Contrastive explanations clarify why one event occurred in contrast to another. Therefore, the contrastivity property specifies that an explanation should answer questions related to why an event occurred in terms of its possible causes (hypothetical alternatives). As an example, a ''reasonable'' explanation to a question such as ''why did event A happen instead of event B?'' would provide the causal reasons that directed the model to event A [122]. In XAI, contrastive methods offer insights into why a model made a specific prediction, highlighting the features that led to that prediction and contrasting them with features that would lead to alternative outcomes.
Similarly, different scenarios can be described for a particular prediction in case of slight modifications in the input data, thereby explaining the possible
‘‘contrary-to-fact’’ consequences of those modifications. of original (target) models, which might lead to sensitive
Counterfactual explanations have a long history in philosophy information leaks.
and psychology because, for human explanations follow coun- The interest in contfactual explanations research has
terfactual dependence patterns [157]. In XAI, counterfactual grown because of their appealing alignment with human
methods generate instances or scenarios close to the input for reasoning, which could enhance machine learning trans-
which the output of the classifier changes [90], describing parency by transforming XAI applications into ‘‘human-like’’
characteristics that will change in the prediction in case of any explainers [22], [163]. However, the contfactual methodology
perturbation, deletion, or inclusion of values in the predictive has limitations. Bhatt et al. [90] indicated many current
features [124]. Counterfactual explanations do not explicitly contfactual methods make crude approximations, since finding
answer ‘‘why’’ a model predicts a decision; instead, the broad plausible counterfactual explanations (feasible in both input
goal is to describe a link between what could have happened if data and real world) is non-trivial and computationally
a certain input had been changed in a particular way, providing expensive. Contfactual explanations are highly domain-
directions toward the desired prediction [128], [157]. specific, which leads to a lack of standardization in evaluation
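A deliberately naive sketch of the idea is shown below: it samples random perturbations of an instance and keeps the closest candidate whose predicted class flips. It assumes scikit-learn and its bundled breast cancer dataset, and it is not any specific published counterfactual method, which would additionally enforce plausibility and actionability constraints.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

x = X[0].copy()
original_class = model.predict(x.reshape(1, -1))[0]

# Naive random search: sample perturbations and keep the closest candidate
# (in feature-scaled distance) whose predicted class differs from the original.
rng = np.random.default_rng(0)
scale = X.std(axis=0)
candidates = x + rng.normal(scale=0.5 * scale, size=(5000, x.size))
flipped = model.predict(candidates) != original_class

best = None
if flipped.any():
    dists = np.linalg.norm((candidates[flipped] - x) / scale, axis=1)
    best = candidates[flipped][np.argmin(dists)]
    # The largest (scaled) changes indicate what would need to differ
    # for the model to alter its decision.
    top_changes = np.argsort(-np.abs((best - x) / scale))[:5]
```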
Poyiadzi et al. [158] developed a method that generates actionable counterfactual explanations by constructing a weighted graph. It then applies Dijkstra's shortest path algorithm to find the instances that generate explanations according to density-weighted metrics and users' requirements, providing suggestions on how much a change in the input would lead to the users' desired outcomes. Raimundo et al. [159] introduced MAPOCAM (Model-Agnostic Pareto-Optimal Counterfactual Antecedent Mining), a multi-objective optimization algorithm that determines counterfactuals. Its input attributes are handled as a set of cost functions applied in a tree-based search mechanism to identify the changes that give rise to counterfactual antecedents. Multi-objective optimization has still not been well explored in the counterfactual XAI literature; moreover, MAPOCAM is computationally expensive and does not support categorical data.
Contrastive and counterfactual methodologies are generally similar, but they are not equivalent and cannot be used interchangeably. Contrastive explanations are more restrictive than counterfactuals [155], with counterfactual explanations usually being seen as contrastive by nature [160]. Both strategies provide influencing factors through alternative scenarios or conditions to help users better understand the operation of machine learning models. That is the reason why the XAI literature often uses the term ''contfactual'' to agglutinate explanation solutions covering both concepts independently and their fusion [122].
We observe a synergy between the contfactual methodology and the approach based on the characterization of decision boundaries, previously discussed in Section VIII-C. Both investigate minimal modifications in the input data that modify a prediction context, offering an opportunity for researchers to design contfactual methods that benefit from advances in the decision boundaries literature and vice versa. Aïvodji et al. [161] and Dissanayake and Dutta [162] leveraged the fact that counterfactual explanations lie close to decision boundaries and investigated model extraction by strategically training surrogate models using counterfactuals. Such works highlight concerns about privacy in XAI, since the information provided by counterfactual explanations can enable adversarial attacks aiming to build faithful copies of original (target) models, which might lead to sensitive information leaks.
Interest in contfactual explanations research has grown because of their appealing alignment with human reasoning, which could enhance machine learning transparency by transforming XAI applications into ''human-like'' explainers [22], [163]. However, the contfactual methodology has limitations. Bhatt et al. [90] indicated many current contfactual methods make crude approximations, since finding plausible counterfactual explanations (feasible in both the input data and the real world) is non-trivial and computationally expensive. Contfactual explanations are highly domain-specific, which leads to a lack of standardization in evaluation procedures [122]. Furthermore, current implementations of contfactual explanations are model-specific or work for models that are not black boxes in nature.
Verma et al. [157] reviewed and categorized research on counterfactual XAI, describing the desirable properties and evaluating the advantages, disadvantages, and open questions of the current methods. Interested readers can also consult Stepin et al. [122], who conducted an extensive literature review from the theoretical characteristics and differences to the current state-of-the-art of contrastive and counterfactual XAI in addressing explanations of causal and non-causal dependencies. Van Looveren and Klaise [164] addressed some limitations of the counterfactual approach.

E. XAI BASED ON EXPLANATION OF GRAPH MACHINE LEARNING
Graph Neural Networks (GNNs) represent a powerful class of models in Graph Machine Learning (GML) applied to generating predictions on data associated with a graph as its underlying structure. Several real-world applications naturally arise as graph models (e.g., social networks, fraud detection, knowledge graphs, bioinformatics, molecule modeling in chemistry, street maps, document citation, and infrastructure optimization) [165], [166].
The set of different specific approaches involved in GML is large, since feature vectors can be associated with graph nodes (e.g., document content), graph edges (e.g., messages between users in a social network), and/or the whole graph (e.g., toxicity of a molecule). The Machine Learning task may also be a node-level prediction (e.g., predicting the class to which documents belong), edge-level prediction (e.g., forecasting traffic flow in the streets of a city), graph-level prediction (e.g., forecasting the solubility of a molecule), link prediction (e.g., recommending users who might follow each other), or graph-to-graph interaction (e.g., predicting the side effects of taking two drugs simultaneously).
Accounting not only for the features associated with graph elements, but also for the complex interactions defined by the graph structure can make the explainability of GNNs challenging, leading to a less extensive literature in comparison to the non-graph XAI scenario.
However, some recent and notable progress has been made in XAI research toward providing explanations for complex GML models. Ying et al. [167] proposed GNNExplainer, a perturbation-based method devoted to explaining individual predictions made by trained GNNs that highlights the most influential nodes and edges in the input graph by computing the gradients of the prediction with respect to node embeddings. Despite providing valuable insights, the method has some limitations, such as sensitivity to the initial node embeddings, the need for approximations for large graphs, and difficulty in explaining complex interactions. PGM-Explainer [168], another explainability technique for GNNs, builds a probabilistic model for the desired node to be explained. It generates a local dataset by randomly perturbing the node features of a subgraph that contains the target node multiple times. An interpretable Bayesian network then fits the dataset and explains the GNN predictions by focusing on node explanations.
In contrast to previous methods devoted to the explainability of graph elements or features, SubgraphX [169] aims to identify important subgraphs. It employs the Monte Carlo Tree Search (MCTS) algorithm to explore subgraphs by pruning nodes. The importance of each subgraph is measured by an efficient approximation of Shapley values [170] that considers interactions within the message-passing mechanism. Although SubgraphX yields human-interpretable subgraphs, its computational cost is higher because of the need to explore different subgraphs through MCTS.
XGNN [171] is a model-level graph explainability technique that employs graph generation, i.e., instead of making computations directly on the input graph, it trains another model to produce graphs that optimize the GNN prediction. Such graphs are expected to contain discriminatory patterns and thus provide the desired explanations. The framework can incorporate constraints to ensure interpretable explanations, such as restraining the node degrees or the number of nodes in the generated graph. However, one of its limitations is that XGNN is suitable only for graph-level classification problems.
For further details on XAI for the Graph Machine Learning context, we refer to [166], [169], [172], and [173], and to [165], [167], [174], and [175] for metrics and benchmark datasets to assess GML explanations.

F. XAI BASED ON EXPLANATION OF ATTENTION MODELS
In traditional sequence models, such as Recurrent Neural Networks (RNNs) or Long Short-Term Memory networks (LSTMs), information tokens are sequentially passed from one step to the next. Such a sequential nature limits the extent to which context can be captured, even with RNNs and LSTMs bearing structures designed to hold information for a longer duration [176], [177].
In contrast, attention-based approaches do not process the inputs sequentially. The attention mechanism enables the model to assign relative importance to different parts of the input sequence and ''pay attention'' to certain parts when making predictions. Each information token is attended to every other simultaneously in parallel, enabling more efficient and scalable computations [65]. In addition, the attention mechanism promotes selective focus on different parts of the input sequence, contributing to context preservation, which is particularly effective for applications involving sequential data in, for instance, the Natural Language Processing (NLP) domain [178].
Vaswani et al. [65] introduced Transformers, a type of Neural Network architecture that relies on attention mechanisms (or scaled dot-product attention) to capture relationships between different words or tokens in a sequence. At a high level, the self-attention mechanism enables a token in a sequence to focus on other tokens in the same sequence, thereby assigning different levels of importance to each token. The transformer approach is not restricted to fixed-size contexts, enabling tokens to directly influence each other, regardless of their distance in the sequence [65].
Transformers have demonstrated state-of-the-art performance across many NLP tasks and are now widely adopted in various domains. Although Vaswani et al. [65] claimed attention mechanisms employed in Transformers could yield more interpretable models, such transparency is questionable [95], i.e., the interpretation of the internal workings of a transformer model can be challenging and must be better understood [179], [180].
Vig [179] briefly discussed studies that developed tools to visualize attention in NLP models, from heatmaps to graph representations. The author presented an improved version of BertViz [181], a visualization tool composed of three views following the small-multiples design pattern for exploring transformer models at the attention-head, model, and neuron levels, and also demonstrated an interesting case in which an attention-based model encoded gender biases. However, BertViz has certain limitations. It can be slow when handling extensive inputs or large models, and only a few transformer-based models are included in it. The presentation of heatmaps of attention weights can be misleading, thus causing unclear interpretations. Furthermore, counterfactual experiments can generate alternative heatmaps that yield equivalent predictions [95], although Wiegreffe and Pinter [100] claimed the existence of another explanation does not mean the one provided is meaningless or false.
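Attention weights of the kind visualized by such tools can be extracted directly from most open transformer implementations. A minimal sketch, assuming the Hugging Face transformers library and the publicly available distilbert-base-uncased checkpoint, is shown below.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"          # any pre-trained transformer encoder
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("Attention weights are one window into the model.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer with shape
# (batch, num_heads, seq_len, seq_len); each row is a token's attention
# distribution over the sequence and can be rendered as a heatmap.
layer0_head0 = outputs.attentions[0][0, 0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
```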
Pythia [178], a benchmark framework for evaluating Large Language Models (LLMs), includes several open-access and pre-trained transformer-based models, covering a wide range of scales up to 12 billion parameters. The study also highlights the critical role of model size in language modeling performance and provides analyses of gender biases and memorization. Although memorization in LLMs has become a significant concern, few tools enable data scientists to detect and prevent it. Biderman et al. [180] introduced an overview of memorization and proposed measures to understand and predict it.
Garde et al. [182] introduced DeepDecipher, an interactive interface for the visualization and interpretation of neurons in
the MLP layers of transformer models. It provides information on the behavior of neurons toward understanding when and why an MLP neuron is activated, based on a pre-defined database of sequences and a method that creates a graph of tokens [183]. However, a neuron view may not represent the neuron's general behavior, and DeepDecipher does not introduce a novel explanation method.
Since large and complex attention-based models have become increasingly influential in intelligent applications, interpretability must be urgently provided for them. Hundreds of new studies have been published recently and, despite their general success, transformer models must be better understood [180]. Many XAI solutions for attention models apply visualization tools. Visualizing attention weights illuminates one part of the predictive process, but it does not necessarily provide a reasonable explanation [95]. Chefer et al. [184] proposed a gradient-based method to compute relevancy scores for transformer models. We address the gradient approach in the following section.
Research on how a given transformer model learns and represents data can potentially impact the next generation of software. One of the reasons for the gap in explainability research is the lack of available large models that are also openly accessible for tests and development [178]. For further valuable discussion on attention transparency and explainability, we refer to [94], [95], and [100].

G. XAI BASED ON GRADIENTS AND SIGNAL DECOMPOSITION
Gradient-based methods utilize the partial derivatives of learning models to explain their predictions. They attribute importance to the input features by analyzing the extent to which small perturbations in the input features impact the model's output. Furthermore, computing the gradients of the output with respect to the input is analogous to verifying the coefficients of a Neural Network model [185], generalizing the deconvolutional network reconstruction procedure [186]. The early gradient-based proposals focused on determining inputs maximizing neuron activity in unsupervised network architectures [187] and on generating visualizations for convolutional layers of deep networks [188].
Simonyan et al. [186] introduced the use of gradients to generate saliency maps for supervised models – such an approach is referred to as Vanilla Gradient by the XAI community [189], [190]. It directly computes the model's output gradients through a first-order Taylor expansion [191] around a perturbed instance and a bias term. The product of the gradient and the input feature values (with no modifications) is interpreted as a feature importance attribution. Despite its simplicity, the approach lacks fine-grained sensitivity and is prone to noise within the gradients, and neither the perturbation procedure nor the bias term was adequately specified [192]. Similarly, T-Explainer [193] relies on Taylor expansions to approximate the local behavior of black-box models and perform feature attributions. However, the method computes gradients through input perturbations in a finite differences-based optimization procedure that does not depend on the model's architecture. T-Explainer works with tabular data, although it has limitations with categorical features.
Bach et al. [192] proposed LRP (Layer-wise Relevance Propagation), which explains the predictions of complex non-linear models by decomposing the outputs in terms of input variables. The method is mathematically based on DTD (Deep Taylor Decomposition) [191] for identifying pivotal properties related to the maximum uncertainty state of the predictions. It redistributes the predictive function in the opposite direction, through the projection of signals from the output to the input layer by a backpropagation mechanism uniformly applied to all of the model's parameters [4], [106], [194]. LRP is deemed a model-agnostic method because it avoids a priori restrictions on specific algorithms or mappings. However, it was designed as a general concept for black-box architectures based on non-linear kernels, such as Multilayered Neural Networks and Bags of Words, which include several well-known models strongly tied to image classification tasks [192].
Let f be the learning model and x be the input instance, e.g., the pixels of an image. The LRP algorithm assumes the black-box model can be decomposed into l layers of computation to attribute a vector of relevance scores R_d^{(l)} over each intermediate layer. The attribution process is iterative and starts on the real-valued output in the last layer, from where the calculated scores are backward propagated until they approximate the first (input) layer of the model as follows:

Σ_d R_d^{(1)} = · · · = Σ_{d∈l} R_d^{(l)} = Σ_{d∈l+1} R_d^{(l+1)} = · · · = f(x)   (8)

with d representing the indices of the neurons in each layer. LRP has been successfully used to generate measurable values describing the processing of variables in Neural Networks because its redistribution strategy follows relevance conservation and proportional decomposition principles, which preserve a strong connection with the model output [195].
layers of deep networks [188]. Lapuschkin et al. [106] applied spectral clustering to LRP
Simonyan et al. [186] introduced the use of gradients score vectors to identify atypical patterns and behaviors
to generate saliency maps for supervised models – such in patterns learned from a pre-trained Neural Network.
an approach is referred to as Vanilla Gradient by the XAI The study demonstrated unnoticed biases in the training
community [189], [190]. It directly computes the model’s dataset, where many images from a specific class had a
output gradients through a first-order Taylor expansion [191] URL source tag. As a result, new images not associated
around a perturbed instance and a bias term. The product of with that class, but artificially manipulated to presenting
the gradient and input feature values (with no modifications) the source tag, were incorrectly classified. The resulting
is interpreted as a feature importance attribution. Despite its LRP relevance scores were rendered through heatmaps of
simplicity, the approach lacks fine-grained sensitivity and is same dimensionality of the input data (relevance maps) as
prone to noise within the gradients and neither the perturbation an interpretable visualization tool. The study of Lapuschkin
procedure, nor bias term were adequately specified [192]. et al. [106] illustrates a clear example of how explainability
Similarly, T-Explainer [193] relies on Taylor expansions tools can assist data scientists in discovering hidden biases in
to approximate the local behavior of black-box models learning models. Montavon et al. [191] and Kohlbrenner et al.
and perform feature attributions. However, the method [195] conducted reviews evaluating LRP approaches applied
computes gradients through input perturbations in a finite to Neural Networks.
The convolutional layers of Convolutional Neural Network (CNN) architectures apply specialized filters across input images to learn complex visual patterns, such as spatial information and high-level semantics. DeConvNet [188] visualizes the activity of intermediate layers of a CNN by using the same layer components in reverse order. Grad-CAM (Gradient-weighted Class Activation Mapping) [196] generates explanations for any CNN-based model by attributing importance scores to each neuron of the final CNN layer. The attribution process uses class-specific gradient information [197] from the backward pass of backpropagation to produce a localization map, highlighting the most influential regions in the input image for the model's decision.
Specifically, Grad-CAM computes an importance score matrix w_k^c to generate a localization map L_{Grad-CAM}^c ∈ R^{m×n}, where m and n represent, respectively, the width and height of an input image belonging to any target class y^c. The scores are based on the gradients of the class score with respect to each feature map M^k of the last convolutional layer, ∂y^c/∂M^k, computed before the application of the SoftMax function and passed back over the m and n dimensions as follows:

w_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂M_{ij}^k   (9)

where Z represents the number of pixels in the feature map, which is used for output normalization.
Importance scores w_k^c represent a partial linearization of the CNN and describe the importance of feature map k for class c. Finally, the importance scores are linearly combined with their corresponding feature maps, and the result is passed through a ReLU layer and plotted as a heatmap:

L_{Grad-CAM}^c = ReLU( Σ_k w_k^c M^k )   (10)

Grad-CAM and its recent variants [198], [199] generate interpretable visualizations by overlaying the score heatmap on the original input image, providing visual information that enables identifying the regions of the image that are most influential in the decision process. Grad-CAM does not require architectural modifications or retraining; although it is agnostic regarding different CNN models, it is restricted to working on CNNs. Furthermore, it depends on activating a ReLU layer for proper gradient sensitivity, may have limitations for accurately determining the coverage of class regions, and is prone to instability when locating multiple instances of an object within an image [32].
gradients from neuron weights on each feature map Mk of mean values or zeros). It computes the gradients of interpolated
the last convolutional layer, which cis computed before the samples and integrates them along the path from the baseline
application of SoftMax function, ∂∂y , by passing back over to the target input. Let ∂f∂x(x)
i
be the gradient of a model f along
Mk the i-th dimension of input data x and x′ representing the
m and n dimensions as follows:
baseline. Integrated Gradients is then defined as
1 X X ∂yc
wkc = (9)
∂Mkij ∂f (x′ + α × (x − x′ ))
Z 1
Z ′
i j IG i (x) = (xi − xi ) × dα (12)
α=0 ∂xi
where Z represents the number of pixels in the feature map,
which is used for outputting normalization. Integrated Gradients is suitable for generating global
Importance scores wkc represent a partial linearization of or local feature attributions for (theoretically) any Neural
a CNN and describe the importance of feature map k for Network model. However, it can generate inconsistent
class c. Finally, the importance scores are linearly combined explanations, since its performance is closely tied to the
by globally averaging them with their corresponding feature baseline choice, which depends on the domain context.
maps, passing them in a ReLU layer and plotting the final DeepLIFT (Deep Learning Important FeaTures) [200],
scores map in a heatmap: [203] is based on the concept of importance scores derived
! from LRP. The method aims to explain Deep Neural
X Networks (DNNs) propagating attributions at each layer of
c
LGrad-CAM = ReLU wc M .
k k
(10)
the deep model to compare the difference between a neuron
k
activation and a ‘‘reference activation’’ used as a baseline.
Grad-CAM and its recent variants [198], [199] generate DeepLIFT applies non-linear transformations based on a chain
interpretable visualizations by overlaying the scores heatmap rule to network multipliers. More specifically, it computes
on the original input image, providing visual information multipliers for any neuron to its immediate successors (a
that enables identifying the regions of the image that are target neuron) using backpropagation, which is similar to
more influential in the decision process. Grad-CAM does the application of the chain rule for partial derivatives.
not require architectural modifications or retraining; although According to Lundberg and Lee [109], this composition can
it is agnostic regarding different CNN models, it is restricted be equivalent to linearizing the Neural Network’s non-linear
to working on CNNs. Furthermore, it depends on activating components.
a ReLU layer for proper gradient sensitivity, may have Shrikumar et al. [203] also defined the rescale rule as
limitations for accurately determining the coverage of class an improvement of the LRP’s chain rule upon computing
regions, and is prone to instability when locating multiple the gradients regarding the output of backpropagation.
instances of an object within an image [32]. Chain-rule methods generally do not hold for discrete
Input × Gradient [200] attributes feature importance by gradients [204], making DeepLIFT and LRP violate the
computing the element-wise product of the model’s output implementation invariance property [205]. Saliency maps
gradients and the corresponding inputs, a process known as and input gradient-based methods suffer from the so-called
sensitivity mapping. However, sensitivity maps are prone neuron saturation problem [203]. Reference-based methods
to instability due to input noises leading to fluctuations in such as Integrated Gradients and DeepLIFT address that
the partial differentiation. Smilkov et al. [201] designed the limitation by comparing input features with reference values,
SmoothGrad approach to run on top of existing gradient avoiding the saturation issue [206]. However, the reference
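To make the path integral of Equation 12 concrete, the following minimal sketch approximates it with a Riemann sum using PyTorch autograd. Here, `model` is assumed to be any differentiable classifier that accepts a batch of inputs and returns class scores, and the baseline passed in (zeros, means, or anything else) is only illustrative:

```python
import torch

def integrated_gradients(model, x, baseline, target_class, steps=50):
    """Riemann-sum approximation of Equation 12 along the straight path
    from the baseline x' to the input x."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)

    # Gradients of the target class score at every interpolated sample.
    score = model(path)[:, target_class].sum()
    grads = torch.autograd.grad(score, path)[0]

    # Average gradient along the path, scaled by the input-baseline difference.
    return (x - baseline) * grads.mean(dim=0)
```

A common sanity check is completeness: with enough interpolation steps, the attributions should approximately sum to f(x) − f(x′).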
H. XAI BASED ON SIMPLIFICATIONS
Explainability through simplification comprises techniques in which a new explainer model is built based on a trained model to be explained [2]. The goal of the simplified model is to mimic the behavior of the original model with less complexity, i.e., it must retain a predictive performance similar to that of the original model, but based on more transparent structures. Thiagarajan et al. [207] developed TreeView, a tool that visually interprets complex models. It identifies discriminatory factors across data classes using sequential elimination through a hierarchical partitioning of the feature space, clustering the instances into groups for each factor, where undesirable associations are discarded.

LIME (Local Interpretable Model-Agnostic Explanations) [97] is among the best-known and most widely applied explainability techniques. It determines an interpretable linear model that locally approximates an original model. LIME generates a neighborhood of synthetic samples around the instance under explanation through perturbations on the instances of the original dataset. The synthetic samples are then classified by the original learning model, which weights them by applying a weighting kernel according to their proximity to the point under explanation. LIME then determines a linear model on the neighborhood, minimizing a non-fidelity (loss) function, and the predictions are explained through the linear model interpretation.

More specifically, let f be the trained model, g ∈ G be an interpretable model, and G be a class of potentially interpretable models, such as linear regression or decision trees. Toward explaining an n-dimensional instance x = (x_1, . . . , x_n), an interpretable model g is determined by minimizing a loss function L according to

E(x) = \arg\min_{g∈G} L(f, g, π_x) + Ω(g)   (13)

where π_x defines the weighting kernel centered on x responsible for maintaining the explanation's local fidelity, and Ω(g) is a complexity term (which should be kept low) applied to g.

Some authors have also classified LIME as a Local Surrogate Model, defined as the class of methods that explain individual predictions through a locally trained substitute model [40], a property called local fidelity. LIME has a simple and informative graphical interface. Figure 8 displays an example of a LIME-generated explanation for an instance of the well-known Iris1 dataset that, for simplicity, was adapted to contain only two classes. Regarding a binary classification task, LIME provides explanations using a pattern of two colors (orange and blue, in this case).

Figure 8a shows the classification probabilities of the instance under investigation, predicted by the black-box model as belonging to the Virginica class. Figure 8b displays the most relevant attributes in order of importance for the prediction, with the floating-point values on the horizontal bars informing the LIME importance values attributed to those features. Figure 8c provides an overview of the instance under investigation with the original values of each feature. The colors are distributed according to the contributions, i.e., attributes in orange contributed to the Virginica class and those in blue contributed to the Versicolor class. Color coding is consistent across charts. However, the way each attribute contributes positively or negatively to the LIME results is unclear. The authors also developed two LIME extensions [208], [209], including an improved version which provides clearer textual explanations based on rules [210].

Among the main advantages of LIME is its flexibility. Any interpretable model can be used as a surrogate and, even when changing the original learning model, explanations can be generated for the dataset by the local interpretable model produced by LIME. The method is one of the few that works on tabular, textual, or image data [40], although it is not suitable for time series, since LIME independently constructs simplified models for each instance [211]. As an example, LIME builds neighborhoods in image classification by segmenting the input image into superpixels and perturbing the segmented image by randomly switching the superpixels with a background color. The Boolean states of the superpixels are then used as attributes of the simplified model [194]. That type of strategy can be extended for enabling other XAI methods to operate with images.

Although LIME is considered an outstanding solution for XAI, it has some significant limitations. It is stable in explaining linear classifiers, but unstable in other cases, i.e., it sometimes provides explanations that do not align with human intuition and can change its explanations entirely just by running the code a few times, owing to the sampling variance [25], [90]. Aas et al. [117] argued LIME does not guarantee perfectly distributed effects among variables. Furthermore, different models can fit the sampled data, with LIME randomly selecting one of them without guaranteeing it is, in fact, the best local approximation. No solid theoretical guarantee indicates a simplified local surrogate model adequately represents more complex models [4], i.e., that original and surrogate models always produce similar predictive behaviors.

Defining a meaningful neighborhood around an instance of interest is a complex task. LIME bridges such a difficulty by constructing a neighborhood around the point under explanation using the center of mass of the training data. The strategy can contribute to the instability of the technique by generating samples considerably different from the instance of interest, despite increasing the probability of LIME learning at least one explanation [40]. LIME also relies on simplistic assumptions regarding the decision boundaries of learning models, assuming they are locally linear. However, decision boundaries of models such as Neural Networks can be highly non-linear, even locally, and a linear approximation in this context might lead to unstable explanations [212].

1 https://archive.ics.uci.edu/ml/datasets/iris, visited on June 2023
FIGURE 8. LIME explanation for an instance from Iris dataset, classified as belonging to Virginica class. LIME provides visualization tools with
different information on model classification, locally attributed importance values, and the input instance itself.
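The core of Equation 13 can be sketched in a few lines: sample a neighborhood, weight it with a proximity kernel, and fit a weighted linear surrogate whose coefficients act as the explanation. The sketch below is a simplified illustration only (the lime Python package adds discretization, feature selection, and other refinements), and the kernel width and Gaussian sampling scheme are arbitrary choices here:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(predict_proba, x, X_train, target_class, n_samples=5000, kernel_width=0.75):
    """Fit a locally weighted linear surrogate around instance x (a rough take on Equation 13)."""
    rng = np.random.default_rng(0)
    scale = X_train.std(axis=0) + 1e-12
    # Synthetic neighborhood: Gaussian perturbations scaled by each feature's spread.
    Z = x + rng.normal(size=(n_samples, x.shape[0])) * scale
    # Query the black box and weight samples by proximity to x (exponential kernel pi_x).
    y = predict_proba(Z)[:, target_class]
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Interpretable model g: its coefficients are the local feature attributions.
    g = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return g.coef_
```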
LIME output values lack comparative meaning; it is not straightforward to understand what the values attributed to each input feature mean (Figure 8b) and how those values relate to the model prediction. Moreover, the LIME linear weighting increases the influence of unperturbed samples [194]. Since there is no principled way to estimate the weighting kernel or even an appropriate choice of its width ratio [40], LIME chooses critical parameters, such as the weighting kernel, neighborhood size, and complexity term, heuristically, leading to inconsistent behaviors, which might affect the local fidelity [25], [109].

Deterministic versions of LIME [213], optimization [214], [215], and learning-based [216] strategies have been proposed to reduce instability; however, those alternatives have the cost of increasing the number of parameters to be tuned.

I. XAI BASED ON SHAPLEY VALUES
Derived from classical game theory modeling, Shapley values [170] describe a way to distribute the total gains/costs of a cooperative game among players, satisfying fairness criteria [119], i.e., determining Shapley values is a cost-sharing problem [205]. According to Moulin [217], cost-sharing problems are central subjects in several areas that require splitting joint costs and proportionally allocating their shares among each individual contributor.

As an example, electricity is a public utility with a long production chain that, in a simplified way, starts at the power-generating units and moves through transmitters and distributors until it reaches the final consumer. Determining how much the consumer will pay and fairly distributing this value to each link in the production chain is a typical cost-sharing problem.

A Shapley value represents the average marginal contribution of a player evaluated over all possible combinations of players, i.e., it is a weighted average of individual contributions related to all possible compositions of individuals [40]. A key aspect of Shapley values lies in their solid theoretical foundation, which axiomatically ensures a fair distribution of gains/costs among the participants of a collaborative game. According to Kumar et al. [218], a collaborative game comprises a set of n players and a characteristic function v, which maps subsets S ⊆ {1, . . . , n} into real values v(S), satisfying v(∅) = 0. The characteristic function describes the extent to which the final gain can be attributed to individual players cooperating as a team in the game. Therefore, Shapley values represent a method of distributing the total value of cooperation, v({1, . . . , n}), among the n individuals.

Let us consider v(i) the characteristic function applied to attribute i (a player) from a subset of S attributes, i.e., i ∈ S. The Shapley value can be computed as a weighted average between attribute i's marginal contributions regarding every possible subset of attributes S ⊆ {1, . . . , n} and the number of permutations of S:

φ_v(i) = \sum_{S ⊆ \{1,\ldots,n\} \setminus \{i\}} \frac{|S|!\,(n − |S| − 1)!}{n!} \big(v(S ∪ \{i\}) − v(S)\big)   (14)

where φ_v(i) is the Shapley value of the i-th attribute, v(S) is the expected value of the characteristic function conditional on subset S ⊆ {1, . . . , n}, i.e., E[v(S)], n represents the total number of attributes, and |·| denotes cardinality [119]. Note v(S ∪ {i}) − v(S) describes the marginal contribution of a player to a combination of players S, i.e., the variation Δ_v(i, S) generated when i is included in S [218].

Figure 9 illustrates the concept behind the calculation of Shapley values in a set holding three attributes {a, b, c}. Each possible combination of attributes must be considered so that the attributes' individual contributions can be verified, i.e., all possible subsets S, with |S| ranging from 0 to n (n = 3 in this case), must be computed and verified.

Each vertex in Figure 9 depicts a combination of attributes and each arrow corresponds to the inclusion of an attribute not present in the previous combination. Note the original approach for the calculation of Shapley values requires verifying 2^n combinations, which is the number of possible subsets of {1, . . . , n}, making the Shapley value computation an NP-hard problem (as n grows, Equation 14 tends to be unfeasible).

However, calculating the Shapley value in datasets containing few attributes is relatively simple. Let us consider the example shown in Figure 9. The Shapley value of attribute a is computed by assessing its marginal contributions to every coalition formed by the remaining attributes.
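A brute-force implementation of Equation 14 for a toy three-player game can be sketched as follows; the coalition payoffs below are made-up numbers for illustration, not the values of the Figure 9 example, and the nested loops are exactly the 2^n enumeration discussed above:

```python
from itertools import combinations
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values (Equation 14): v maps a frozenset of players to a payoff."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = frozenset(subset)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(S | {i}) - v(S))   # weighted marginal contribution
    return phi

# Hypothetical coalition payoffs for three attributes {a, b, c} (indices 0, 1, 2).
payoff = {frozenset(): 0, frozenset({0}): 10, frozenset({1}): 20, frozenset({2}): 30,
          frozenset({0, 1}): 45, frozenset({0, 2}): 55, frozenset({1, 2}): 65,
          frozenset({0, 1, 2}): 100}
print(shapley_values(lambda S: payoff[S], 3))  # the three values sum to the full coalition's payoff
```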
However, SHAP assumes feature independence and uses a marginal distribution to replace the conditional distribution [117], enabling the conditional expectation approximation to estimate the Shapley values directly through an Additive Feature Attribution modeling.

Additive XAI methods assign an effect to each attribute (see Equation 7), and the sum of all feature attribution effects should lead to a value that makes sense for the original model prediction [109]. Note the correspondence between cost-sharing and attribution problems – Shapley values distribute collaborative game costs among its players; the learning model can be taken as equivalent to the characteristic function, with the game (total cost) as the prediction value, the players as input features, and cost shares being the importance attributions [205].

Formally, let f be the learning model under explanation, g be the explainer model, and x = (x_1, . . . , x_n) ∈ X be the n-dimensional instance to be explained. Using x′ as a simplification of x, as defined by [208], i.e., x′ ≈ x, SHAP defines a mapping function h_x for the original instance, x = h_x(x′), such that g(x′) ≈ f(h_x(x′)). Even if the simplified instance retains less information than the original instance, h_x ensures no significant loss of information occurs. SHAP then generates an explanation model g that locally approximates the original model f by determining the Shapley values for each attribute of x additively according to

f(x) = g(x') = φ_0 + \sum_{i=1}^{n} φ_i x'_i   (16)

where φ_0 represents the expected value of the prediction, E[f(X)], and φ_i is the Shapley value related to attribute x_i, calculated by Equation 14 with f(h_x(x′)) = f(x) acting as the characteristic function. Determining E[f(X)] for an arbitrary dataset is not a trivial task; in practice, SHAP estimates the expected prediction value through the average model output across the training dataset X when the feature values X_i are not known.

Equation 16 indicates SHAP approximates the learning model f through a linear additive model g, enabling locally estimating the predicted value of f based on the Shapley values of an instance's attributes as parameters of a linear model g. Since the application of Equation 14 can be costly because of the large number of possible combinations, SHAP employs an approximation strategy based on Monte Carlo integration of a permutation version of Shapley's classical equation, with samplings taken separately for each attribution [109].

According to Lundberg and Lee [109], SHAP locally estimates the contribution of each feature, respecting the local accuracy, missingness, and consistency axioms (SHAP can also estimate feature importance at the global level [119]). In contrast to LIME, SHAP allows contrastive explanations, i.e., comparing a prediction in the context of a specific subset of instances or even a single instance instead of only comparing predictions with the entire dataset's average prediction [40], [218]. LIME is not necessarily locally efficient, for its explanation values do not add up to the original prediction.

By transforming explanations into an additive linear modeling, SHAP promotes a connection between Shapley values and LIME [40]. Lundberg and Lee [109] developed KernelSHAP, a SHAP version based on the concept of LIME. Although the LIME formulation differs from the Shapley value one, both LIME and SHAP are additive attribution methods. However, LIME heuristically chooses the weighting kernel, loss function, and complexity term, violating the consistency axiom and affecting local accuracy, resulting in unstable explanations [40], [109]. Equation 14 is a difference of means. Because the mean is the best least-squares point estimate for a dataset, a weighting kernel can be found by the least-squares method. Lundberg and Lee [109] demonstrated how to determine the weighting kernel, local loss function, and complexity term in the context of Shapley values.

Given the linear formulation of LIME, KernelSHAP can estimate Shapley values through regression-based solutions, which is computationally more efficient than directly computing the classical equation of Shapley values. Note SHAP and LIME (the original version) explain predictions differently. LIME indicates the feature that is most important for a prediction, whereas SHAP indicates the contribution of each feature to the prediction. Although both methods compare predictions under an explanation with an average probability, SHAP verifies the difference between predicted and expected values of the global average prediction, while LIME explains the difference between the prediction and a local average prediction (from neighborhood sampling) [117].

SHAP is currently one of the leading state-of-the-art methods in feature attribution/importance XAI due to its tolerance to missing values and Shapley's theoretical guarantees regarding local precision and consistency [25]. The SHAP Python library2 offers a range of methods and a valuable set of graphical tools for visually analyzing explainability (see Figure 10). The Iris dataset was used for the generation of the charts in Figure 10 and, for simplicity, was adapted to contain only two classes, namely, Versicolor and Virginica. The setting is identical to that of the LIME example in Figure 8.

Figure 10a displays explanations for all dataset instances, summarizing the importance of the attributes, indicated on the left side of the chart and ordered vertically according to their global mean importance. Each point represents the Shapley value attributed to each variable from a data instance. The points are horizontally dispersed according to the Shapley value; therefore, the more distant from zero in the positive direction, the more influential the attribute on the predicted class of the instance. Values farther from zero toward the negative side indicate the attribute has more importance to classes other than the predicted one. Finally, points are accumulated vertically to indicate the Shapley value density distribution per attribute and colored according to the attribute's values, from smallest to largest.

2 https://shap.readthedocs.io/, visited on February 2024
FIGURE 10. SHAP library provides visualization tools to transform attributions into graphical information, describing important features and their
relationships, whether locally or globally. In this example, SHAP visualizations were used to explore the most important features of Iris dataset (global
view) and important ones of a sample classified in Virginica class (local view).
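The setup described above and in Figure 10 can be reproduced with a few calls to the shap library. The snippet below is a minimal sketch; note the return convention of shap_values for multi-class models (a per-class list or a single 3-D array) varies across shap versions, which the last lines account for:

```python
import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Two-class Iris setup (Versicolor vs. Virginica), as in the Figure 10 example.
iris = load_iris(as_frame=True)
X, y = iris.data[iris.target != 0], iris.target[iris.target != 0]
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # fast, tree-specific SHAP variant
shap_values = explainer.shap_values(X)

# Attributions for the second class; handle both return conventions.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]
shap.summary_plot(sv, X)                # global beeswarm view (Figure 10a style)
```

Local views such as shap.force_plot and shap.plots.waterfall take the same attributions for a single instance, matching the charts of Figures 10c and 10d.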
Figure 10b shows the average partial relationship (marginal interaction effects) between one or more input variables, where each point represents the prediction of a data instance. The chart considers all instances and describes the global relationships between the variables with respect to the model prediction. Instances were selected from the Virginica class in the example. The abscissa axis depicts the sepal length variable and the ordinate axis represents the Shapley value attributed to the respective sepal length variable, indicating the extent to which the sepal value modifies the prediction for each instance. Values farther from zero indicate the variable is important for the Virginica class. The points are colored according to the petal width, which is the most significant attribute of the interaction effect related to sepal length.

The chart in Figure 10c provides a local analysis of a specific data sample so that the contribution of each variable to a single prediction can be understood. The force plot shows the predicted value of the instance under explanation, the base value (the expected output for the model's average prediction in the training data), and the value of each variable. Under the horizontal line, the colors of the arrows indicate the influence of the variables – red denotes attributes that contributed positively and blue indicates those that contributed negatively. The larger the arrow, the more significant the variable's impact. Although the force plot is a succinct metaphor, its horizontal arrangement is inefficient for handling many variables simultaneously. Such a limitation is avoided in the chart in Figure 10d, which shows similar information, but organized vertically in a ''waterfall'' format. The lines in Figure 10e represent each variable's effect, enabling the visualization of the profile of feature importances from one or more instances simultaneously.

We used a Random Forest binary classifier model to generate the chart sequence shown in Figure 10. The expected output of a binary learning model is a probability value between 0 and 1, indicating to which of the two classes the sample is most likely to be classified.
However, the final SHAP values are different since, by default, SHAP explains a classifier's prediction in terms of its marginal result before applying the output activation function. Therefore, SHAP units are log-odds contributions rather than probabilities. According to Aas et al. [117], such a design choice is not appropriate for promoting a direct interpretability of SHAP results due to the difficult interpretation of the output in the log-odds space. Explaining a model's probability output rather than the log-odds output can yield more naturally interpretable explanations [114].

The SHAP authors published other studies with optimizations and new tools. Lundberg et al. [222] proposed TreeSHAP, a specific and faster version of SHAP for complex models based on gradient-boosted tree ensembles that can analyze interaction effects, since it considers dependency relationships among features, which is a well-known limitation of permutation methods. TreeSHAP breaks the features' contributions into main and interaction effects by taking advantage of the hierarchy in the decision tree's features for estimating some degree of dependency (but not all dependencies) among the inputs using valid tree paths (weighted averages of the final nodes reachable by the permutation subsets) [117]. That study also presents other contributions, such as supervised clustering, which addresses one of the most challenging problems within the unsupervised clustering context. Supervised clustering uses the feature attribution concept to convert input variables into values with the same units of the model output and then determines the weights (metric distance) toward direct comparisons of the relative importance among variables with different metric units.

Lundberg et al. [14] developed an improved version of SHAP, extending the concept of local explanations to separate and capture the individual interaction effects of variables not only on single instances, but also on pairs of instances, providing explanations in a matrix of feature attributions. Chen et al. [113] extended SHAP from the perspective of DeepLIFT, creating DeepSHAP, devoted to providing explanations for Deep Learning-based models and Gradient-boosted Trees [64].

Lundberg and Lee [109] revealed a connection between DeepLIFT and Shapley values such that DeepLIFT can be considered a fast approximation of Shapley values [203]. DeepSHAP explains deep models by performing DeepLIFT using a baseline reference – in this case, the baseline is a subset of samples with (not necessarily) randomly adjusted values, called the ''background distribution.'' The instance under analysis is explained from a set of variables to be ''missed,'' which refers to corresponding values in the background distribution. The SHAP value of the instance is obtained for each sample from the background distribution and the final value is calculated by averaging the importance attributions over the background distribution. DeepSHAP is compatible with PyTorch and TensorFlow; however, it can be less accurate than SHAP due to more complex approximations.

Chen et al. [114] presented Generalized DeepSHAP (G-DeepSHAP), a local feature attribution method developed as an improvement over DeepSHAP and DeepLIFT that explains complex distributed series of models. In contrast to defining the absence of a feature by masking features according to a single baseline, G-DeepSHAP selects a baseline distribution using k-means clustering. The sample under explanation is compared with that distribution of baselines, which decreases the bias of relying on a single baseline and enables feature attributions to answer contrastive questions. G-DeepSHAP generalizes the rescale rule introduced in DeepLIFT for explaining series-of-models architectures by propagating attributions to a series of mixed model types rather than only layers in a deep model. The group rescale rule of G-DeepSHAP also reduces the dimensionality of highly correlated features; however, a limitation is that G-DeepSHAP does not guarantee it satisfies Shapley's desirable axioms.

Among the vast range of recent publications based on the Shapley theory, Casalicchio et al. [84] proposed a local Shapley-based approach to measure the feature importance of individual observations, providing visualizations through visual tools related to Partial Dependence and Individual Conditional Expectation plots. Hamilton et al. [194] extended SHAP (and LIME) for explaining deep CNN models applied to similarity image searches and retrievals.

J. PROBLEMS IN XAI RELYING ON SHAPLEY VALUES
Explaining predictions based on Shapley values is a popular topic in XAI research. This subsection extends the discussion on that class of methods due to the multiple proposals derived from the Shapley concepts in the literature, providing the reader with an in-depth review of Shapley-based XAI. The previous section presented Shapley's theory, evolution, and advantages of application for explaining Machine Learning. However, the approach and SHAP, its main derivation, also have significant limitations.

SHAP can be applied to non-structured data, such as text and images, relying on additional assumptions and heuristics for generating a feature set. Slack et al. [223] proposed an adversarial mechanism to generate a biased classifier that cannot be detected by SHAP, thus highlighting its vulnerabilities and the need for validation in XAI. A Shapley value allocates an importance value to a variable rather than producing an interpretable prediction model such as LIME's. Therefore, most methods derived from Shapley values cannot verify the way a prediction changes through modifications in input data. KernelSHAP addresses that limitation by enabling LIME to estimate Shapley values [40].

Amparore et al. [25] described the advantages of using SHAP in different scenarios, suggesting it is the most suitable choice when the analytical objective is local concordance, achieving exceptional levels of concordance. However, the study also reported SHAP is not much more stable than LIME and its alleged advantage can be exploited in practice only for datasets with few variables. The exact computation of Shapley values is resource-intensive – Aas et al. [117] indicated its intractability for datasets with more than ten variables.
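Because of that exponential cost, practical estimators sample the coalition space instead of enumerating it. A minimal, generic sketch of the permutation-sampling idea follows (an illustration of the general strategy, not the specific sampling scheme implemented in the shap library); v is any characteristic function defined over player subsets:

```python
import numpy as np

def shapley_monte_carlo(v, n, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random player orderings."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(n_permutations):
        coalition = set()
        previous = v(frozenset(coalition))
        for i in rng.permutation(n):
            coalition.add(i)
            current = v(frozenset(coalition))
            phi[i] += current - previous   # marginal contribution of player i in this ordering
            previous = current
    return phi / n_permutations
```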
In most cases, only approximated solutions are feasible [40]. Hooker et al. [224] reported SHAP exhibits a deterministic behavior on low-dimensional data; however, when applied to high-dimensional data, it uses statistical sampling techniques based on Monte Carlo integration, tending to unstable explanations.

The correlation among features can be a severe problem, since importance tends to be split among correlated features, masking the true importance of each feature. Hooker and Mentch [225] showed explainability from feature importance methods that rely on permutation-based strategies, such as SHAP, can be highly misleading. They addressed the creation of subsets of features that hold impossible or unlikely combinations within the original data context. Specifically, permuting the set of input features in SHAP is performed naively and might result in subsets combining features that do not necessarily make sense in the real world. The strategy works well for data with total independence among the variables; however, statistically independent features are rarely found in observational studies and machine learning problems [117], and assuming complete independence between features is similar to ignoring the ''whole is more significant than the sum of its parts'' concept in predictive modeling.

When submitted to combinations of correlated attributes following unrealistic configurations, the learning model is forced to extrapolate to unknown regions of the learned space, even assuming independence in cases where at least a few of the features have a high degree of correlation. Since XAI permutation-based methods are sensitive to the way the model extrapolates, the extrapolation behavior becomes a significant source of error, leading to the generation of explanations based on the capture of unwanted or distorted information [226]. Furthermore, ignoring dependency structures by assuming independent distributions, as in SHAP, is a property whose consequences must be carefully studied [205].

A solution to the limitations related to assuming feature independence is conditional sampling, in which variables are conditionally sampled according to those already in the explanation setup. However, it violates the symmetry axiom [227]. Wojtas and Chen [121] proposed an interesting dual-net mechanism for learning and selecting optimal feature subsets in feature-importance tasks, although the dual-net architecture imposes a computational burden on the training. Aas et al. [117] claimed, in theory, TreeSHAP considers dependence among input features, but, in practice, it is potentially inaccurate when the variables are dependent. The authors also extended KernelSHAP using TreeSHAP elements to handle dependent attributes; however, the solution suffers from computational complexity.

Kumar et al. [218] and Kaur et al. [228] argued Shapley values are not a natural solution to human-centric explanations due to the lack of clarity in the analysis of the method, which may lead analysts to confirm biases or even develop overconfidence. Kumar et al. [218] introduced mathematical problems related to XAI methods that rely on Shapley values, demonstrating the solutions for avoiding those mathematical problems introduce more complexity with no significant gain in explainability capacity. As an example, when a set of variables is arbitrarily large, a meaningful set must be selected to provide a concise explanation. However, explanations can change considerably as a function of the selected variables. Furthermore, it is not clear whether two statistically related attributes can be considered individually, such as in SHAP's additive attribution modeling, even when the feature independence assumption has no direct impact on the final result.

Kumar et al. [218] claimed it is unclear whether the solution of computing an average of the sums describing ''all possible explanations'' is an acceptable way to provide explanations in the context of Machine Learning. Kumar et al. [86] verified information losses of Shapley values concerning interactions among correlated attributes and developed Shapley residuals, a proposal that is not an explanation or evaluation method per se. Instead, it quantifies the information lost by Shapley values. Shapley residuals highlight the limitations of Shapley values, indicating when importance attributions can rely on missed relationships. In such scenarios, Shapley-based explanations should be taken with some skepticism.

What elements influence Shapley values? How can the feature distribution influence a Shapley value? How can a Shapley value explanation change according to different predicted outputs of the same model? Kumar and Chandran [119] highlighted the difficulty in addressing those questions, which is related to Shapley values (Equation 14) lacking a closed solution within feasible computational times and to numerical estimates being costly. The authors demonstrated that, in addition to depending on the attribute's values and the learning model, Shapley values depend on the data distribution. As an example, when the overall variance is low, most observations fall into a small region of the space, implying the probability curve can be approximated through a line in that region. On the other hand, high variance implies a volatility situation in which few observations are concentrated around zero, which is the point over the probability curve with the highest derivative, with Shapley values increasing in magnitude as the variable value deviates from the mean.

Shapley values are the most common type of importance-based explanation [90] and one of the most prominent approaches in the recent XAI literature. However, Kumar et al. [218] highlighted a Game Theory framework does not automatically solve the importance attribution problem, although it is an adequate general solution for quantifying the importance of variables.

K. EVALUATION METHODS AND METRICS
Despite the existence of a wide range of XAI methods, a proper evaluation of results from explanation methods still faces some difficulties.
A problem in XAI evaluation is that, in general, we do not have a ''ground truth,'' i.e., the literature reports no reliable references of what an adequate explanation for each black-box problem could be. As an example, machine-learning engineers still consider domain experts' judgments an implicit ground truth for explanations [90]. Therefore, the solutions proposed often rely on axioms for determining the desirable properties of explanations or use simulated data for checking and comparing what can be computed in terms of explanations [117]. Even synthetic datasets designed to hold ground truth explanations [229] suffer from a significant drawback due to the lack of guarantees that learning-based models trained on those synthetic datasets will adhere to the underlying ground truth [175], [189]. Furthermore, ground truth is generally unavailable in most real-world applications that use explainability [230].

Relying on axioms to explain high-stakes Machine Learning may be insufficient. In this context, evaluation methodologies follow four main lines, namely, (i) measurement of the sensitivity of explanations to perturbations in the model and input data, (ii) inference about the behavior of explanations from feature removal, (iii) evaluation of explanations from controlled setups in which the importance of attributes is previously known, and (iv) visual assessment based on insights of human analysts (human-in-the-loop processes) [110].

Qualitative and quantitative evaluations in XAI commonly correspond to the plausibility of explanations and fidelity to model behavior, respectively [114]. Bodria et al. [231] discussed quantitative tests, and Yang and Kim [110] organized good research references regarding each of the four evaluation methodologies and proposed an evaluation framework based on perturbations. They also held a raw discussion on false explanations and highlighted that evaluating explanations is as essential as developing XAI methods. The study also proposed two metrics for evaluating the extent to which and how the explanations should change according to modifications in the input attributes (for image data) in a controlled setting. Yang and Kim [110] did not guarantee an XAI technique that performs well in their framework would also perform well on real data and argued their metrics are simple tests for XAI techniques, and those that fail in simple tests are likely to fail in more complicated scenarios.

Ablation is a line of XAI sensitivity assessment that attempts to evaluate global or local explanations by removing information from input features according to their importance ordering. In other words, an ablation study assesses the relative performance of a learning model by perturbing its input features in a rank order of importance measured by the explanations [232], [233]. Theoretically, if an XAI method is adequately applied, perturbing the important features decreases the model performance. Hooker et al. [224] presented the ROAR (RemOve And Retrain) framework based on ablation to benchmark explanations from image processing models. It is costly, since it retrains the model for every perturbed input, and the retraining processes diverge from the post-hoc paradigm. Furthermore, ablation assessment is dependent on baseline approximations. According to Haug et al. [234], the closer the approximation of a baseline to the original data distribution, the more discriminative the approximation, i.e., baselines that deviate toward out-of-distribution (OOD) regions can produce invalid explanations.

Closely related to ablation, feature selection algorithms have been used for improving the assessment of the role of features in prediction and explanation tasks [129]. Some feature selection methodologies select a subset of important features from an original set, one-by-one or group-wise, according to their relative importance to each other and through feature elimination (top-down) [235] or inclusion (bottom-up) [129] approaches. Although feature selection reduces data dimensionality and computational complexity, it also depends on retraining processes.

Weerts et al. [236] applied qualitative analyses and Adadi and Berrada [4] extensively discussed the lack of evaluation in XAI, claiming the existence of few studies on XAI evaluation was due to the subjective aspect of explainability and highlighting that rare studies were dedicated to the challenges of generating explanations truly understandable by humans. Researchers have also argued contrastive explanations can be applied in assessment contexts, for human-social interaction is based on contrastivity [22]. However, Hooker et al. [226] indicated contrastive and counterfactual explanations tend to be extrapolation problems (similarly to SHAP), which makes them potentially misleading. Furthermore, identifying optimal counterfactuals is an NP-hard task [237].

According to Yang and Kim [110], one reason for the lack of concrete works with human analysts is that the human-in-the-loop evaluation process is complex and costly, since it involves sociological and psychological considerations. However, expert knowledge may enrich the context of explanations, making them more understandable [238].

In contrast to qualitative evaluations, quantitative ones are relatively independent of the model under explanation and almost exclusively devoted to the feature attribution context [114]. Amparore et al. [25] indicated a need for a consensus on fundamental metrics and the lack of definitions for quantifying the effectiveness of explanations. They demonstrated an unexpected behavior in LIME and SHAP and proposed four metrics to verify the different aspects of XAI techniques. Liu et al. [229] also showed some flaws in commonly used explainability methods and defined a set of metrics and a methodology for generating synthetic data for application in evaluations. Hooker et al. [224] presented a method for evaluating important variables and discussed the difficulty in directly ''buying'' results from XAI methods such as LIME and SHAP. Alvarez-Melis and Jaakkola [239] evaluated the drastic effects minimal perturbations can exert on XAI methods, claiming most XAI approaches are not sufficiently robust even when applied to explain robust learning models.

DeYoung et al. [240] presented a benchmark framework with metrics to measure the faithfulness and plausibility of explainability. Faithfulness refers to the extent to which an explanation accurately represents the reasoning process of the model, and plausibility evaluates the agreement of explanations with human-provided rationales [241].
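A minimal sketch of the ablation-style check described above is shown below. It is a rough illustration only: replacing features with their mean is just one (debatable) baseline choice, the model is evaluated on the same data, and ROAR would additionally retrain the model at every step:

```python
import numpy as np

def ablation_curve(predict_proba, X, importances, target_class=1):
    """Replace features with their mean, from most to least important,
    and track how the average predicted probability degrades."""
    baseline = X.mean(axis=0)                   # simple reference values
    order = np.argsort(-np.abs(importances))    # most important features first
    X_ablated = X.copy()
    probabilities = []
    for j in order:
        X_ablated[:, j] = baseline[j]
        probabilities.append(predict_proba(X_ablated)[:, target_class].mean())
    return probabilities                        # a steep early drop suggests a faithful ranking
```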
Petsiuk et al. [242] introduced the Prediction Gap on Important features (PGI) metric, which measures predictive faithfulness based on the change in a model's prediction probability from the selection of the k features deemed the most important ones, as determined by a post-hoc XAI method. PGI is defined as

PGI(x, f) = \frac{1}{m} \sum_{i=1}^{m} \big[\, |f(x) − f(\tilde{x})| \,\big]   (17)

where f represents a learning model, x is the original input data, and x̃ is the same input, but holding the k most important features. The higher the PGI value, the more faithful the explanation. Barr et al. [233] compared metrics applied to measure the faithfulness of local explainers, showing most of the current metrics do not agree, which is a gap that should be investigated by the XAI community.

Stability metrics measure the sensitivity of explanations to changes under specific modifications of the model's hyperparameters or input data [123]. However, the XAI literature reports no agreement on stability (also known as sensitivity or robustness). Mishra et al. [128] discussed different definitions of stability metrics for feature importance and counterfactual explanation methods under the umbrella of robustness as a unified term.

Alvarez-Melis and Jaakkola [123] introduced local Lipschitz continuity, one of the first metrics for evaluating the stability of local explanation methods. It generates a neighborhood N_x of sampled instances x′ by adding small perturbations around the original input x. The explanation e_x is expected to be similar to the explanations e_{x′} because the instances x′ are sampled to be similar to x. Agarwal et al. [243] improved local Lipschitz continuity by introducing the relative stability concept, which evaluates the stability of an explanation considering perturbations in the inputs and in the output predicted probabilities of the underlying model. Therefore, the Relative Input Stability (RIS) and Relative Output Stability (ROS) metrics are formulated according to

RIS(x, x', e_x, e_{x'}) = \max_{x'} \frac{\left\| \frac{e_x − e_{x'}}{e_x} \right\|_p}{\max\left( \left\| \frac{x − x'}{x} \right\|_p, \epsilon_c \right)}   (18)

and

ROS(x, x', e_x, e_{x'}) = \max_{x'} \frac{\left\| \frac{e_x − e_{x'}}{e_x} \right\|_p}{\max\left( \left\| \frac{f(x) − f(x')}{f(x)} \right\|_p, \epsilon_c \right)},   (19)

∀x′ ∈ N_x restricted to ŷ_x = ŷ_{x′} (only perturbed instances predicted as the same class of x), where p is the l_p norm for measuring input/output changes and ϵ_c > 0 is a clipping threshold for avoiding division by zero. The larger the RIS/ROS values of the underlying explanation method, the more unstable the method is to input/output perturbations.

Agarwal et al. [189] introduced OpenXAI, an open-source framework for evaluating and benchmarking post-hoc XAI methods. The tool includes a synthetic data generator, a collection of real-world datasets, pre-trained models, feature attribution methods, and implementations of diverse quantitative metrics for evaluations of faithfulness, stability, and fairness of the included XAI methods. OpenXAI can benchmark new explanation proposals, despite a certain difficulty in customizing some of its parameters.

Quantus [244] is a comprehensive and well-documented Python library that provides several metrics from various evaluation categories, enabling comparative analyses of XAI methods and attributions.

L. XAI LIMITATIONS
What is a good explanation? Miller [22] defined a good explanation as one that is true in reality. Reality in machine learning is limited to the ''truth'' learned from the training scope, which can hide unknown biases.

A way to promote trust is to increase the transparency of intelligent applications. An essential part of increasing transparency is the application of explainability [16], which can potentially unravel the black boxes of complex learning algorithms, enabling the introduction of more trust elements for supporting systems that actively use intelligent models. However, considerable discussions on the limitations of XAI and the aforementioned concerns on the lack of evaluation have been held. Although this study discussed several concepts, needs, challenges, and XAI methods, Kaur et al. [228] showed not all data scientists know how to apply XAI correctly in machine-learning pipelines.

Several investigations (including ours) have raised concerns over the need for more precise definitions of XAI basic terminology [4], [16]. Significant efforts were made in this review toward identifying the main concepts and specifying definitions as clearly as possible for each of the main elements of XAI theory in light of the literature. Although such definitions will contribute to clarifying many conceptual doubts in future research, the XAI area lacks agreement. As a result, every new paper addressing some XAI aspect or method has to introduce the same related terms in its own way, which is unnecessary and, consequently, leads to a profusion of similar (and confusing) terminology.

Krishna et al. [190] discussed the frequency at which explanations produced by state-of-the-art methods disagree with each other. The disagreement phenomenon may be tied to a lack of common objectives among explainability methods [245]. The authors also conducted a user study with data scientists on solutions to such disagreements in explanations. Han et al. [245] unified popular local feature importance methods into a same framework and demonstrated no one could generate optimal explanations across all data neighborhoods. Their results showed disagreement can occur because different explainers approximate the model using different neighborhoods and loss functions. The authors also established guidelines for choosing XAI methods according to faithfulness to the model.
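One simple way to quantify the disagreement discussed by Krishna et al. [190] is to compare which features two explainers rank as most important. The sketch below measures top-k overlap and rank correlation between two attribution vectors; it is a generic illustration, not the exact metrics defined in that study:

```python
import numpy as np
from scipy.stats import spearmanr

def explanation_agreement(attr_a, attr_b, k=5):
    """Compare two attribution vectors: fraction of shared top-k features and rank correlation."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    topk_overlap = len(top_a & top_b) / k
    rank_corr = spearmanr(np.abs(attr_a), np.abs(attr_b)).correlation
    return topk_overlap, rank_corr
```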
Permutation techniques are among the leading XAI approaches and are easy to describe, develop, and use. They show appealing results by producing ''null features,'' breaking the connection between features and target variables [17], [84]. Although permutation methods may be effective under a global null, they may fail to yield accurate explanations in cases that differ from the global null [246]. Hooker et al. [226] demonstrated how simple it is to generate examples in which permutation-based explanations can be misleading or distorted.

Explanation through causal relationships has rarely been explored in the XAI literature. Uncovering causality in Machine Learning is far from trivial, although it is regarded as a meaningful goal in explainability [90].

Non-numeric variables are another limitation that can be considered a challenge for XAI methods, and properly handling categorical variables is also a challenge in learning algorithms. The solution traditionally used in this context is one-hot encoding, a simple method derived from digital circuits that transforms non-numeric features into binary matrices. However, in large datasets with many categorical attributes holding high numbers of distinct categories for each attribute, one-hot encoding significantly increases the degree of data sparsity which, in turn, increases data dimensionality, with most of the encoded values added as extra columns of little individual importance. An alternative to one-hot encoding is target encoding [247], which converts each value of a categorical attribute into its corresponding expected value. The resulting transformation does not add extra columns, avoiding turning the dataset into a sparser high-dimensional dataset. Aas et al. [117] suggested alternative approaches from the clustering literature that describe distribution functions for manipulating non-numerical data [248] and generalizations of the Mahalanobis distance for mixtures of nominal, ordinal, and continuous attributes [249].
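A minimal sketch of target encoding as described above is shown below (with a smoothing term added to reduce overfitting on rare categories; in practice the encoding should be fit on training folds only to avoid target leakage):

```python
import pandas as pd

def target_encode(df, column, target, smoothing=10.0):
    """Replace each category with a smoothed mean of the target (shrunk toward the global mean)."""
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    weight = stats["count"] / (stats["count"] + smoothing)
    encoding = weight * stats["mean"] + (1.0 - weight) * global_mean
    return df[column].map(encoding)
```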
procedures. Medical specialists tested the methodology by of research of such nature enhances transparency and trust in
using a graphical web interface, returning positive feedback. automated disease diagnostics. Zhang et al. [253] discussed
Meske and Bunde [252] employed models to detect malaria interesting approaches to modeling the understanding of
in cell images and then used LIME to explain which part of clinical texts using Transformers. Properly processing the
the cell caused the model to make its prediction. The impact semantic information contained in clinical notes provided
by doctors could automate several applications related to architecture where the COVID-19 class can hold patches with
medicine and healthcare. The authors proposed using the different scores. Then, Grad-CAM was patch-wise adapted by
visualization tool provided by Vig [179] for extracting existing weighting its saliency maps with the classes’ probabilities.
relationships learned by the Transformer (e.g., symptoms and Characterizing elements that influence cancer is a
body parts). biological and clinical challenge [261]. Chen et al. [262]
Lawhern et al. [254] developed a Convolutional Neural presented a framework based on different Neural Networks
Network (CNN) to classify electroencephalogram signals for cancer diagnosis and prognosis. The system combines
using DeepLIFT to provide feature importance and support the morphological and molecular information from histology
predictions’ confidence. According to the authors, DeepLIFT imaging and genomic features, respectively, and enables inter-
results suggested the network had learned relevant features pretability by applying Grad-CAM and Integrated Gradients.
closely aligned with results from the literature. Qiu et al. [255] Elmarakeby et al. [261] introduced a deep-learning model
proposed a learning framework to classify clinical information trained to predict cancer stages in patients diagnosed with
from individuals into different cognitive levels for supporting prostate cancer based on molecular data from their genomic
neurologists in detecting Alzheimer’s. DeepLIFT was then profiles. DeepLIFT evaluated the importance of specific genes
applied to assess the contribution of imaging and non-imaging in the model’s prediction and attributed high scores to genes
features to the diagnoses. known as prostate cancer previously related to metastatic
The COVID-19 pandemic posed a critical and urgent threat disease drivers, thus inspiring new hypotheses for cancer
to global health, thus motivating remarkable research efforts studies.
in medicine and other areas, including Machine Learning. Genetics is a research- and technology-intensive area
Zoabi et al. [256] trained a model to predict positive SARS- that has demanded more machine-learning solutions for
CoV-2 infections and applied beeswarm and summary plots making predictions in genomics research. According to
from SHAP to identify features impacting the model’s Novakovsky et al. [49], explanatory elements promoting
predictions. [257] also used SHAP to identify important insights into genetic processes can be more significant than
features of long COVID cases from electronic records in the the predictions themselves for genomics researchers. CRISPR-
USA. Cas9 is a cutting-edge tool in genetic engineering that enables
Hu et al. [204] designed a deep-learning model to classify COVID-19 infections from computed tomography images, relying on Integrated Gradients to provide lung lesion localization. Using chest X-rays, Brunese et al. [258] developed a deep-learning model to detect COVID-19. Grad-CAM was applied to inputs classified as positive to highlight the symptomatic areas related to the disease, thus reducing the diagnosis time. A similar setup for explaining COVID-19 detection from X-rays was proposed in [259]. Oh et al. [260] developed a patch-based CNN architecture where the COVID-19 class can hold patches with different scores. Then, Grad-CAM was adapted patch-wise by weighting its saliency maps with the classes' probabilities.
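The patch-wise weighting idea can be summarized in a few lines; the sketch below is a hypothetical illustration of combining per-patch Grad-CAM maps with each patch's predicted class probability, and it is not the implementation of [260].

```python
# Illustrative sketch: combine per-patch Grad-CAM maps into a single
# image-level saliency map by weighting each patch map with that patch's
# predicted probability for the target class (here, the COVID-19 class).
import numpy as np

def weighted_gradcam(patch_cams: np.ndarray, patch_probs: np.ndarray) -> np.ndarray:
    """
    patch_cams:  (n_patches, H, W) Grad-CAM maps, one per image patch.
    patch_probs: (n_patches,) predicted target-class probability per patch.
    Returns a single (H, W) map normalized to [0, 1].
    """
    combined = np.tensordot(patch_probs, patch_cams, axes=1)  # probability-weighted sum
    combined -= combined.min()
    if combined.max() > 0:
        combined /= combined.max()
    return combined

# Toy usage with random stand-ins for real patch outputs.
rng = np.random.default_rng(0)
cams = rng.random((16, 224, 224))
probs = rng.dirichlet(np.ones(16))
heatmap = weighted_gradcam(cams, probs)
```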
Characterizing the elements that influence cancer is a biological and clinical challenge [261]. Chen et al. [262] presented a framework based on different Neural Networks for cancer diagnosis and prognosis. The system combines morphological and molecular information from histology imaging and genomic features, respectively, and enables interpretability by applying Grad-CAM and Integrated Gradients. Elmarakeby et al. [261] introduced a deep-learning model trained to predict cancer stages in patients diagnosed with prostate cancer based on molecular data from their genomic profiles. DeepLIFT evaluated the importance of specific genes in the model's prediction and attributed high scores to genes previously related to metastatic prostate cancer and known as disease drivers, thus inspiring new hypotheses for cancer studies.

Genetics is a research- and technology-intensive area that has demanded more machine-learning solutions for making predictions in genomics research. According to Novakovsky et al. [49], explanatory elements promoting insights into genetic processes can be more significant than the predictions themselves for genomics researchers. CRISPR-Cas9 is a cutting-edge tool in genetic engineering that enables a wide range of genome editing in different organisms. Wang et al. [263] used TreeSHAP and DeepSHAP to evaluate the influence of position-dependent nucleotide features and improve a Recurrent Neural Network (RNN) model that predicts gene activity for CRISPR-Cas9 design.

Bar et al. [264] trained Gradient-boosted Decision Trees with the human serum metabolome to predict metabolite levels and measure biomarker agents of different diseases. They used TreeSHAP to find associations between representative genomes and metabolite levels and discovered that diet and microbiome increased the models' predictive power. The effect of feature values on predictions was modeled following a directional mean absolute importance setup, which relates the SHAP values from TreeSHAP with the sign of a Spearman correlation between target and features.
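As an illustration of that setup, the helper below sketches one way to compute such a directional score: the magnitude is the mean absolute SHAP value of a feature and the sign comes from the Spearman correlation between the feature and the target. The function and variable names are hypothetical, not the authors' code.

```python
# Illustrative sketch of a "directional mean absolute importance" score:
# magnitude = mean |SHAP value| of a feature; sign = Spearman correlation
# between that feature and the target.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def directional_importance(shap_values: np.ndarray,
                           X: pd.DataFrame,
                           y: np.ndarray) -> pd.Series:
    """shap_values: (n_samples, n_features) array from, e.g., TreeSHAP."""
    scores = {}
    for j, name in enumerate(X.columns):
        magnitude = np.abs(shap_values[:, j]).mean()
        rho, _ = spearmanr(X[name], y)
        sign = np.sign(rho) if not np.isnan(rho) else 0.0
        scores[name] = sign * magnitude
    return pd.Series(scores).sort_values(key=np.abs, ascending=False)
```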
Avsec et al. [265] constructed a deep learning architecture based on convolutional layers and self-attention mechanisms to predict gene sequences with high expression levels from large-scale sequencing data in evolutionary studies. They computed gradient scores [203] to assess the contributions of different gene expressions and understand the gene sequences impacting the model predictions. Buergel et al. [266] trained a Deep Neural Network (DNN) as a metabolomic model using a large dataset with 168 metabolic markers as input for learning disease-specific states and predicting multi-disease risk for 24 conditions, including neurological disorders and cancer. DeepSHAP was initially applied globally to detect the metabolites that most affected disease risk, considering all investigated diseases. Global SHAP values were visualized through heat maps and circular charts. Next, they applied DeepSHAP locally to attribute risk profiles for individual predictions, visually analyzing them by projecting the entire set of SHAP values with UMAP [150].

Chklovski et al. [267] used TreeSHAP to highlight the contribution of specific genomic features and pathways to the predictions of a model trained on metagenomic data. For further discussions, Novakovsky et al. [49] conducted a recent literature review focusing on sequence-to-activity models and the emerging applications of XAI used in genetics research to investigate spurious correlations and complex interaction relationships between features.

In language modeling, hate speech is a cultural threat resulting from the increase in online interactions. Despite the proposals of hate speech detection models, more research on their interpretability must be developed. Mathew et al. [241] used LIME and attention methods to detect significant tokens related to hate speech. The authors verified that models with high performance in hate speech classification do not perform well on explainability metrics such as faithfulness and plausibility [240] and introduced a benchmark dataset covering multiple aspects of hate speech detection.

Pratt et al. [268] combined open-vocabulary models with large language models (LLMs) to generate sentences describing the characteristics of images. They computed Shapley values to understand the importance of the different image regions highlighted in the descriptions and used heat maps to visualize the results. Sarzynska-Wawer et al. [269] used LIME to understand the reasons behind the predictions of a language model applied in psychiatry for diagnosing schizophrenia-related symptoms. The authors focused the explainability task on misclassified patients, with LIME results identifying types of words indicative of thought disorder and revealing that their language model was sensitive to context and word meanings. For a comprehensive overview of explainability within the language processing domain, we refer to [55].

Regarding industrial applications, Hong et al. [270] used force and decision plots from SHAP to analyze the results of a model based on deep CNN and LSTM networks applied to sensor data for predicting turbofan-type aircraft engine maintenance. Brito et al. [271] proposed an unsupervised approach for detecting and classifying faults in rotating machinery and applied SHAP for feature rankings in anomaly diagnosis. Black-box models have been used in the automotive industry for enabling vehicles to perceive the environment and make driving decisions with less or no human intervention. In this context, transparency is critical for accepting autonomous vehicles on commercial scales. Omeiza et al. [51] surveyed the XAI applications and challenges of autonomous driving, and Zablocki et al. [52] focused their study on XAI methods for vision-based self-driving deep learning models. We refer to [53] for further details on the applications of XAI methods in modern industries.

XAI has also been applied in time series analysis. Xu et al. [272] used SHAP in an interactive system for analyzing the relationship between input feature importance and the output of multidimensional time-series forecasting models. Parsa et al. [273] modeled real-time data from Chicago metropolitan expressways to detect the occurrence of traffic accidents and applied SHAP and dependency plots to analyze the impact of input features (e.g., speed) on accident detection.

In the chemistry domain, Sanchez-Lengeling et al. [274] employed graph learning explainability to tackle the quantitative structure-odor relationship (QSOR) problem. The challenge was to understand which molecule's substructure was responsible for the specific scent of that material (e.g., fruity, weedy, medicinal). Preuer et al. [275] trained a Deep Graph CNN model to classify drugs into toxic and non-toxic; they derived explanations from Integrated Gradients to detect the molecular substructure causing the prediction, arguing that chemists can design methods to modify the responsible elements and thus avoid molecule toxicity.

McCloskey et al. [276] used a Message-Passing Graph Neural Network to study the binding properties between molecules and proteins and then relied on Integrated Gradients to explain which parts of the complex molecule-protein scheme were causing the chemical bond to happen. The explanations enabled them to discover spurious binding correlations in the predictions despite that network achieving perfect classification accuracy. Schwaller et al. [277] applied Transformers to learn chemical reaction mechanisms based on the grammar of organic chemical interactions. The model input is the stream of tokens concerning the atoms in the chemical chain of the molecules involved in the reaction. They explained the complex atom mapping between reactants and products by visualizing the relationships learned by the Transformer attention heads using the tool provided by Vig [179].

Yang et al. [278] trained learning models on chemical descriptors to predict gas permeability and design high-performance polymer membranes. SHAP extracted the contributions of the different chemical components linked to permeability and selectivity and, according to the authors, it identified impacting molecular substructures, thus encouraging future chemical and polymer research to take advantage of explainability. Jiang et al. [279] applied SHAP to explore the importance of molecular descriptors in drug discovery tasks. Jiménez-Luna et al. [280] addressed the technical challenges that XAI approaches face when applied in drug discovery supported by machine learning, highlighting the need for a collaborative effort among deep-learning developers, cheminformatics experts, chemists, and other domain specialists to promote models' reliability with XAI in the chemistry area.

Artificial Intelligence can also assist mathematicians in discovering new theorems and proposing solutions for long-standing open deductions. Davies et al. [281] proposed an intriguing learning-based framework to recognize potential patterns and relationships in pure mathematics problems. When the model finds some relationship, it applies Integrated Gradients to explain its nature, thus supporting mathematicians' intuition in proposing new conjectures.

A considerable number of recent papers from diverse research areas have tied XAI methods to their machine-learning studies. Most of them only cited explainability as a future direction to face the transparency challenges of using potent black boxes. However, several studies published in high-impact venues improved their results with the support of XAI approaches, as demonstrated in this section. We highlight the prevalence of gradient-based methods in applications handling image data, since most of those methods are designed for architectures based on Neural Networks, which are almost standard in image modeling applications. Moreover, multiple studies using XAI can be found in medicine and genetics, two areas with intensive research efforts that have been increasingly open to machine-learning solutions, thus proving the practical value XAI can add to cutting-edge research. Table 3 summarizes the XAI applications presented in this section, organized according to the research area in which they were employed.

A. PACKAGES AND LIBRARIES FOR XAI APPLICATION

While many papers promise explainability for tackling the opacity of black-box models, XAI developers have also made significant efforts to implement explainability methods, making them available through software packages and libraries. LIME [97] and SHAP [109] are openly available libraries providing methods and tools related to their base approaches (see Sections VIII-H and VIII-I). Other initiatives include explainers covering different XAI approaches.
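As a brief, hypothetical illustration of these base libraries, the snippet below applies the open-source LIME package to a scikit-learn classifier on a public tabular dataset; it is a minimal sketch rather than a recommended pipeline.

```python
# Minimal sketch of the open-source LIME package on tabular data
# (illustrative classifier and data; not tied to any study above).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# Local explanation for a single instance: a sparse linear surrogate
# fitted around perturbed samples of that instance.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())  # [(feature condition, weight), ...]
```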
IML [282] was one of the first XAI packages. It is an R toolkit focused on classical implementations of some well-known global and local model-agnostic methods. InterpretML [283] is a Python package that provides interpretability through a set of transparent models and post-hoc explainers, as well as visualization tools for feature importance analysis. Captum [284] is a PyTorch library focused on gradient and perturbation-based attribution methods for explaining Neural Networks and also provides a visualization tool and sensitivity-based evaluation metrics. Similarly, the iNNvestigate library [285] provides several gradient and LRP-based methods for explaining Neural Network architectures. AIX360 [286], [287] includes eight local and global explainers, evaluation metrics, and demonstration tutorials. Alibi Explain [288] is a Python library that implements transparent and black-box models and nine explanation methods, including counterfactuals and bias detection, for generating local and global explanations. OmniXAI [289] is a comprehensive library that provides explanations through a wide range of specific and model-agnostic XAI methods and visualization tools for various types of models (e.g., Scikit-learn, PyTorch, and TensorFlow implementations) and data. In addition to those valuable XAI frameworks, the previously discussed OpenXAI [189] and Quantus [244] packages are devoted to providing multiple evaluation metrics for XAI methods and explanations.
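For the gradient-based side of these toolkits, the following sketch shows a typical Captum call computing Integrated Gradients attributions for a small, hypothetical PyTorch classifier; the network and inputs are placeholders chosen only to keep the example self-contained.

```python
# Minimal sketch of gradient-based attribution with Captum (PyTorch):
# Integrated Gradients for a small, hypothetical feed-forward classifier.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

class SmallNet(nn.Module):
    def __init__(self, n_features: int = 20, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        return self.net(x)

model = SmallNet().eval()
inputs = torch.randn(8, 20)              # a batch of 8 instances
baselines = torch.zeros_like(inputs)     # all-zero reference input

ig = IntegratedGradients(model)
# Attribution of the class-1 logit to each input feature.
attributions, delta = ig.attribute(
    inputs, baselines=baselines, target=1, return_convergence_delta=True
)
print(attributions.shape)  # torch.Size([8, 20])
```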
Bhatt et al. [90] outlined how several organizations have applied XAI strategies to their workflows, highlighting which explainability methods work best in practice. According to the authors, local explainability is typically the most relevant form of model transparency for end users. However, they concluded that most XAI advances are far from end users due to the limitations of current approaches in generating direct information for those users, with machine learning engineers being the primary users of XAI implementations, mostly for sanity-checking procedures during development processes. Despite the diversity of successful application cases, there are significant opportunities for improvements in future XAI research, as discussed in the next section.

X. DISCUSSIONS AND FUTURE OPPORTUNITIES

In this study, we conducted qualitative comparisons, providing a comprehensive view of the strengths and limitations of XAI approaches. Interested readers can find extensive studies on quantitative comparisons among XAI methods in Amparore et al. [25], Krishna et al. [190], and Tan et al. [33]. Regardless of the considerable efforts devoted to this study, examining all relevant elements, techniques, and publications involved in the XAI environment would be unfeasible. Therefore, other surveys and reviews have been indicated for clarifying some specific concepts beyond the scope of this text.

As addressed elsewhere, the number of decisions made with the support of intelligent systems has grown, and new proposals have continuously emerged and been adopted. The multiple domains in which learning algorithms are inserted can potentially influence the way society interacts, demanding new tools to mitigate possible negative consequences. Although the scientific community devoted to Machine Learning has successfully improved the predictive performance of models, the trade-off between precision and transparency still must be adjusted. High precision indicates high rates of true-positive predictions and, hence, low rates of incorrect decisions. However, ignoring the logical processes that generate those decisions is unacceptable [12].

According to Lundberg and Lee [109], the best explanation for a simple model is the model itself, for it represents the learned space perfectly and is easy to understand. However, it is impossible to use a trained model based on complex architectures, such as Random Forests and DNNs, to explain itself due to difficulties in interpreting their decision structures. Non-linear learning models create mapping spaces that ''carve'' space segments for different data classes, and such regions can be fully connected, which is challenging to unravel [101]. Interpretability arises from the model design. However, when a model cannot be directly interpreted, explainability methods must be applied to transform obscure elements into interpretable information. Multiple aspects that can affect the ability of XAI tools to disclose the logic of complex models must be considered.

We remind the reader that not all learning-based systems require interpretability. Further explanation is not necessary when an algorithm's results have no significant consequences or when the problem has already been sufficiently tested in real situations [21]. However, since complex learning models have been increasingly adopted for making critical decisions in contexts such as precision medicine, algorithmic transparency has been clearly demanded due to the reluctance of humans to employ techniques that are not interpretable, tractable, or reliable, thus limiting the applicability of Machine Learning [290]. Unjustifiable decisions, or decisions that do not enable detailed explanations of the underlying logic, can lead to dangerous situations or even profound impacts on social dynamics [2], [291], [292].

The ''right to information,'' defined by the European GDPR or the Californian CCPA (see Section I-A), constrains personal data usage and demonstrates the lack of ethics many people perceive in decision-making processes involving automated systems and humans when no reasonable explanations are available [218]. Therefore, the research and development of comprehensive XAI techniques are essential for understanding the decisions made by Machine Learning applications, offering data scientists and users the ability to summarize views on the information hidden by the complex and intricate parametric spaces of modern learning models.

Bhatt et al. [90] detected a gap between explainability and the goals of transparency, since most of the current XAI deployments primarily serve developers, who internally use explainability to debug models, rather than the external end users affected by Machine Learning, who are the intuitive consumers of explanations. Prediction explanation is currently one of the leading research lines in XAI. Explaining a prediction is particularly interesting because the model discovers patterns that can be even more meaningful than the predictive performance [14]. However, the performance metrics typically used in Machine Learning cannot appropriately verify learned patterns. Such a lack of assurance introduces a sensitive issue regarding the broad, sometimes thoughtless, use of complex non-linear models for decision-making in domains ranging from science to industry [106].

Lundberg et al. [12] argued that only providing information about which variables are more important for the learning model does not imply uncovering causal relationships, since importance values do not represent a complete scenario for explaining a learning model. In other words, explaining a prediction simply by providing unique aspects of today's sophisticated learning systems is only part of the promotion of interpretability. Nevertheless, understanding the features that influence a decision is information that analysts can use to formulate explanations of the right reasons or biases that guide a model's predictions, since inferring the logic behind complex black boxes with no support from an XAI method is challenging.

Instability is a critical matter in XAI because it reflects one of the most damaging factors related to the integrity of an explanation provider, namely, difficulty in promoting trust. As an example, if an application supported by decision algorithms provides users with inconsistent explanations for similar (or the same) data instances, those explanations cannot be considered reliable [25]. State-of-the-art XAI techniques suffer from instability due to the randomness of the approximation strategies used to circumvent the need for computational resources. Another instability factor is the extrapolation behavior derived from impossible or improbable scenarios that can occur during the inclusion of correlated features in the marginalization strategy. Moreover, some XAI methods are based on heuristic or empirical solutions, with weak justifications for the choice of important parameters. Most XAI methods are non-robust, and their adoption for understanding models devoted to safety-critical applications can be risky [128].
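The instability issue can be observed directly: re-running a perturbation-based explainer on the same instance with different random seeds may change the attributions. The snippet below is an illustrative check (with a stand-in model and dataset) that compares two LIME runs through the rank correlation of their feature weights.

```python
# Illustrative check of explanation (in)stability: explain the same instance
# twice with a perturbation-based method under different random seeds and
# compare the resulting feature rankings.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = GradientBoostingClassifier(random_state=0).fit(data.data, data.target)

def lime_weights(seed: int) -> np.ndarray:
    explainer = LimeTabularExplainer(
        data.data, feature_names=data.feature_names,
        mode="classification", random_state=seed,
    )
    exp = explainer.explain_instance(
        data.data[0], model.predict_proba, num_features=data.data.shape[1]
    )
    weights = np.zeros(data.data.shape[1])
    for idx, w in exp.as_map()[1]:   # (feature index, weight) pairs for class 1
        weights[idx] = w
    return weights

w_a, w_b = lime_weights(seed=0), lime_weights(seed=42)
rho, _ = spearmanr(w_a, w_b)
print(f"Rank correlation between the two runs: {rho:.2f}")
```

A rank correlation well below 1 between the two runs signals exactly the kind of inconsistency discussed above.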
Rudin [293] argued that interpretable models are more appropriate for applications in which high-stakes decision-making is more important than attempts to explain the outputs of black-box models through limited XAI methods that do not offer all theoretical guarantees. However, due to performance and computational restrictions, transparent models and exact axiomatic XAI solutions cannot be applied in several real-world circumstances. In some machine learning tasks, it is not reasonable to sacrifice performance by using transparent solutions, thus demanding complex high-performance models. In such cases, XAI approaches represent a promising direction for debugging or identifying insights that warrant deeper investigation, enabling users to build explanatory elements [114].

A comprehensive XAI tool capable of producing consistent, stable, and comprehensive information about different aspects of a learning procedure can provide a helpful explainability scenario. If carefully developed and applied, XAI adds a new perspective to the vast horizon of Machine Learning, which can enrich future debates on whether computer devices can genuinely exhibit intelligent behavior [106]. There is a long avenue of research in XAI related to developing stable and consistent explainability solutions based on evaluated and validated explanations for accomplishing such aims. A modern XAI solution would include more than only prediction explanations or model inspection; it should be a comprehensible tool that includes explanations from different XAI approaches enhanced by information visualization and human interaction mechanisms; in this sense, we cite the impressive proposal of Lundberg et al. [12].

However, XAI researchers and practitioners face a significant open question that hampers the development of comprehensive XAI tools. The current regulations have established the right to explanations, but only in high-level conceptual terms. No law has defined system requirements for XAI, baselines, guidelines, assessment or validation standards, or presentation formats for explanations, leading to a lack of common objectives and formalization in current XAI approaches. The XAI community has established all such needs by questioning machine learning opacity.

Technology evolves rapidly, and the needs of each target audience can also change quickly. Moreover, the target audiences of XAI tools can be very heterogeneous. In this sense, we recommend designing processes based on multidisciplinary teams with data scientists, psychologists, domain experts, and law specialists as an important future direction for XAI evolution toward the creation of XAI tools tailored to the target audience requiring explanations and observing privacy, data governance, accountability, and fairness principles. For a valuable discussion connecting the research gaps between XAI and fairness, we refer to [294].

Providing users with realistic and highly informative explanations is a desirable way to satisfy criteria such as robustness and faithfulness. However, more informative explanations might leak non-trivial information about the underlying model, which could be exploited in model extraction attacks and raise privacy issues [161]. Protecting data privacy while enhancing transparency is of paramount research importance in XAI. The literature does not fully explore the extent to which model explanations can unintentionally reveal sensitive details about users' data. In this context, Aïvodji et al. [161] addressed the trade-off between privacy and explainability, and Nguyen et al. [125] surveyed the recent findings in privacy-preserving mechanisms within model explanations to avoid privacy leaks or deciphering attacks by malicious entities. Additionally, we restate the need for more careful evaluations. Future research on XAI must carefully create and systematically apply qualitative and quantitative assessment methodologies and metrics for robust explanations.

Explainability goes beyond the simple belief in raw statistical measurements within the Machine Learning environment, thus benefiting everyone involved (and affected) through applications relying on artificial intelligence-supported decision-making. Machine Learning is revolutionary and can be applied as a powerful element to shift paradigms, leading to future changes in society. However, not everything related to technological advances derived from Artificial Intelligence is perfect, as discussed by several studies addressing the weaknesses of high-end learning algorithms referenced in this review. Some of those drawbacks go beyond the problems related to challenges in understanding such intricate models, and, as discussed here, the ability of Machine Learning to adapt and generalize may sometimes be overestimated.

The reality appears much more evident in hindsight than in foresight [295]. Therefore, new learning algorithms and methodologies that are more robust and meet ethical and legal requirements can emerge, taking advantage of the support of XAI techniques. Users who choose to merely believe in black-box model outputs without contesting the motivation behind predictions may eventually be fooled by randomness and outcome bias [296].

XI. CONCLUSION

Artificial Intelligence is not the future; it is the present. Modern high-performance learning-based applications have become a near-ubiquitous reality; however, several institutions still use classical models (e.g., linear regression and rule-based ones) due to the need for more transparent solutions [293]. An explainable artificial intelligence application provides detailed elements clarifying the models' decision-making procedures, thus facilitating the understanding of the contributions of features to predictions and their impact on predictive performance, making such models' rational processes more transparent and verifiable [2]. Black boxes that only output decisions, with no further explanations of the underlying mechanisms, are difficult to trust and do not supply contextual directions to support their outcomes when confronted by users [12].

In this sense, knowing what goes in and out of automated decision-making systems with no comprehension of what occurred behind the scenes no longer satisfies information access needs, which is against recent ethical and legal directives. XAI can generate human-comprehensible explanations for machine decisions, helping detect hidden biases. Introducing such clarifications might lead to the development of procedures or algorithms to identify and handle biases, which is valuable to organizations potentially responsible for unfair decisions.

This concise literature review showed that explainability for machine learning is a fruitful area, with multiple strategies already in use and others under development toward filling the current gaps in bringing transparency to opaque systems. We held discussions on the current XAI literature, contributing from theoretical characterizations to critical investigations of XAI methods and applications. XAI explanations reach domains where transparent models cannot be applied. However, a black-box explainer must also be explainable and robust. Despite the multiple XAI approaches proposed, the literature reports no gold-standard XAI system and, as we highlighted, design considerations and proper assessments are critical concerns to be more methodologically addressed in future XAI research. Properly validated explanations can improve
the confidence levels of artificial intelligence-powered
applications, promoting sustainable computer decisions
according to social responsibility, reliability, and security
needs.
APPENDIX
ABBREVIATIONS AND ACRONYMS
The abbreviations and acronyms used in this review are
summarized in Table 4.
ACKNOWLEDGMENT
The authors acknowledge Angela C. P. Giampedro for her
valuable comments. Figures 3 and 6 were created with
BioRender.com. The opinions, hypotheses, and conclusions
or recommendations expressed in this material are the authors’
responsibility and do not necessarily reflect the views of the
funding agencies.
REFERENCES
[1] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach,
3rd ed. Upper Saddle River, NJ, USA: Pearson, 2010.
[2] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik,
A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila,
and F. Herrera, ‘‘Explainable artificial intelligence (XAI): Concepts,
taxonomies, opportunities and challenges toward responsible AI,’’ Inf.
Fusion, vol. 58, pp. 82–115, Jun. 2020.
[3] D. M. West, The Future of Work: Robots, AI, and Automation. Washington,
DC, USA: Brookings Institution Press, 2018.
[4] A. Adadi and M. Berrada, ‘‘Peeking inside the black-box: A survey
on explainable artificial intelligence (XAI),’’ IEEE Access, vol. 6,
pp. 52138–52160, 2018.
[5] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu,
‘‘Definitions, methods, and applications in interpretable machine learning,’’
Proc. Nat. Acad. Sci. USA, vol. 116, no. 44, pp. 22071–22080, Oct. 2019.
[6] V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci,
‘‘Deep neural networks and tabular data: A survey,’’ IEEE Trans. Neural
Netw. Learn. Syst., vol. 35, no. 6, pp. 7499–7519, Jun. 2024.
[7] F. K. Dosilovic, M. Brcic, and N. Hlupic, ‘‘Explainable artificial
intelligence: A survey,’’ in Proc. 41st Int. Conv. Inf. Commun.
Technol., Electron. Microelectron. (MIPRO), Opatija, Croatia, May 2018,
pp. 0210–0215.
[8] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521,
no. 7553, pp. 436–444, 7553.
[9] Q.-S. Zhang and S.-C. Zhu, ‘‘Visual interpretability for deep learning:
A survey,’’ Frontiers Inf. Technol. Electron. Eng., vol. 19, no. 1, pp. 27–39,
Jan. 2018.
[10] S. Chakraborty, R. Tomsett, R. Raghavendra, D. Harborne, M. Alzantot,
F. Cerutti, M. Srivastava, A. Preece, S. Julier, R. M. Rao, T. D. Kelley, D. Braines, M. Sensoy, C. J. Willis, and P. Gurram, ''Interpretability of deep learning models: A survey of results,'' in Proc. IEEE SmartWorld, Ubiquitous Intell. Comput., Adv. Trusted Comput., Scalable Comput. Commun., Cloud Big Data Comput., Internet People Smart City Innov. (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), San Francisco, CA, USA, Aug. 2017, pp. 1–6.
[11] S. Ullman, ''Using neuroscience to develop artificial intelligence,'' Science, vol. 363, no. 6428, pp. 692–693, Feb. 2019.
[12] S. M. Lundberg, B. Nair, M. S. Vavilala, M. Horibe, M. J. Eisses, T. Adams, D. E. Liston, D. K.-W. Low, S.-F. Newman, J. Kim, and S.-I. Lee, ''Explainable machine-learning predictions for the prevention of hypoxaemia during surgery,'' Nature Biomed. Eng., vol. 2, no. 10, pp. 749–760, Oct. 2018.
[13] Z. C. Lipton, ''The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery,'' Queue, vol. 16, no. 3, pp. 31–57, Jun. 2018.
[14] S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, ''From local explanations to global understanding with explainable AI for trees,'' Nature Mach. Intell., vol. 2, no. 1, pp. 56–67, Jan. 2020.
[15] O. Maier and H. Handels, ‘‘Predicting stroke lesion and clinical [38] G. Vilone and L. Longo, ‘‘Notions of explainability and evaluation
outcome with random forests,’’ in Brainlesion: Glioma, Multiple Sclerosis, approaches for explainable artificial intelligence,’’ Inf. Fusion, vol. 76,
Stroke and Traumatic Brain Injuries. Athens, Greece: Springer, 2016, pp. 89–106, Dec. 2021.
pp. 219–230. [39] F. Vitali, ‘‘A survey on methods and metrics for the assessment of
[16] J. Amann, D. Vetter, S. N. Blomberg, H. C. Christensen, M. Coffee, explainability under the proposed AI act,’’ in Legal Knowledge and
S. Gerke, T. K. Gilbert, T. Hagendorff, S. Holm, M. Livne, A. Spezzatti, Information Systems, vol. 346. Amsterdam, The Netherlands: IOS Press,
I. Strümke, R. V. Zicari, and V. I. Madai, ‘‘To explain or not to explain— 2022, p. 235.
Artificial intelligence explainability in clinical decision support systems,’’ [40] C. Molnar, Interpretable Machine Learning. Durham, NC, USA: Lulu
PLOS Digit. Health, vol. 1, no. 2, Feb. 2022, Art. no. e0000016. Press, 2019.
[17] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, pp. 5–32, [41] T. Speith, ‘‘A review of taxonomies of explainable artificial intelligence
Oct. 2001. (XAI) methods,’’ in Proc. ACM Conf. Fairness, Accountability, Trans-
[18] J. Burrell, ‘‘How the machine ‘thinks’: Understanding opacity in machine parency. New York, NY, USA: Association for Computing Machinery,
learning algorithms,’’ Big Data Soc., vol. 3, no. 1, pp. 1–12, 2016. Jun. 2022, pp. 2239–2250.
[19] W. Samek, T. Wiegand, and K. Müller, ‘‘Explainable artificial intelligence: [42] E. Tjoa and C. Guan, ‘‘A survey on explainable artificial intelligence
Understanding, visualizing and interpreting deep learning models,’’ 2017, (XAI): Toward medical XAI,’’ IEEE Trans. Neural Netw. Learn. Syst.,
arXiv:1708.08296. vol. 32, no. 11, pp. 4793–4813, Nov. 2021.
[20] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, [43] J. Petch, S. Di, and W. Nelson, ‘‘Opening the black box: The promise
‘‘Intelligible models for HealthCare: Predicting pneumonia risk and and limitations of explainable machine learning in cardiology,’’ Can. J.
hospital 30-day readmission,’’ in Proc. 21st ACM SIGKDD Int. Conf. Cardiol., vol. 38, no. 2, pp. 204–213, Feb. 2022.
Knowl. Discovery Data Mining, Aug. 2015, pp. 1721–1730. [44] Y. Zhang, Y. Weng, and J. Lund, ‘‘Applications of explainable artificial
intelligence in diagnosis and surgery,’’ Diagnostics, vol. 12, no. 2, p. 237,
[21] F. Doshi-Velez and B. Kim, ‘‘Towards a rigorous science of interpretable
Jan. 2022.
machine learning,’’ 2017, arXiv:1702.08608.
[45] B. H. M. van der Velden, H. J. Kuijf, K. G. A. Gilhuijs, and
[22] T. Miller, ‘‘Explanation in artificial intelligence: Insights from the social M. A. Viergever, ‘‘Explainable artificial intelligence (XAI) in deep
sciences,’’ Artif. Intell., vol. 267, pp. 1–38, Feb. 2019. learning-based medical image analysis,’’ Med. Image Anal., vol. 79,
[23] D. Gunning and D. Aha, ‘‘DARPA’s explainable artificial intelligence Jul. 2022, Art. no. 102470.
(XAI) program,’’ AI Mag., vol. 40, no. 2, pp. 44–58, Jun. 2019. [46] M. Reyes, R. Meier, S. Pereira, C. A. Silva, F.-M. Dahlweid,
[24] EU Regulation. (Apr. 2023). 2016/679 of the European Parliament and of H. V. Tengg-Kobligk, R. M. Summers, and R. Wiest, ‘‘On the interpretabil-
the council of 27 April 2016 on the General Data Protection Regulation. ity of artificial intelligence in radiology: Challenges and opportunities,’’
[Online]. Available: http://data.europa.eu/eli/reg/2016/679/oj Radiol., Artif. Intell., vol. 2, no. 3, May 2020, Art. no. e190043.
[25] E. Amparore, A. Perotti, and P. Bajardi, ‘‘To trust or not to trust an [47] G. Yang, Q. Ye, and J. Xia, ‘‘Unbox the black-box for the medical
explanation: Using LEAF to evaluate local linear XAI methods,’’ PeerJ explainable AI via multi-modal and multi-centre data fusion: A mini-
Comput. Sci., vol. 7, p. e479, Apr. 2021. review, two showcases and beyond,’’ Inf. Fusion, vol. 77, pp. 29–52,
[26] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and Jan. 2022.
D. Pedreschi, ‘‘A survey of methods for explaining black box models,’’ [48] H. W. Loh, C. P. Ooi, S. Seoni, P. D. Barua, F. Molinari, and U. R. Acharya,
ACM Comput. Surveys, vol. 51, no. 5, pp. 1–42, Sep. 2019. ‘‘Application of explainable artificial intelligence for healthcare: A
[27] FDA. (Feb. 2024). Proposed Regulatory Framework for Modifications to systematic review of the last decade (2011–2022),’’ Comput. Methods
Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Programs Biomed., vol. 226, Nov. 2022, Art. no. 107161.
Medical Device (SaMD)—Discussion Paper and Request for Feedback. [49] G. Novakovsky, N. Dexter, M. W. Libbrecht, W. W. Wasserman, and
[Online]. Available: https://www.regulations.gov/document?D=FDA- S. Mostafavi, ‘‘Obtaining genetics insights from deep learning via
2019-N-1185-0001 explainable artificial intelligence,’’ Nature Rev. Genet., vol. 24, no. 2,
[28] L. Maier-Hein, M. Eisenmann, D. Sarikaya, K. März, T. Collins, pp. 125–137, Feb. 2023.
A. Malpani, J. Fallert, H. Feussner, S. Giannarou, and P. Mascagni, [50] L. Ruff, J. R. Kauffmann, R. A. Vandermeulen, G. Montavon, W. Samek,
‘‘Surgical data science—From concepts toward clinical translation,’’ Med. M. Kloft, T. G. Dietterich, and K.-R. Müller, ‘‘A unifying review of deep
Image Anal., vol. 76, Feb. 2022, Art. no. 102306. and shallow anomaly detection,’’ Proc. IEEE, vol. 109, no. 5, pp. 756–795,
[29] WHO. (2021). Ethics and Governance of Artificial Intelligence May 2021.
for Health: WHO Guidance. World Health Organization, Geneva, [51] D. Omeiza, H. Webb, M. Jirotka, and L. Kunze, ‘‘Explanations in
Switzerland. Accessed: Feb. 2024. [Online]. Available: https://www.who. autonomous driving: A survey,’’ IEEE Trans. Intell. Transp. Syst., vol. 23,
int/publications/i/item/9789240029200 no. 8, pp. 10142–10162, Aug. 2022.
[30] State of California. (2021). California Consumer Privacy Act (CCPA). [52] É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, ‘‘Explainability of
State of California, Department of Justice, Office of the Attorney General. deep vision-based autonomous driving systems: Review and challenges,’’
Accessed: Jul. 2023. [Online]. Available: https://oag.ca.gov/privacy/ccpa Int. J. Comput. Vis., vol. 130, no. 10, pp. 2425–2452, Oct. 2022.
[53] I. Ahmed, G. Jeon, and F. Piccialli, ‘‘From artificial intelligence to
[31] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal,
explainable artificial intelligence in Industry 4.0: A survey on what, how,
‘‘Explaining explanations: An overview of interpretability of machine
and where,’’ IEEE Trans. Ind. Informat., vol. 18, no. 8, pp. 5031–5042,
learning,’’ in Proc. IEEE 5th Int. Conf. Data Sci. Adv. Analytics (DSAA),
Aug. 2022.
Turin, Italy, Oct. 2018, pp. 80–89.
[54] X. Zhong, B. Gallagher, S. Liu, B. Kailkhura, A. Hiszpanski, and
[32] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, ‘‘Explainable AI: T. Y.-J. Han, ‘‘Explainable machine learning in materials science,’’ Npj
A review of machine learning interpretability methods,’’ Entropy, vol. 23, Comput. Mater., vol. 8, no. 1, p. 204, Sep. 2022.
no. 1, p. 18, Dec. 2020.
[55] M. Danilevsky, K. Qian, R. Aharonov, Y. Katsis, B. Kawas, and P. Sen,
[33] S. Tan, G. Hooker, P. Koch, A. Gordo, and R. Caruana, ‘‘Considerations ‘‘A survey of the state of explainable AI for natural language processing,’’
when learning additive explanations for black-box models,’’ Mach. Learn., 2020, arXiv:2010.00711.
vol. 112, no. 9, pp. 3333–3359, Sep. 2023. [56] D. Poole, A. Mackworth, and R. Goebel, Computational Intelligence: A
[34] V. Belle and I. Papantonis, ‘‘Principles and practice of explainable machine Logical Approach. Oxford, U.K.: Oxford Univ. Press, 1998.
learning,’’ Frontiers Big Data, vol. 4, p. 39, Jul. 2021. [57] M. Garnelo and M. Shanahan, ‘‘Reconciling deep learning with symbolic
[35] N. Burkart and M. F. Huber, ‘‘A survey on the explainability of supervised artificial intelligence: Representing objects and relations,’’ Current
machine learning,’’ J. Artif. Intell. Res., vol. 70, pp. 245–317, Jan. 2021. Opinion Behav. Sci., vol. 29, pp. 17–23, Oct. 2019.
[36] P. Hase and M. Bansal, ‘‘Evaluating explainable AI: Which algorithmic [58] S. Harnad, ‘‘The symbol grounding problem,’’ Phys. D, Nonlinear
explanations help users predict model behavior?’’ in Proc. 58th Annu. Phenomena, vol. 42, nos. 1–3, pp. 335–346, Jun. 1990.
Meeting Assoc. Comput. Linguistics, 2020, pp. 5540–5552. [59] A. L. Samuel, ‘‘Some studies in machine learning using the game of
[37] S. Mohseni, N. Zarei, and E. D. Ragan, ‘‘A multidisciplinary survey and checkers,’’ IBM J. Res. Develop., vol. 3, no. 3, pp. 210–229, Jul. 1959.
framework for design and evaluation of explainable AI systems,’’ ACM [60] J. G. Carbonell, R. S. Michalski, and T. M. Mitchell, ‘‘An overview of
Trans. Interact. Intell. Syst., vol. 11, nos. 3–4, pp. 1–45, Dec. 2021. machine learning,’’ Mach. Learn., vol. 1, pp. 3–23, Jan. 1983.
[61] P. Bhavsar, I. Safro, N. Bouaynaya, R. Polikar, and D. Dera, ‘‘Chapter 12— [87] S. Passi and S. J. Jackson, ‘‘Trust in data science: Collaboration, translation,
Machine learning in transportation data analytics,’’ in Data Analytics for and accountability in corporate data science projects,’’ Proc. ACM Hum.-
Intelligent Transportation Systems. Amsterdam, The Netherlands: Elsevier, Comput. Interact., vol. 2, pp. 1–28, Nov. 2018.
2017, pp. 283–307. [88] D. Doran, S. Schulz, and T. R. Besold, ‘‘What does explainable AI really
[62] Asimov. (2016). The Neural Network Zoo—The Asimov Institute. mean? A new conceptualization of perspectives,’’ 2017, arXiv:1710.00794.
Accessed: Apr. 2023. [Online]. Available: https://www.asimovinstitute. [89] T. Lombrozo, ‘‘The structure and function of explanations,’’ Trends Cognit.
org/neural-network-zoo/ Sci., vol. 10, no. 10, pp. 464–470, Oct. 2006.
[63] T. Chen and C. Guestrin, ‘‘XGBoost: A scalable tree boosting system,’’ [90] U. Bhatt, A. Xiang, S. Sharma, A. Weller, A. Taly, Y. Jia, J. Ghosh,
in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining. R. Puri, J. M. F. Moura, and P. Eckersley, ‘‘Explainable machine learning
New York, NY, USA: Association for Computing Machinery, Aug. 2016, in deployment,’’ in Proc. Conf. Fairness, Accountability, Transparency.
pp. 785–794. New York, NY, USA: Association for Computing Machinery, Jan. 2020,
[64] B. Ghojogh and M. Crowley, ‘‘The theory behind overfitting, cross pp. 648–657.
validation, regularization, bagging, and boosting: Tutorial,’’ 2019, [91] M. Gleicher, ‘‘A framework for considering comprehensibility in
arXiv:1905.12787. modeling,’’ Big Data, vol. 4, no. 2, pp. 75–88, Jun. 2016.
[65] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, [92] J. Gareth, W. Daniela, H. Trevor, and T. Robert, An Introduction to
Ł. Kaiser, and I. Polosukhin, ‘‘Attention is all you need,’’ in Proc. Adv. Statistical Learning: With Applications in R, vol. 1. Heidelberg, Germany:
Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11. Spinger, 2017.
[66] O. Goldreich, ‘‘Computational complexity: A conceptual perspective,’’ [93] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, ‘‘A study of the
ACM Sigact News, vol. 39, pp. 35–39, Sep. 2008. behavior of several methods for balancing machine learning training data,’’
[67] S. Arora and B. Barak, Computational Complexity: A Modern Approach. ACM SIGKDD Explor. Newslett., vol. 6, no. 1, pp. 20–29, Jun. 2004.
Cambridge, U.K.: Cambridge Univ. Press, 2009. [94] S. Vashishth, S. Upadhyay, G. Singh Tomar, and M. Faruqui, ‘‘Attention
[68] E. Briscoe and J. Feldman, ‘‘Conceptual complexity and the bias/variance interpretability across NLP tasks,’’ 2019, arXiv:1909.11218.
tradeoff,’’ Cognition, vol. 118, no. 1, pp. 2–16, Jan. 2011. [95] S. Jain and B. C. Wallace, ‘‘Attention is not explanation,’’ 2019,
[69] S. Fortmann-Roe. (2012). Accurately Measuring Model arXiv:1902.10186.
Prediction Error. [Online]. Available: http://scott.fortmann-roe. [96] R. Munroe. (Jun. 2023). XKCD—A Webcomic of Romance, Sarcasm, Math,
com/docs/MeasuringError.html and Language. [Online]. Available: https://xkcd.com/1838/
[70] B. Neal, S. Mittal, A. Baratin, V. Tantia, M. Scicluna, S. Lacoste-Julien, [97] M. T. Ribeiro, S. Singh, and C. Guestrin, ‘‘‘Why should I trust you?’
and I. Mitliagkas, ‘‘A modern take on the bias-variance tradeoff in neural Explaining the predictions of any classifier,’’ in Proc. 22nd ACM SIGKDD
networks,’’ 2018, arXiv:1810.08591. Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, 2016,
[71] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical pp. 1135–1144.
Learning: Data Mining, Inference, and Prediction. Heidelberg, Germany: [98] P. E. Rauber, S. G. Fadel, A. X. Falcão, and A. C. Telea, ‘‘Visualizing the
Springer, 2009, vol. 2. hidden activity of artificial neural networks,’’ IEEE Trans. Vis. Comput.
[72] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, Graphics, vol. 23, no. 1, pp. 101–110, Jan. 2017.
MA, USA: MIT Press, 2016. [99] R. Yousefzadeh and D. P. O’Leary, ‘‘Interpreting neural networks using
[73] S. Fortmann-Roe. (2012). Understanding the Bias-Variance Trade- flip points,’’ 2019, arXiv:1903.08789.
off. Accessed: Apr. 2023. [Online]. Available: http://scott.fortmann- [100] S. Wiegreffe and Y. Pinter, ‘‘Attention is not not explanation,’’ 2019,
roe.com/docs/BiasVariance.html arXiv:1908.04626.
[74] S. Geman, E. Bienenstock, and R. Doursat, ‘‘Neural networks and the [101] H. Karimi, T. Derr, and J. Tang, ‘‘Characterizing the decision boundary of
bias/variance dilemma,’’ Neural Comput., vol. 4, no. 1, pp. 1–58, Jan. 1992. deep neural networks,’’ 2019, arXiv:1912.11460.
[75] N. Haim, G. Vardi, G. Yehudai, O. Shamir, and M. Irani, ‘‘Reconstructing [102] J. Ba and R. Caruana, ‘‘Do deep nets really need to be deep?’’ in Proc.
training data from trained neural networks,’’ 2022, arXiv:2206.07758. Adv. Neural Inf. Process. Syst., vol. 27, 2014, pp. 1–9.
[76] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, ‘‘Understanding [103] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow,
deep learning (still) requires rethinking generalization,’’ Commun. ACM, and R. Fergus, ‘‘Intriguing properties of neural networks,’’ 2013,
vol. 64, no. 3, pp. 107–115, Mar. 2021. arXiv:1312.6199.
[77] P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, ‘‘Benign overfitting [104] I. J. Goodfellow, J. Shlens, and C. Szegedy, ‘‘Explaining and harnessing
in linear regression,’’ Proc. Nat. Acad. Sci. USA, vol. 117, no. 48, adversarial examples,’’ 2014, arXiv:1412.6572.
pp. 30063–30070, Dec. 2020. [105] J. Su, D. V. Vargas, and K. Sakurai, ‘‘One pixel attack for fooling deep
[78] K. Wang, V. Muthukumar, and C. Thrampoulidis, ‘‘Benign overfitting in neural networks,’’ IEEE Trans. Evol. Comput., vol. 23, no. 5, pp. 828–841,
multiclass classification: All roads lead to interpolation,’’ in Proc. Adv. Oct. 2019.
Neural Inf. Process. Syst., vol. 34, 2021, pp. 24164–24179. [106] S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and
[79] J. Jung, C. Concannon, R. Shroff, S. Goel, and D. G. Goldstein, ‘‘Simple K.-R. Müller, ‘‘Unmasking clever Hans predictors and assessing what
rules for complex decisions,’’ 2017, arXiv:1702.04690. machines really learn,’’ Nature Commun., vol. 10, no. 1, pp. 1–8,
[80] J. M. Alonso, C. Castiello, and C. Mencar, ‘‘Interpretability of fuzzy Mar. 2019.
systems: Current research trends and prospects,’’ in Springer Handbook of [107] G. Ras, M. van Gerven, and P. Haselager, ‘‘Explanation methods in deep
Computational Intelligence. London, U.K.: Springer, 2015, pp. 219–237. learning: Users, values, concerns and challenges,’’ in Explainable and
[81] J. H. Friedman and B. E. Popescu, ‘‘Predictive learning via rule ensembles,’’ Interpretable Models in Computer Vision and Machine Learning. London,
Ann. Appl. Statist., vol. 2, no. 3, pp. 916–954, Sep. 2008. U.K.: Springer, 2018, pp. 19–36.
[82] J. H. Friedman, ‘‘Greedy function approximation: A gradient boosting [108] F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Wortman
machine,’’ Ann. Statist., vol. 29, no. 5, pp. 1189–1232, Oct. 2001. Vaughan, and H. Wallach, ‘‘Manipulating and measuring model
[83] A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin, ‘‘Peeking inside interpretability,’’ in Proc. CHI Conf. Human Factors Comput. Syst. New
the black box: Visualizing statistical learning with plots of individual York, NY, USA: Association for Computing Machinery, May 2021,
conditional expectation,’’ J. Comput. Graph. Statist., vol. 24, no. 1, pp. 1–52.
pp. 44–65, Jan. 2015. [109] S. M. Lundberg and S.-I. Lee, ‘‘A unified approach to interpreting model
[84] G. Casalicchio, C. Molnar, and B. Bischl, ‘‘Visualizing the feature predictions,’’ in Proc. 31st Int. Conf. Neural Inf. Process. Syst. Long Beach,
importance for black box models,’’ in Proc. Joint Eur. Conf. Mach. Learn. CA, USA: Curran Associates, 2017, pp. 4768–4777.
Knowl. Discovery Databases. London, U.K.: Springer, 2018, pp. 655–670. [110] M. Yang and B. Kim, ‘‘Benchmarking attribution methods with relative
[85] D. Kahneman, O. Sibony, and C. R. Sunstein, Noise: A Flaw in Human feature importance,’’ 2019, arXiv:1907.09701.
Judgment. New York, NY, USA: Little, Brown and Company, 2021. [111] R. Kass and T. Finin, ‘‘The need for user models in generating expert
[86] I. Kumar, C. Scheidegger, S. Venkatasubramanian, and S. Friedler, system explanations,’’ Int. J. Exp. Syst., vol. 1, no. 4, pp. 1–31, 1988.
‘‘Shapley residuals: Quantifying the limits of the Shapley value for [112] Y. Lou, R. Caruana, J. Gehrke, and G. Hooker, ‘‘Accurate intelligible
explanations,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, models with pairwise interactions,’’ in Proc. 19th ACM SIGKDD Int. Conf.
pp. 1–11. Knowl. Discovery Data Mining, Chicago, IL, USA, 2013, pp. 623–631.
[113] H. Chen, S. Lundberg, and S.-I. Lee, ‘‘Explaining models by propagating [138] E. C. Alexandrina, E. S. Ortigossa, E. S. Lui, J. A. S. Gonçalves,
Shapley values of local components,’’ in Explainable AI in Healthcare N. A. Correa, L. G. Nonato, and M. L. Aguiar, ‘‘Analysis and visualization
and Medicine. New York City, NY, USA: Springer, 2021, pp. 261–270. of multidimensional time series: Particulate matter (PM10) from São
[114] H. Chen, S. M. Lundberg, and S.-I. Lee, ‘‘Explaining a series of models by carlos-SP (Brazil),’’ Atmos. Pollut. Res., vol. 10, no. 4, pp. 1299–1311,
propagating Shapley values,’’ Nature Commun., vol. 13, no. 1, pp. 1–15, Jul. 2019.
Aug. 2022. [139] M. Kahng, P. Y. Andrews, A. Kalro, and D. H. Chau, ‘‘ActiVis: Visual
[115] M. Doumpos and C. Zopounidis, ‘‘Model combination for credit risk exploration of industry-scale deep neural network models,’’ IEEE Trans.
assessment: A stacked generalization approach,’’ Ann. Oper. Res., vol. 151, Vis. Comput. Graphics, vol. 24, no. 1, pp. 88–97, Jan. 2018.
no. 1, pp. 289–306, Feb. 2007. [140] J.-L. Wu, P.-C. Chang, C. Wang, and K.-C. Wang, ‘‘ATICVis: A visual
[116] S. P. Healey, W. B. Cohen, Z. Yang, C. K. Brewer, E. B. Brooks, analytics system for asymmetric transformer models interpretation and
N. Gorelick, A. J. Hernandez, C. Huang, M. J. Hughes, R. E. Kennedy, comparison,’’ Appl. Sci., vol. 13, no. 3, p. 1595, Jan. 2023.
T. R. Loveland, G. G. Moisen, T. A. Schroeder, S. V. Stehman, [141] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau, ‘‘Visual analytics in
J. E. Vogelmann, C. E. Woodcock, L. Yang, and Z. Zhu, ‘‘Mapping forest deep learning: An interrogative survey for the next frontiers,’’ IEEE Trans.
change using stacked generalization: An ensemble approach,’’ Remote Vis. Comput. Graphics, vol. 25, no. 8, pp. 2674–2693, Aug. 2019.
Sens. Environ., vol. 204, pp. 717–728, Jan. 2018. [142] W. E. Marcílio-Jr, D. M. Eler, and F. Breve, ‘‘Model-agnostic interpretation
[117] K. Aas, M. Jullum, and A. Løland, ‘‘Explaining individual predictions by visualization of feature perturbations,’’ 2021, arXiv:2101.10502.
when features are dependent: More accurate approximations to Shapley [143] Q. Zhao and T. Hastie, ‘‘Causal interpretations of black-box models,’’
values,’’ Artif. Intell., vol. 298, Sep. 2021, Art. no. 103502. J. Bus. Econ. Statist., vol. 39, no. 1, pp. 272–281, Jan. 2021.
[118] G. Yeuk-Yin Chan, E. Bertini, L. Gustavo Nonato, B. Barr, and C. T. Silva, [144] P. Xenopoulos, G. Chan, H. Doraiswamy, L. G. Nonato, B. Barr, and
‘‘Melody: Generating and visualizing machine learning model summary C. Silva, ‘‘GALE: Globally assessing local explanations,’’ in Proc. ICML,
to understand data and classifiers together,’’ 2020, arXiv:2007.10614. Jul. 2022, pp. 322–331.
[119] H. Kumar and J. Chandran, ‘‘Is Shapley explanation for a model unique?’’ [145] G. Singh, F. Memoli, and G. E. Carlsson, ‘‘Topological methods for
2021, arXiv:2111.11946. the analysis of high dimensional data sets and 3D object recognition,’’
[120] J. Yuan, G. Yeuk-Yin Chan, B. Barr, K. Overton, K. Rees, L. Gustavo PBG@Eurographics, vol. 2, pp. 91–100, Sep. 2007.
Nonato, E. Bertini, and C. T. Silva, ‘‘SUBPLEX: Towards a better [146] Á. A. Cabrera, W. Epperson, F. Hohman, M. Kahng, J. Morgenstern, and
understanding of black box model explanations at the subpopulation level,’’ D. H. Chau, ‘‘FAIRVIS: Visual analytics for discovering intersectional
2020, arXiv:2007.10609. bias in machine learning,’’ in Proc. IEEE Conf. Vis. Analytics Sci. Technol.
[121] M. Wojtas and K. Chen, ‘‘Feature importance ranking for deep learning,’’ (VAST), Vancouver, BC, Canada, Oct. 2019, pp. 46–56.
in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 5105–5114.
[147] E. S. Ortigossa, F. F. Dias, and D. C. D. Nascimento, ‘‘Getting over high-
[122] I. Stepin, J. M. Alonso, A. Catala, and M. Pereira-Fariña, ‘‘A survey of con- dimensionality: How multidimensional projection methods can assist data
trastive and counterfactual explanation generation methods for explainable science,’’ Appl. Sci., vol. 12, no. 13, p. 6799, Jul. 2022.
artificial intelligence,’’ IEEE Access, vol. 9, pp. 11974–12001, 2021.
[148] G. D. Cantareira, E. Etemad, and F. V. Paulovich, ‘‘Exploring neural
[123] D. Alvarez-Melis and T. S. Jaakkola, ‘‘On the robustness of interpretability
network hidden layer activity using vector fields,’’ Information, vol. 11,
methods,’’ 2018, arXiv:1806.08049.
no. 9, p. 426, Aug. 2020.
[124] Q. V. Liao, D. Gruen, and S. Miller, ‘‘Questioning the AI: Informing design
[149] J. Yuan, G. Y. Chan, B. Barr, K. Overton, K. Rees, L. G. Nonato, E. Bertini,
practices for explainable AI user experiences,’’ in Proc. CHI Conf. Human
and C. T. Silva, ‘‘SUBPLEX: A visual analytics approach to understand
Factors Comput. Syst., Honolulu, HI, USA, Apr. 2020, pp. 1–15.
local model explanations at the subpopulation level,’’ IEEE Comput.
[125] T. Tam Nguyen, T. Trung Huynh, Z. Ren, T. Toan Nguyen, P. Le Nguyen,
Graph. Appl., vol. 42, no. 6, pp. 24–36, Nov. 2022.
H. Yin, and Q. Viet Hung Nguyen, ‘‘A survey of privacy-preserving
model explanations: Privacy risks, attacks, and countermeasures,’’ 2024, [150] L. McInnes, J. Healy, and J. Melville, ‘‘UMAP: Uniform manifold approxi-
arXiv:2404.00673. mation and projection for dimension reduction,’’ 2018, arXiv:1802.03426.
[126] A. Das and P. Rad, ‘‘Opportunities and challenges in explainable artificial [151] Y. Li, L. Ding, and X. Gao, ‘‘On the decision boundary of deep neural
intelligence (XAI): A survey,’’ 2020, arXiv:2006.11371. networks,’’ 2018, arXiv:1808.05385.
EVANDRO S. ORTIGOSSA received the B.Sc. degree in computer science and the M.Sc. and Ph.D. degrees in computer science and computational mathematics from the Institute of Mathematics and Computer Science, University of São Paulo (ICMC-USP), São Carlos, Brazil, in 2015, 2018, and 2024, respectively, focusing his work on multidimensional time-series analysis and machine learning.
He is currently a member of the Graphics, Imaging, Visualization, and Analytics Group (GIVA), ICMC-USP, where he develops research on machine learning. His research interests include data science, machine learning, explainable artificial intelligence (XAI), information visualization (InfoVis), and image processing.

LUIS GUSTAVO NONATO (Member, IEEE) received the Ph.D. degree in applied mathematics from Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil, in 1998. From 2008 to 2010, he was a Visiting Scholar with the SCI Institute, The University of Utah. From 2016 to 2018, he was a Visiting Professor with the Center for Data Science, New York University. He is currently a Professor with the Institute of Mathematics and Computer Science, University of São Paulo (ICMC-USP), São Carlos, Brazil. His main research interests include visual analytics, geometric computing, data science, and visualization.
Dr. Nonato served on several program committees, including IEEE SciVis, IEEE InfoVis, and EuroVis. He was an Associate Editor of the Computer Graphics Forum journal and IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, and the Editor-in-Chief of the International Journal of Applied Mathematics and Computational Sciences (SBMAC SpringerBriefs).
THALES GONÇALVES received the B.Sc. degree in electrical engineering from the Federal Institute of Espírito Santo (IFES), in 2014, the B.Sc. degree in mathematics from the Center of Exact Sciences, Federal University of Espírito Santo (CCE-UFES), in 2017, the M.Sc. degree in signal processing and pattern recognition from the Technology Center, CT-UFES, in 2017, and the Ph.D. degree in computer science and computational mathematics from the Institute of Mathematics and Computer Science, University of São Paulo (ICMC-USP), in 2024.
He is currently a member of the Graphics, Imaging, Visualization, and Analytics Group (GIVA), ICMC-USP. During his Ph.D. studies, he was a Visiting Scholar with the Visualization, Imaging, and Data Analysis Center (VIDA), New York University (NYU) Tandon School of Engineering. His research interests include machine learning, graph neural networks, and explainable/responsible AI.