Abstract
Healthcare providers, policymakers, and defense contractors need to understand many types of machine learning model behaviors. While eXplainable Artificial Intelligence (XAI) provides tools for interpreting these behaviors, few frameworks, surveys, and taxonomies produce succinct yet general notation to help researchers and practitioners describe their explainability needs and quantify whether these needs are met. Such quantified comparisons could help individuals rank XAI methods by their relevance to use-cases, select explanations best suited for individual users, and evaluate what explanations are most useful for describing model behaviors. This paper collects, decomposes, and abstracts subcomponents of common XAI methods to identify a mathematically grounded syntax that applies generally to describing modern and future explanation types while remaining useful for discovering novel XAI methods. The resulting syntax, introduced as the Qi-Framework, generally defines explanation types in terms of the information being explained, their utility to inspectors, and the methods and information used to produce explanations. Just as programming languages define syntax to structure, simplify, and standardize software development, so too the Qi-Framework acts as a common language to help researchers and practitioners select, compare, and discover XAI methods. Derivative works may extend and implement the Qi-Framework to develop a more rigorous science for interpretable machine learning and inspire collaborative competition across XAI research.
Introduction
The field of eXplainable Artificial Intelligence (XAI) contains diverse methods that are relevant to healthcare, defense, finance, and automation, such as models that are inherently comprehensible for medical applications (Loh et al. 2022) and that increase trust in models deployed in high-stakes decision making (Carmichael 2024). These methods produce a plethora of explanation types representing many model types, data types, and model behaviors. While some explanations reveal important model inputs (Ribeiro et al. 2016), others describe patterns learned in a model’s latent space (Simonyan et al. 2019) or produce surrogate models to aid comprehension (Chen et al. 2021). The number and diversity of these methods can be overwhelming (Bommer et al. 2024), which presents a challenge in simply describing, summarizing, and quantitatively comparing disparate method types in a way that helps researchers benchmark and select XAI methods.
While decades of research in XAI (Adadi and Berrada 2018) have produced numerous surveys (Adadi and Berrada 2018; Yuan et al. 2022), taxonomies (Speith 2022; Chromik and Schuessler 2020), and mathematical foundations for XAI (Ribeiro et al. 2016; Plumb et al. 2018), these works are limited. XAI surveys summarize existing research (Hanif et al. 2021) but infrequently contribute to quantitative methods for comparing diverse method types. Taxonomies often categorize models using rigid terminology (Das et al. 2020) rather than providing flexible frameworks that aid XAI method discovery. Sometimes individual articles provide mathematical notation that applies in specific conditions (Ribeiro et al. 2016) but fails to generalize across many explanation types. Cumulatively, these limitations illustrate a deeper issue embedded in XAI research, which is expressed by Burkart and Huber (2021): “what is missing is a standard procedure to measure, quantify, and compare the explainability of... [XAI]... approaches that allows scientists to compare these different approaches.” Minh et al. (2022), Hanif et al. (2021), and Belle and Papantonis (2021) make statements corroborating this sentiment. This paper summarizes XAI methods by introducing a simple mathematical expression (Eq. 1) that seeks to generally express many explanation types rather than a select few.
Following the advice of Adadi and Berrada (2018), who posit there is enough research to abstract a generic framework for XAI, we seek to consolidate existing methods into a simple syntax that aids in describing and developing explanation types and methods. Using this framework, researchers and practitioners can identify ways to apply existing quantitative methods more generally across explanation types. Such work is a precursor to standard benchmarks and comparisons in XAI, which remain an outstanding need in the field (Burkart and Huber 2021). Figure 1 summarizes this paper, which identifies common explanation types and methods, decomposes their mathematical subcomponents (e.g. inputs, function subcomponents, outputs, etc.), and abstracts a syntax for generally expressing each of them. This syntax helps researchers and practitioners Question what Information needs to be explained in certain use cases, and comprises the Question-Information (Qi) Framework for XAI. The Qi-Framework improves upon prior works by grounding classification and comparison schemes mathematically (see Sect. 2), which encourages more quantitative evaluation of diverse explanation types. Similar to how accuracy, precision, and recall inspired the development of Convolutional Neural Network (CNN) architectures and related benchmarks in computer vision (Khan et al. 2020; Russakovsky et al. 2015), perhaps laying the groundwork for general yet quantitative comparisons in XAI can propel collaborative competition toward more inherently interpretable models.
[See PDF for image]
Fig. 1
Graphical concept underpinning the research presented in this paper: explanation methods were identified and decomposed into common method subcomponents before abstracting a general notation used to describe diverse explanation types, derive the forthcoming notation, and consider future prospects
This paper is organized per the illustration in Fig. 2 to justify, derive, and demonstrate the value of the Qi-Framework. Section 2 justifies the novelty of the Qi-Framework as being the first to abstract mathematical syntax for describing and generating explanatory approaches while generally applying to both historical and future research. Section 3 describes the methodology used to identify: (1) What information XAI methods commonly explain; (2) The information and function types which are typically used to produce explanations; (3) The common XAI methods and “method primitives” that are used to produce explanations; and (4) That two explanations of the same information may vary in utility to model inspectors. The Qi-Framework describes these properties using a general syntax abstracted from XAI method subcomponents, which is used to compare XAI methods tabularly in the results (Sect. 4). Section 5 discusses how the Qi-Framework helps researchers generate new XAI methods and apply existing quantitative comparisons in XAI more generally across explanation types. Numerous research prospects emerge from this work, as described in Sect. 6 before highlighting the paper’s conclusions in Sect. 7. Tables 9, 10, 11, 12, 13, 14, and 15 in Appendix A document the 184 explanations reviewed when forming the Qi-Framework.
[See PDF for image]
Fig. 2
Graphical representation of this paper’s sections and structure, where extended discussions of the current work are provided in Appendix A
Background
The growing XAI literature is summarized in many surveys, taxonomies, and frameworks. Here, we use the word “framework” to represent a set of concepts, terms, and notation that describes either individual explanation types (e.g. when Ribeiro et al. (2016) define explanations as interpretable models) or categories of explanations (e.g. as in literature reviews or taxonomies). In essence, XAI frameworks provide languages for communicating the properties of explanation methods and types. In theory, frameworks with richer vocabulary aid robust communication. The following sections describe the shortcomings of prior frameworks (Sect. 2.1) and the novelty of the proposed Qi-Framework (Sect. 2.2).
Differentiation from prior works
Contrasting the Qi-Framework against existing frameworks reveals key differences. Existing frameworks may: (1) Compare XAI methods categorically, operationally, or compositionally; (2) Be useful for describing existing XAI methods or generating new ones; (3) Apply to individual explanation types or generally to many; (4) Describe only historical XAI methods or remain relevant to future developments; and (5) Be mathematically grounded, where abstract syntax aids communication and subsequent derivation. Table 1 defines these terms and compares the Qi-Framework against XAI surveys, taxonomies, and several individual methods to highlight opportunities left open by prior works:
Categories, composition, and operation: Existing frameworks summarize explanation types and methods categorically, compositionally, and operationally as complementary presentations of the XAI landscape. For example, explanatory approaches often use flowcharts to help practitioners understand and select individual XAI methods. These approaches describe methods as post-hoc or inherently interpretable, model-agnostic or model-specific, and local or global (Minh et al. 2022). Taxonomies by Dwivedi et al. (2023), Arya et al. (2019), and Martins et al. (2023) categorize methods via their implementation language, model use cases, and related human factors. While valuable for comparing XAI method properties, these categorical approaches fail to illustrate the underlying data requirements and internal behaviors of XAI methods which are important in developing new approaches. Compositional (Speith 2022) and operational (Schwalbe and Finzel 2023) comparisons meet this need by presenting model subcomponents (e.g. input, function subcomponents, and output types) and describing how subcomponents operate in sequence to produce explanations. While aiding low-level comprehension, these taxa frequently present complex diagrams depicting numerous categories. The Qi-Framework embraces compositional and operational comparisons to aid low-level method comprehension, but differentiates itself by pursuing abstract mathematical syntax (i.e. rather than comparative terminology) so researchers may use the Qi-Framework to pursue quantified comparisons of diverse XAI method types.
Generality of application: Existing XAI frameworks may apply either to individual or to general types of explanation. For example, when presenting the LIME and MAPLE methods, an explanation is defined as a model from a class of interpretable models (e.g. linear models, decision trees, etc.) that approximates complex model behaviors (Ribeiro et al. 2016; Plumb et al. 2018). Such bases of explanations are technically rigorous, but not general. Many explanations are not model-based, including textual explanations from language models, additive index scores (Staniak and Biecek 2018), partial dependence plots (Szepannek and Lübke 2023), and contrastive attributions (Jacovi et al. 2021). Even standard tables, graphs, and visualizations are basic explanations of model behaviors, as in work by Chen et al. (2020), which demonstrates the concept whitening approach by presenting images in the dataset that maximally activated certain function components. Such visualizations are not formal explanation methods, but certainly help explain model behaviors. The Qi-Framework includes syntax that seeks to apply generally to many explanation types while remaining useful for expressing individual methods.
Historical and future relevance: Terminology presented in XAI frameworks may remain relevant only to historical XAI methods or stay robust to future change. Existing frameworks often define notation that statically describes the space of explanation types and methods. For example, describing XAI methods as post-hoc or inherently explainable holds latent ambiguity in some scenarios. Loss curves explain model performance through training, but are neither post-hoc nor inherently interpretable approaches; they explain the training process itself. This example illustrates scenarios where existing notation may fail to categorize certain explanation types. In these conditions, researchers may need new notation to describe their methods, which can break existing taxonomies. But creating disparate terminology fractures common schema for method categorization. Syntactic programming languages, on the other hand, illustrate dynamic notation for representing general variable classes that are extensible to defining many formal programs. While the terminology of prior works characterizes existing methods (Dwivedi et al. 2023; Chromik and Schuessler 2020) and only conditionally remains relevant to future methods, the Qi-Framework pursues a common syntax that researchers and practitioners can instantiate to describe unseen explanation types, methods, and data needs.
Descriptive and generative utility: Prior frameworks often describe existing methods rather than providing explicit means of generating new ones. In business, whitespace-analysis methods reveal product gaps by asking: “What combination of needs and methods have not been pursued?” Surveys and taxonomies that tabularly or graphically compare XAI methods (Dwivedi et al. 2023) instead provide an understanding of method groups. The differential value in these approaches is helping researchers and practitioners identify information that remains unexplained. Abstracting syntax for the Qi-Framework from compositional and operational subcomponents lets researchers use tabular analysis to reveal what information is not explained by existing methods, thereby revealing novel explanatory approaches.
Mathematical backing: Existing frameworks frequently omit mathematical notation and syntax that would help quantitatively compare XAI methods. This shortcoming is observed by Nomm (2023), who pursued a mathematical framework for XAI by providing a basic taxonomy derived from linear algebra concepts, such as a model’s decision boundaries, point sets, and decision traces. However, this work is limited to a small set of examples and does not provide a general syntax. The Qi-Framework provides such syntax for applying quantitative metrics more generally. For example, work by Plumb et al. (2018) presents metrics for comparing explanations (e.g. the “causal local explanation metric”), and Ribeiro et al. (2018) gauge user comprehension via user studies. Both studies apply their metrics in specific use cases. The Qi-Framework’s syntax exploits parallels across quantitative approaches so specific metrics may be applied more generally.
Table 1. Novelty of this paper as compared to existing conceptual Frameworks (F) for XAI, including XAI method Taxonomies (T), surveys (S), or individual Method papers (M)
Research effort | Framework type | Compare Cat. | Compare Op. | Compare Comp. | Historical Rel. | Future Rel. | General App. | Limited App. | Descriptive Utility | Generative Utility | Maths-Grounded | Abstracts Syntax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Qi– Framework (This paper) | F | – | – | |||||||||
Minh et al. (2022) | S | – | – | – | – | – | ||||||
Adadi and Berrada (2018) | S | – | – | – | – | – | – | – | ||||
Belle and Papantonis (2021) | S | – | – | – | – | – | – | – | ||||
Burkart and Huber (2021) | S | – | – | – | – | – | ||||||
Chromik and Schuessler (2020) | T | – | – | – | – | – | – | – | ||||
Nomm (2023) | T | – | – | – | – | – | ||||||
Doshi– Velez and Kim (2017) | T | – | – | – | – | – | – | – | – | |||
Schwalbe and Finzel (2023) | T | – | – | – | – | – | – | |||||
Speith (2022) | T | – | – | – | – | – | – | |||||
Emamirad et al. (2023) | T | – | – | – | – | – | – | – | ||||
Chromik and Schuessler (2020) | T | – | – | – | – | – | – | – | ||||
Dwivedi et al. (2023) | T | – | – | – | – | – | ||||||
Ibrahim and Shafiq (2023) | T | – | – | – | – | – | – | – | ||||
Arya et al. (2019) | T | – | – | – | – | – | ||||||
Das et al. (2020) | T | – | – | – | – | – | – | – | – | |||
Ribeiro et al. (2016) | M | – | – | – | – | – | – | – | – | – | ||
Chen et al. (2018) | M | – | – | – | – | – | – | – | – | – |
The definition of criteria for each comparative term is provided beneath the table
Compare Category (Cat.): Categories of XAI method properties are used for method comparison
Compare Operation (Op.): Procedural subcomponents are used to compare XAI methods
Compare Composition (Comp.): Mathematical subcomponents aid XAI method comparisons
Historically Relevant (Rel.): Framework does not establish a protocol for future adaptation
Future-Relevant (Rel.): Framework provides adaptive notation for describing future methods
General Applicability (App.): Framework has been applied generally to many XAI method types
Limited Applicability (App.): Framework has been applied to one or several XAI method types
Descriptive Utility: Framework provides a classification schema for describing XAI methods
Generative Utility: Framework provides a methodology for discovering new XAI methods
Maths-Grounded: Framework provides mathematical notation describing XAI characteristics
Abstracts syntax: Framework provides abstract syntax for describing explanation types
Novelty claims
Without diminishing the value of existing works, the Qi-Framework is the first to abstract a mathematical syntax from XAI method subcomponents to help researchers and practitioners describe and generate explanation types and methods in a general format that remains relevant to both historical and future approaches in XAI. These claims are discussed in the following bullet points:
The Qi-Framework is the first to abstract mathematical syntax from XAI method subcomponents. While prior methods provide valuable mathematical foundations for explanations, they are not general across many explanation types. The Qi-Framework starts from existing notation, but then abstracts a general syntax for describing many explanation types in terms of: (1) The information being explained; (2) The information and method used to produce an explanation; and (3) The varying utility of explanations in helping inspectors understand model behaviors. The value is in providing several variable types and instances to help describe, compare, and explore explanation methods while outlining a path for researchers to generally apply existing quantitative metrics developed in XAI.
The Qi-Framework is the first framework to help generate explanation types and methods. While prior frameworks describe existing methods, the Qi-Framework may be used in tabular analysis to reveal information that is not explained by existing methodologies, thereby providing a white-space analysis that identifies gaps in existing methodologies to help consider: (1) What information is not commonly explained by existing methods; (2) What function subcomponents are frequently used to produce interpretable models; and (3) What combination of function subcomponents may be used to create “fully interpretable” model architectures. These traits help researchers consider and develop new explanation types and methods.
The Qi-Framework is the first generally applicable framework that remains relevant to both historical and future XAI methods. Abstracting a syntax from method subcomponents that represent common model inputs, function components, and outputs allows the Qi-Framework to generalize across many explanation types. This syntax helps the Qi-Framework remain robust to future development. Just as variables in math can take many values, researchers can instantiate the Qi-Framework to report unique method types and properties that emerge over time.
Methods
The methodology for abstracting a common syntax as the Qi-Framework is illustrated in Fig. 3. Diverse XAI publications were first identified from existing literature and GitHub repositories (Sect. 3.1). Each publication contained one or more explanation types that explained unique model behaviors, representing a set of 184 explanations (Sect. 3.2). Decomposing these explanations revealed common method inputs, outputs, function components, and secondary metrics that are used to construct existing XAI methods (Sect. 3.3). These subcomponents were grouped to produce an abstract syntax that describes the information explained by, and needed to produce, diverse explanation types (Sect. 3.4).
[See PDF for image]
Fig. 3
Graphical illustration of the methodology for a identifying XAI methods from literature, b decomposing functions into primary inputs, functions, and outputs, and c abstracting general syntax representing many XAI method and explanation types
Literature review
The literature review identified 66 papers describing common XAI methods that were later decomposed to abstract a general expression for explanation types and methods. These papers were found from XAI surveys and GitHub repositories. The surveys were found by searching for terms like “XAI survey,” “XAI taxonomy,” and “XAI literature review” before selecting articles from Google Scholar with high citation counts. We identified GitHub repositories by searching for terms like “XAI” and “interpretable machine learning” on GitHub and identifying popular repositories as ranked by the number of stars on GitHub. The papers compiled from these sources were published between 1991 and 2022 (mean publication date of 2018), and generally describe high-impact XAI methods (2208 citations from Google Scholar on average, at the time of writing). This approach biases the publication list toward XAI methods that are both common and implemented in software. The diversity of the XAI methods is summarized in Fig. 4, which overviews inherent explainability and post-hoc XAI methods that are divided into four sub-categories.
Figure 4a depicts inherently explainable methods that focus on two main distinctions: structural explainability, and training-based explainability:
Structural explainability (Fig. 4b) changes function structures and latent space dimensionality (Fig. 4c and 4d, respectively) so they are easier to understand. Interpretable models based on linear regression or decision trees are often considered explainable because the logic and contribution of individual terms can be understood. Other approaches using semantic bottlenecks and prototype layers constrain the meaning of latent space variables, making them easier to conceptualize and visualize. Other approaches use interpretable embeddings so functions learned by a model are easier to understand, such as forcing neural networks to perform like logical operators.
Training-based explainability (Fig. 4c) enforces constraints on the loss function so the latent space is more interpretable. For example, forcing a network to learn features that are distant from or orthogonal to each other in the latent space lets functional nodes and latent-space clusters represent dissimilar patterns (e.g. different shapes, colors, textures, etc.). This concept parallels metric learning (Ghojogh et al. 2022) which can improve latent-space interpretability. Other generative models guide latent-space interpretability by using reconstruction losses to generate inputs from latent variables.
Input saliency (Fig. 4e) highlights important model inputs and how they influence a prediction (Fig. 4b, c, and e). For example, approaches like LIME, SHAP, and ANCHORS perturb or occlude model inputs and observe the change in a model’s prediction. Other saliency approaches trace how information flows through networks to influence predictions. These methods often result in saliency maps that indicate important model inputs.
State activation methods (Fig. 4f) interpret or utilize a model’s latent representations (Fig. 4c and d). For example, latent variables change in distinct patterns for different input classes, similar to how unique neuron groupings activate in the human brain under varying stimuli (Perin et al. 2013). XAI methods use this observation to interpret feature clusters in the latent space, identify inputs that optimize latent variable response, distinguish patterns of latent-space variable response, and build reduced order models that simplify latent space relationships.
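As a minimal sketch of the state-activation idea, the snippet below ranks dataset samples by how strongly they activate a chosen latent unit in a stand-in encoder. The encoder, dataset, and all names here are hypothetical placeholders rather than any specific published method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Stand-in latent encoder: a fixed random linear map followed by a ReLU."""
    return np.maximum(W @ x, 0.0)

# Toy dataset: 200 samples with 16 input features, mapped to an 8-unit latent space.
X = rng.normal(size=(200, 16))
W = rng.normal(size=(8, 16))

unit = 3  # latent unit to inspect
activations = np.array([encoder(x, W)[unit] for x in X])

# The samples that most strongly activate the unit hint at the pattern it responds to.
top5 = np.argsort(activations)[::-1][:5]
print(f"Samples most activating latent unit {unit}: {top5}")
```

In practice the encoder would be a trained model's latent layer and the top-ranked samples would be visualized, mirroring the activation-based inspections referenced above.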
[See PDF for image]
Fig. 4
Summary of XAI method types from the literature review. The methods reviewed are either inherently explainable or use post-hoc methods, each with many sub-categories. a Inherent explainability methods are divided into b structural and c training-based methods. d Post-hoc explainability is generally decomposed into e input-salience methods and f state-activation-based methods. Further subdivisions are present for each category
Articles introducing XAI methods often cited prior methods for comparison. Any cited works that were not already in our list of papers were added to expand the scope of the review. The full set of resulting papers was then used to identify the explanation types contained in each publication.
Method for identifying explanations from publications
The Qi-Framework seeks to generally describe the explanations from each publication found in the literature review. These publications often contained one or more explanations of model behaviors (per Fig. 3a). For example, the ProtoTree method represents a collection of explanation types, where model inspectors may interpret individual prototypes or concepts learned by the network, the logical decision of nodes in a decision tree, and the sequence of decisions made when performing classification. Here, authors sometimes explicitly mention what information is explained by their method, but other times do not provide a complete list of explanation types; these secondary explanations are latent to an explanation method. Another paper by Chen et al. (2020) presents a single method (i.e. the concept whitening method), but illustrates the impact of concept whitening by associating function nodes in a neural network with interpretable images. Such graphical depictions are also secondary explanations that aid inspectors’ comprehension, though they are not the main focus of the associated papers. We identified explicit, secondary-latent, and secondary-graphical explanations by answering the following questions:
Explicit explanations: What information or parts of a model do authors mention their method can explain?
Secondary-latent explanations: What set of information (e.g. important inputs, active function subcomponents, IDs of latent-space clusters, etc.) could a model inspector reasonably predict after reviewing a model’s parameters, a model surrogate, or a derivative explanation? These explanations are determined empirically, and are not listed by the papers’ authors.
Secondary-graphical explanations: What figures or tables were presented to aid comprehension of model behaviors?
Method for decomposing individual explanations
Decomposing the methods used to produce each of the 184 explanations revealed common “information subcomponents” used in explanatory procedures, including method inputs, outputs, intermediate function subcomponents, and derivative metrics (e.g. additive index scores, detector uniqueness, and prototype similarity, as discussed in Sect. 4). The reviewers generally answered the following questions to identify the subcomponents needed to produce explanations (a structured sketch of the resulting records follows the list):
What input types were used to produce explanations? Examples include individual inputs, inputs sets, concept datasets, latent space identification characters (ID), etc.
What is the data type of the returned explanation? Examples include reconstructed inputs, input masks and patterns, decision traces, contrastive attributions, etc.
What function components were responsible for producing an explanation? Examples include linear layers, prototype layers, semantic bottlenecks, neural networks, neural network nodes, etc.
What subsidiary information is needed to produce an explanation? Examples include similarity scores, additive index scores, saliency maps, etc.
What sequence of operations is used to map XAI method inputs to given explanations?
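As a concrete illustration, the answers to these questions could be recorded as one structured entry per reviewed explanation. The sketch below uses hypothetical field names and example values; the paper itself performs this bookkeeping tabularly (see Appendix A).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExplanationDecomposition:
    """One review record: the subcomponents behind a single explanation."""
    method_name: str                 # e.g. "LIME", "ProtoPNet"
    input_types: List[str]           # e.g. ["input instance", "perturbed input set"]
    output_type: str                 # e.g. "input mask", "decision trace"
    function_components: List[str]   # e.g. ["linear layer", "prototype vectors"]
    subsidiary_info: List[str] = field(default_factory=list)     # e.g. ["similarity scores"]
    operation_sequence: List[str] = field(default_factory=list)  # ordered processing steps

# Example record for a perturbation-based method (illustrative values only).
lime_row = ExplanationDecomposition(
    method_name="LIME",
    input_types=["input instance", "perturbed input set", "perturbed output set"],
    output_type="per-dimension importance weights",
    function_components=["linear surrogate model"],
    subsidiary_info=["locality-weighted sample importances"],
    operation_sequence=["perturb input", "query model", "fit weighted linear surrogate"],
)
```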
Methods and mathematics from abstracting XAI syntax
Patterns in the XAI methods were abstracted into a mathematical syntax for generally expressing diverse explanation methods and types; this section reviews both the abstraction process and the resulting mathematics. Figure 5 describes the iterative process of abstracting a common mathematical syntax, which generally involved grouping the XAI method subcomponents from Sect. 3.3 into functional expressions that held the same abstract syntax across each method. When the candidate syntax failed to express all 184 explanations, new notation was conceived to denote the differences. Practically, the process in Fig. 5 was repeated four times until the resulting syntax described all methods. While empirical, this process is grounded in the diverse set of mathematical subcomponents present in each XAI method. Consequently, the resulting syntax is immediately useful in describing and comparing existing XAI methods while providing paths toward quantified comparisons across diverse XAI methods (see Sect. 5). The following paragraphs describe the mathematical syntax found in review.
[See PDF for image]
Fig. 5
Graphical process of abstracting a general syntax for XAI, starting from existing syntax published in literature, modifying this syntax when describing existing methods, and resulting in a general syntax for describing many explanation types and methods
Figure 6 shows the model underlying the Qi-Framework, which was discovered in review. The figure illustrates a notional process that connects inspectors’ questions to comprehensible and relevant explanations. The left side of this figure illustrates how an inspector I(x) (Fig. 6a) may ask different kinds of questions Q depending on their role and use case (Fig. 6b). The Qi-Framework assumes that an inspector’s questions can be mapped to an explanation e, which is an instance of a given explanation type. As observed in review, explanation types vary in the type of information being explained, or i, and the utility an explanation provides to an inspector, or u. Explanation types are therefore denoted as ε_{i,u}. Explanations may be mapped to Q via a “question mapping” which is responsible for selecting explanations that aid user comprehension (Fig. 6d). Because there are many kinds of question mappings, inspectors may have to form explicit queries to search for answers (Fig. 6c). Many explanation types and formats may be produced by explanation methods m (Fig. 6f), such as saliency maps, textual answers, or partial dependence plots (Fig. 6e), each of which depends on a unique set of information, or s, derived from a model system, or F(x). Conveniently, Eq. 1 summarizes the model underlying the Qi-Framework, which is general with respect to the model type, inspector type, and explanation types, methods, and formats:
ε_{i,u} = m(s)    (1)
While general, the scope and intricacies of m, ε_{i,u}, and s need to be defined. Practically, m is an instance from the set of XAI methods found in the review, u is defined empirically because the same information sometimes helps an inspector predict different characteristics about model systems, and i and s require further discussion. Here, we introduce a parent set of XAI-relevant information, which comprises the sets of spatial, functional, and training-related information related to a machine learning model F(x); in other notation, the parent set is the union of these three subsets, a single element of the parent set is one type of information used by an XAI method, and s is a set of such elements. Note that an information type can both be explained, as i in Eq. 1, and used to explain other information types, as in s. Specific values depend highly on the XAI method because the spatial, functional, and training-related sets are enormous, as visualized in Fig. 7.
[See PDF for image]
Fig. 6
Qi Framework for mapping an inspector’s Questions to explanation types that explain Information about a machine learning system. This model assumes that (a) inspectors with (b) underlying questions about a (h) machine learning system may (c) ask or relay a question which is then answered via (e) an explanation of a machine learning system. In the Qi-Framework, (g) information from machine learning systems is derived or formatted using (f) an XAI method to produce explanations. Here, the goal of the (d) question mapper, which may be a person, program, or model, is to provide explanations that best aid inspectors’ comprehension
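The notional flow in Fig. 6 can be read as a small pipeline. The sketch below mirrors that flow with hypothetical function names and a trivial lookup-table question mapper, purely to make the roles of the question mapper, the XAI method m, and the information set s concrete; it is not part of the Qi-Framework itself.

```python
from typing import Any, Dict

# Hypothetical registry: each question pattern maps to an XAI method m together with
# the information set s that the method needs from the model system F(x).
QUESTION_MAP: Dict[str, Dict[str, Any]] = {
    "which inputs mattered?": {"method": "input_saliency", "info": ["input instance", "model outputs"]},
    "what does this latent unit encode?": {"method": "activation_ranking", "info": ["dataset", "latent activations"]},
}

def question_mapper(question: str) -> Dict[str, Any]:
    """Fig. 6d: select an explanation method and its required information set."""
    return QUESTION_MAP.get(question.lower(), {"method": "unsupported", "info": []})

def produce_explanation(question: str, model_system: str) -> str:
    """Fig. 6e-g: derive information s from F(x) and format it with method m."""
    choice = question_mapper(question)
    return f"explanation via {choice['method']} using {choice['info']} from {model_system}"

print(produce_explanation("Which inputs mattered?", model_system="F(x)"))
```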
[See PDF for image]
Fig. 7
Information types decomposed from a model’s life cycle, which focuses on (a) a model’s training routine. Though dependent on the model type, a training routine generally comprises (b) training information and (c) model information. Training information represents the learning algorithm, training schedule, hyperparameters, and loss function. Models selected from the function space generally comprise a corresponding (d) input space, (e) latent space, and (f) output space. The spatial, functional, and training-related information types may be further decomposed. The function space may be decomposed into function types. The input, latent, and output spaces may be decomposed into individual samples, data clusters, trajectories, and datasets. Training information may be categorized in many ways, though this paper focuses on learning algorithms and loss functions. (g) Secondary information may be derived from these types to aid explanation
The functional information set is the full set of function components that may be explained by XAI methods, where its subcomponents are hierarchical and compositional. This work considers that function types comprise families of function systems, which may each be decomposed into function composites, function elements, and function primitives. These sets are described in the caption of Fig. 8 and together represent the full set of function components. Please note that Fig. 8 represents a small set of function components found empirically in review.
The training information set comprises the training-related information that is relevant to XAI methods. While diverse, this paper focuses on the learning algorithm and the loss function. Figure 8b shows the relevant learning algorithms and loss functions found in review, though further study is needed to characterize whether certain training schedules and hyperparameters may be designed to influence a model’s resulting explainability.
The spatial information set relates to a model’s vector spaces, and generally considers a model’s input space (X), latent space (Z), and output space (Y), though XAI methods often involve secondary spaces (W) which are derived from X, Z, Y, the function components, and the training information. One challenge in abstracting a general syntax for XAI is the diversity of information types used and explained from these spaces. For example, explanations often consider: (1) The full set of points defined in a vector space, or S; (2) A dataset of observations sampled from a vector space; (3) The set of basis vectors defining a vector space; and (4) Points sampled along individual basis vectors. While these four examples were listed due to their generality, subsets of information are also relevant. XAI methods also use and explain datum instances, perturbations from datum instances, datasets, data clusters (Cx, Cz, Cy), and spatial trajectories (Tx, Tz, Ty). Such diversity raises the question of whether any well-defined set of information could be used to produce standardized explanations.
Building from the prior observation, a general syntax for XAI should quell the diversity expressed in XAI methods. Equation 1 pursues this ambition, allowing i and s to describe many information types by defining new notation and leveraging existing notation from set theory and other mathematics. For example, we introduce a compact notation to succinctly define i from Eq. 1. Here, J is drawn from the set comprising X, Z, Y, W, the function components, the learning algorithms, and the loss functions, and defines the type of information being referenced, while an accompanying ID depends on J to define the information type and subtype as follows:
i = J_ID    (2)
The “ID” symbols are the indices from Fig. 8 when J refers to function components, and are the indices from Table 2 otherwise. These indices show the type of information being used from J. For example, when describing a data “instance” (ID = 1 from Table 2) drawn from a dataset in the input space (X), we would combine X with ID 1. Similarly, a latent-space reference with an inclusion ID represents a set of dimensions included from the latent space, whereas a loss-function reference may denote the concept alignment loss function from Fig. 7. Note that each ID does not apply to every J, as shown in the rightmost column of Table 2. Furthermore, this table describes individual elements of a given set, while variable integers change the set size as described beneath the table. Together, this syntax represents the Qi-Framework, which is the first framework known to the authors that generally describes diverse sets of XAI methods. This framework was applied to study, summarize, and compare the set of 184 explanations found in review. The following sections demonstrate the value of this syntax in describing (Sect. 3.5) and qualitatively comparing (Sect. 4) XAI methods, though applications include generating and quantitatively comparing XAI methods (Sect. 5).
Table 2. Subsets of information used by XAI methods found in this review, which can be derived from the spatial and functional information types defined above
ID | Set type name | Set definition | Max # elements | Valid types |
|---|---|---|---|---|
1 | Instance | 1 | ||
2 | Perturbed Instance | 1 | ||
3 | Instance Comparison | 2 | ||
4 | Inclusion Set | |||
5 | Perturbed Set | |||
6 | Cluster Set | |||
7 | Included Range | |||
8 | Exclusion Set | - | ||
9 | Full Set | |||
a | Projected Set | |||
b | Comparison Set (k= 2) | |||
c | Permutation | |||
d | Trajectory | |||
e | Power Set | |||
f | Human Factors |
The ID and name of each information subset are listed in the first and second columns, respectively. The set definition and maximum theoretical size are also listed
In these definitions, individual elements are selected from a given set, perturbations displace an element by a small distance, and variable integers change the size of a set according to the descriptions listed beneath the table
Note each set type applies to a limited number of information types, as listed in the “Valid types” column
For example, XAI methods may perturb an input sample drawn from the input space, though perturbing or forming trajectories of elements in some other sets is ill defined
1 Instance: A single element selected from a given set
2 Perturbed Instance: An element from a given set that is perturbed by a small distance (often noise)
3 Instance Comparison: Two elements selected from a given set
4 Inclusion Set: A number of elements from a given set, where the number is often user-defined
5 Perturbed Set: A number of elements from a given set that are each perturbed by a corresponding distance (often noise)
6 Cluster Set: The elements from a given set that belong to a data cluster
7 Included Range: The elements from a given set that fall between an upper and a lower bound
8 Exclusion Set: A full set of elements from which some elements have been removed
9 Full Set: The full set of elements contained in a given set
a Projected Set: A full set that is projected to a subspace determined by a projection operator
b Comparison Set: The set of pairwise combinations between each element in a given set
c Permutation Set: The full set of permutations of the elements in a given set
d Trajectory: A set of sequences with elements selected from a given set
e Power Set: The full set of subsets of a given set
f Human Factors: A set of human-curated labels assigned to elements in a given set
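One lightweight way to encode the information references behind Table 2 in software is as a (space, set type, parameters) record. The sketch below is a hypothetical illustration that covers only a few of the set types listed above; the names and fields are assumptions rather than notation from the Qi-Framework itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

SPACES = {"X": "input space", "Z": "latent space", "Y": "output space", "W": "derived space"}

SET_TYPES = {  # a few of the set-type IDs from Table 2
    "1": "instance",
    "2": "perturbed instance",
    "4": "inclusion set",
    "9": "full set",
}

@dataclass
class InfoRef:
    """A reference to one information subset used or explained by an XAI method."""
    space: str                                            # key into SPACES, e.g. "X"
    set_type: str                                         # key into SET_TYPES, e.g. "1"
    params: Dict[str, Any] = field(default_factory=dict)  # e.g. {"k": 50} for an inclusion set

    def describe(self) -> str:
        return f"{SET_TYPES[self.set_type]} from the {SPACES[self.space]} {self.params}"

# e.g. a single input instance, and an inclusion set of 50 latent-space points
print(InfoRef("X", "1").describe())
print(InfoRef("Z", "4", {"k": 50}).describe())
```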
[See PDF for image]
Fig. 8
The function types and subcomponents found in review, where the set of function components is decomposed into function primitives that act as the building blocks for function elements, composites, and systems. The set of function primitives includes basic arithmetic operations, but is extended to include trigonometric functions, common activation functions for neural networks, similarity functions, and distance functions. Function elements generally combine primitives into units that perform basic logical, comparative, linear, or non-linear operations. Function composites group elements to perform more complex operations, such as OR and XOR logical gates, convolutional filters that perform positional encoding, or whitening modules that aid latent space interpretability. Function systems represent a vast set of deployment-ready models built from individual composites, including neural network variations, individual neural-symbolic or causal methods, and trees of linear models. There are many function types that group variations of system architecture, scale, function, and application. The number after each component represents the component ID used for later reference
Method to apply the Qi-framework syntax
We may apply Eq. 1 with the information in Table 2 and Fig. 8 to describe XAI methods, as summarized and demonstrated with several examples in this section. The approach is to evaluate the characteristics of XAI methods via the following steps; a structured sketch of the resulting description is given after the list. While these steps are descriptive in nature, the type and effect of individual explanations may be quantified as discussed in Sect. 5.
Step 1-i: Identify what kind of information i is being explained by an explanation produced by an XAI method; an inspector’s comprehension of information type i should be improved. Table 2 and Fig. 8 represent example sets of information that may be explained, though many unique information types may not be represented.
Step 2-u: Identify what utility u an explanation provides to an inspector. For example, find whether an explanation reveals the importance, representation, or causation of information type i (see Sect. 4.2).
Step 3-m: Identify the XAI method m being used to produce the explanation. Often XAI methods are explicitly named in publications, such as ANCHORS (Ribeiro et al. 2018), DataSHAP (Ghorbani and Zou 2019), ProtoTree (Nauta et al. 2021), etc. However, there are primitive explanation types which may not be specified by a named XAI method, such as showing the association or comparison of different information types (see Sect. 4).
Step 4-s: Identify the set of information s being used to recognize or derive the information presented as an explanation. Example information sets are depicted in Table 2.
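As a sketch of how these four steps could be captured programmatically, an XAI method’s Qi description might be recorded as below. The field names are hypothetical and the symbols i, u, m, and s follow Eq. 1; the example values are illustrative only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QiDescription:
    """Steps 1-4: what is explained (i), its utility (u), the method (m), and its inputs (s)."""
    i: str        # Step 1: information type being explained
    u: str        # Step 2: utility to the inspector (e.g. importance, representation)
    m: str        # Step 3: XAI method (or primitive) producing the explanation
    s: List[str]  # Step 4: information set used to derive the explanation

anchors_description = QiDescription(
    i="input dimensions relevant to the current prediction",
    u="importance",
    m="ANCHORS",
    s=["input instance", "perturbed input set", "model outputs"],
)
```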
This process may be used to describe existing XAI methods in the language of the Qi-Framework. We illustrate the versatility of this approach by describing common XAI methods from the literature, including LIME, SHAP, Integrated Gradients, Concept Whitening, and ProtoPNet, though the results of a more exhaustive tabular analysis are described in Appendix A. Note that in this discussion we use the notation of Eq. 1:
LIME (Ribeiro et al. 2016): LIME is a post-hoc method that uses a model’s input, a set of perturbations of this input, and the corresponding set of perturbed outputs, and then explains the model by producing a linear surrogate of the machine learning model that weights the importance of each input value. Associating the weights of this linear surrogate with the full set of input dimensions explains the importance of each input dimension in producing the current output (a minimal sketch of this local-surrogate idea is given after this list):
[Equations 3-4: see PDF]
Linear-SHAP (Štrumbelj and Kononenko 2014): The Linear-SHAP method is a post-hoc method that interprets the weights of linear models as Shapley values. Associating the model weights with each input dimension explains their relative importance in a prediction:
[Equation 5: see PDF]
Integrated gradients (Sundararajan et al. 2017): Integrated Gradients (IG) is a post-hoc method that defines a trajectory in the input space as a linear path from a given input to a secondary baseline. Summing the function gradient (i.e. the set of output perturbations) for each input dimension along this path produces an importance score for each dimension:
[Equation 6: see PDF]
Concept whitening (Chen et al. 2020): Concept whitening is a method for inherent explainability which uses a concept whitening module and a concept alignment loss to decorrelate and mean-center representations in the latent space, thereby promoting the interpretability of each latent point via training. Neural networks trained using concept whitening tend to have neurons that represent explainable input patterns. Inputs from the training dataset that maximize the latent response of certain neurons help reveal their latent representations.
[Equations 7-8: see PDF]
ProtoPNet (Chen et al. 2019): ProtoPNet is an inherently interpretable model which learns prototypical concepts encoded as prototype vectors and explains the similarity between learned concepts and a subregion of the current input. Important inputs may be explained by associating each input dimension with the predicted similarity scores. This explanation also reveals what concepts the prototype vectors represent by associating each vector with important inputs. Because ProtoPNet predicts the class outputs as a linear combination of interpretable prototypes, it is easy to explain what predicted classes represent in terms of interpretable features by associating output classes with weights from the linear model. Individual predictions may be justified by observing what prototypes were activated in the latent space and their relationship to certain class predictions.
[Equations 9-13: see PDF]
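The local-surrogate idea behind LIME, referenced in the first item above, can be sketched as follows. This is a minimal illustration with a toy black-box function, Gaussian perturbations, and unweighted least squares rather than LIME's actual kernel weighting and feature selection; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    """Toy model standing in for the model being explained (5 input features)."""
    return np.tanh(2.0 * x[..., 0] - 1.5 * x[..., 2] + 0.5 * x[..., 4] ** 2)

x0 = rng.normal(size=5)                                # the instance to explain
perturbations = x0 + 0.1 * rng.normal(size=(500, 5))   # local samples around x0
y = black_box(perturbations)                           # model responses to the perturbed set

# Fit a linear surrogate g(x) = w.x + b on the local samples (ordinary least squares).
A = np.hstack([perturbations, np.ones((500, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights = coef[:-1]

# The surrogate weights act as per-dimension importance scores near x0.
for d, w in enumerate(weights):
    print(f"input dimension {d}: weight {w:+.3f}")
```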
Results
To justify the generality, mathematical foundations, and generative capability of Eq. 1, this section presents a series of results summarizing the process of abstracting the Qi-Framework. Toward this end, the following sections show how Eq. 1 helps researchers and practitioners communicate the unique information sets (s) used to produce explanations (Sect. 4.1), describe explanation types in terms of their utility and the information they explain (Sect. 4.2), and contemplate both formal and primitive methods for producing explanations (Sect. 4.3). Together, these sections demonstrate the generality of the Qi-Framework in summarizing existing explanation types and serving as groundwork for generating novel explanations (Sect. 5).
The mathematical subcomponents of explanations: “s”
Whereas prior taxonomies and frameworks provide a static vocabulary to describe unique method characteristics, the general approach illustrated in Table 2 encourages XAI researchers to describe their methods’ data needs. Where the vocabulary of Table 2 is insufficient, researchers and practitioners can define the unique information sets used to produce explanations, thereby helping the Qi-Framework remain relevant to future approaches. Furthermore, defining this vocabulary mathematically simplifies quantitative comparisons of diverse XAI methods (Sect. 5), which is discussed after addressing how frequently information sets are used to produce explanations (see Sect. 4.1.1 - Sect. 4.1.3).
Frequency of subcomponents: input, latent, and output-space
XAI methods commonly use similar types of information to produce explanations, as summarized in Table 3. This table shows that researchers more commonly explain limited sets of information (e.g. instances and perturbed instances) rather than large sets of information (e.g. comparison sets, trajectories, etc.). Of the 278 information sets used to produce explanations, a little over 44% (123 information sets) were produced using single instances of information (e.g. individual inputs, points in the latent space, or output predictions), which likely reflects some focus on local explanation methods. These methods differ from comparison-based methods, which rely on at least two elements from a set, as is needed for contrastive attribution (Jacovi et al. 2021). Some approaches rely on full sets of information, such as Blur-IG which shows the importance of the full set of input basis vectors, DataSHAP which relies on a full dataset, and Testing with Concept Activation Vectors (TCAV) which maps a full concept dataset to the latent space to help find latent-space concepts (Kim et al. 2018). SHAP-based methods consider the power set of input dimensions, though in practice Shapley sampling values or alternate sampling methods reduce the number of samples used (Lundberg and Lee 2017). Note that the basis vectors of input, latent, and output spaces are often associated with human-interpretable labels to aid comprehension (Chen et al. 2020; Zhang et al. 2019).
Table 3. Summary of the specific types of information that the explanation types identified from the literature review used to produce explanations
# Instances of information used to produce explanations | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Information type used | Information set used
Name | ID | Sum: | ||||||||||||
Instance | 1 | − | 1 | 43 | 30 | 4 | − | 19 | 5 | 12 | 9 | 11 | 8 | 123 |
Perturbed Instance | 2 | − | − | 3 | 1 | 1 | − | 5 | 2 | − | − | 3 | 3 | 12 |
Instance Comparison | 3 | − | − | 3 | − | − | − | − | − | − | − | 1 | − | 3 |
Inclusion Set | 4 | − | 13 | − | 8 | − | 16 | 12 | 3 | − | 1 | 2 | − | 53 |
Perturbed Set | 5 | 2 | − | 5 | − | − | − | − | − | 2 | − | 14 | 1 | 7 |
Cluster Set | 6 | − | − | − | − | − | − | 2 | − | − | − | − | − | 2 |
Included Range | 7 | − | − | − | − | 1 | − | − | − | − | − | − | − | 1 |
Exclusion Set | 8 | − | 3 | − | 1 | − | − | − | − | − | − | − | − | 4 |
Full Set | 9 | − | 32 | 1 | 8 | − | − | 1 | 2 | − | − | − | 5 | 44 |
Projected Set | a | − | − | − | − | − | − | 1 | − | − | − | − | − | 1 |
Comparison Set | b | − | − | − | − | − | − | − | − | − | − | − | − | − |
Permutations | c | − | 2 | − | − | − | − | − | − | − | − | − | − | 2 |
Trajectory | d | − | − | 3 | − | − | − | − | − | − | − | − | − | 3 |
Power Set | e | − | 5 | − | − | − | − | − | − | − | − | − | − | 5 |
Labels | f | − | 2 | − | 2 | 1 | 4 | − | 5 | − | 4 | − | 6 | 18 |
Sum: | 2 | 58 | 58 | 50 | 7 | 20 | 40 | 17 | 14 | 14 | 31 | 23 | ||
The information types and IDs correspond to Table 2 and help specify the number of times each information type was used to produce explanations
For example, 43 explanations were produced using, in part, individual sample instances from the input space
Note that each explanation type often uses multiple information types to produce explanations
Summing counts per row and column reveals the number of times types and information set types are used to produce explanations, respectively
Frequency of subcomponents: function components
Understanding what function components are frequently used to produce explanations helps researchers understand common trends and develop next-generation models for XAI. Table 4 summarizes the function subcomponents used to produce explanations. These components are either integrated into function structures when building inherently interpretable methods, or act as inputs to post-hoc XAI methods. Each scale of function component from Fig. 8 is represented, except for function types, which are generally too broad to act as inputs to explanation methods. XAI methods often use deep neural networks (DNNs) to produce model surrogates, and CNNs are often used to extract visual patterns learned by convolutional filters (Zhang et al. 2019). However, researchers often focus on more inherently interpretable approaches.
Table 4. Summary of the specific types of function-related information that were explained (# Explanations) and used to produce explanations (# Info. Uses) by the explanation types identified from the literature review
Type: | Tree of linear models | And–Or–graphs | Logic–based networks | Convolutional networks | Graphical networks | Decoder networks | Multi–Layer perceptron | Row sum | ||
|---|---|---|---|---|---|---|---|---|---|---|
ID: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | – | – | |
# Info. Uses: | – | – | – | 7 | 1 | 6 | – | – | – | 14 |
# Explanations: | 1 | 2 | – | – | 3 | 1 | 1 | – | – | 8 |
Type: | Decision Tree | Logic Gates and Rules | Convolutional Filters | Graphical Filters | Capsule Encodings | Whitening Modules | Semantic Bottlenecks | Linear Layer | Decision Trace | Row Sum |
|---|---|---|---|---|---|---|---|---|---|---|
ID: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
# Info. Uses: | 1 | – | 7 | – | 1 | 1 | 1 | 3 | – | 14 |
# Explanations: | 4 | – | 4 | – | – | – | 2 | – | – | 10 |
Type: | Quantifiers | Predicates | Propositions | Prototype Vectors | Concept Vectors | Network Nodes | Linear Models | Row Sum | ||
|---|---|---|---|---|---|---|---|---|---|---|
ID: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | – | – | |
# Info. Uses: | – | 2 | – | 18 | 14 | 18 | 3 | – | – | 55 |
# Explanations: | 1 | 2 | 1 | 8 | 5 | 7 | – | – | – | 24 |
Type: | Probability Parameters | Boolean Variables | Parameter Biases | Parameter Weights | Similarity Functions | Activation Functions | Branch and Leaf Nodes | Row Sum | ||
|---|---|---|---|---|---|---|---|---|---|---|
ID: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | – | – | |
# Info. Uses: | – | 5 | – | 18 | 3 | – | 7 | – | – | 33 |
# Explanations: | – | 2 | – | – | – | – | 1 | – | – | 3 |
The information types and IDs correspond to Fig. 8 and help specify the number of explanation types corresponding to each type of function-related information
For example, two explanation types identified provided inspectors an And-Or-Graph function system as an explanation
In addition, seven explanation types used prototype vectors when producing explanations
Inherently interpretable models often leverage function elements, like the ProtoPNet family of models which frequently use prototype or concept vectors that learn interpretable latent space concepts (Singh and Yow 2021b). Linear layers help justify predictions when their inputs are interpretable, as when prototype vectors (Li et al. 2018; Singh and Yow 2021a) or class activation mappings (Zhou et al. 2016) are used to train interpretable linear predictors. Other post-hoc approaches use outputs from individual function elements (especially nodes from neural networks) to help explain the broader behavior of larger models (Sabour et al. 2017). Note that basic primitives like interpretable boolean parameters, branch and leaf nodes, and logical functions often form the basis of many interpretable models (Dong et al. 2019; Nauta et al. 2021).
These results highlight several building blocks for next-generation interpretable models. A more valuable lesson is considering whether the process of developing fully interpretable models may be viewed as finding the right combination of explainable function subcomponents. Resources like Table 4 help researchers identify function components that both aid model performance and the computation of downstream explainability metrics (Sect. 4.1.3).
Frequency of subcomponents: derived information & metrics
The Qi-Framework helps express the information needed to derive secondary metrics for explaining model behaviors. Table 5 shows a list of derived metrics identified in review, and details the information needed to compute each metric. Note how some metrics use function-related information, such as the detector uniqueness (Zhou et al. 2018), channel-wise increase of confidence (Wang et al. 2020), and the prototype similarity (Chen et al. 2019). Other metrics depend on spatial information, such as the additive feature attribution and TCAV scores. Many of these metrics are used as inputs for downstream explanations, and may be expressed as dependencies by including them in s. Other times, these metrics act as explicit explanation types.
Table 5. Summary of the information types used to derive secondary information
Composition of derived explanation metrics | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Derived information type | Information set | ||||||||||||||||
Name | ID | ||||||||||||||||
Additive feature attributions | 1 | – | – | 5 | – | – | – | – | – | – | – | – | 5 | – | – | – | – |
Channel–wise increase of confidence | 2 | – | – | – | 1 | – | – | 1 | – | – | – | – | – | – | 4 | – | – |
Completeness score | 3 | – | – | – | 9 | – | – | – | – | – | – | – | 9 | – | – | – | 5 |
Contrastive measure | 4 | – | – | 1 | – | – | – | a | – | – | – | 3 | – | – | – | – | – |
Detector uniqueness | 5 | – | – | – | f | – | – | – | – | – | – | – | – | – | – | 3 | – |
Importance score | 6 | – | – | 3 | – | – | – | – | 2 | – | – | – | 2 | – | – | – | – |
Integrated gradients | 7 | – | – | d | 1 | – | – | – | – | – | – | 5 | – | – | – | – | – |
IOU score | 8 | – | – | – | f | – | – | – | f | – | – | – | – | – | – | 3 | – |
Model accuracy | 9 | – | – | – | 8 | – | – | – | – | – | – | – | – | 1 | – | – | – |
Mutual information | a | – | – | – | 9 | – | – | – | – | – | – | – | 9 | – | – | – | – |
Function gradient | b | – | – | 2 | 1 | – | – | 2 | – | 1 | – | – | – | 4 | – | – | – |
Pairwise similarities | c | – | 8 | – | 1 | – | – | – | – | – | – | – | 1 | – | – | – | – |
Prototype similarity | d | – | 4 | 1 | – | – | 4 | 1 | – | – | – | – | – | – | – | – | 4 |
Relevance | e | – | – | 1 | – | – | – | – | – | – | – | 1 | – | 7 | – | – | – |
SAGE values | f | – | – | 1 | – | – | – | 1 | – | – | – | 1 | – | 3 | – | – | – |
Sensitivity | g | – | – | – | 2 | – | – | – | – | – | – | – | 2 | 4 | – | – | – |
Shapley regression values | h | – | e | 1 | – | – | – | – | – | – | – | 5 | – | – | – | – | – |
Shapley samplig values | i | – | – | 1 | 9 | – | – | – | – | – | – | 5 | – | – | – | – | – |
Shapley values | j | – | c | – | 1 | – | – | – | – | – | – | – | 1 | 7 | – | – | – |
TCAV score | k | – | – | – | 9 | – | – | 9 | – | – | – | – | – | – | – | – | – |
The row headers list the types of derived information identified in the white-space analysis, and the column headers list the information type used to produce each metric
This table defines the ID of each derived metric, which is referenced in the Appendix
Each table value lists the ID of the spatial or functional information type being used, from Table 2 and Fig. 8, respectively
For example, the detector uniqueness is calculated using labels from dataset samples and a set of convolutional filters
General syntax for explanation types
Numerous explanation types may be expressed in terms of the information they explain (i) and the utility they provide to model inspectors (u). This notation was abstracted empirically by recognizing that two explanations may help inspectors interpret different aspects of the same information. For example, Kim et al. (2021) reveal what concepts are learned by individual prototype vectors, while Yeh et al. (2020) rank the completeness of concepts in making a prediction. Both methods explain individual function components, but the former reveals a function’s representation, and the latter highlights a function’s importance. This distinction demonstrates the differing utility of explanation types to model inspectors. Explanations were found to be useful in eight ways, either revealing information’s representation, justification, importance, logical clarity, or reduced dimensionality, or providing surrogates, examples, and counterfactuals of a given information type (see Table 6 for definitions). While information may be useful in other ways, this set categorized all 184 explanations (representing 30 unique explanation types) found in review. This notation therefore provides immediate value in method description, and can be used both to summarize existing work and to reveal potential research directions for XAI.
Table 6. Summary of how the explanation types identified in the literature review explain different information types (i) and provide variable utility (u) to inspectors
Explanation specifier | Rep. | Just. | Imp. | Logic | Red. | Surr. | E.G. | C.F. | Row Sum |
|---|---|---|---|---|---|---|---|---|---|
# Explanations of that provide... | – | – | 3 | – | – | – | – | – | 3 |
# Explanations of that provide... | 1 | – | 51 | – | – | – | – | – | 52 |
# Explanations of that provide... | – | – | – | – | – | – | 2 | – | 2 |
# Explanations of that provide... | 1 | – | 1 | – | – | – | – | – | 2 |
# Explanations of that provide... | 1 | – | – | – | – | – | – | – | 1 |
# Explanations of that provide... | 4 | – | 4 | – | 1 | – | – | – | 9 |
# Explanations of that provide... | 7 | – | 1 | – | – | – | 1 | – | 9 |
# Explanations of that provide... | – | – | – | – | – | – | – | – | – |
# Explanations of that provide... | 3 | 13 | – | – | – | – | – | 1 | 17 |
# Explanations of that provide... | 9 | – | – | – | – | – | – | – | 9 |
# Explanations of that provide... | – | – | – | – | – | – | – | – | – |
# Explanations of that provide... | – | – | – | – | – | – | – | – | – |
Sum Across Types: | 26 | 13 | 60 | – | 1 | – | 3 | 1 | 104 (57%) |
# Explanations of that provide... | – | – | – | 4 | – | 4 | – | – | 8 |
# Explanations of that provide... | 4 | – | 3 | 1 | – | 3 | – | – | 11 |
# Explanations of that provide... | 20 | – | 1 | 3 | – | – | – | – | 24 |
# Explanations of that provide... | 1 | – | – | 2 | – | – | – | – | 3 |
Sum Across Types: | 25 | – | 4 | 10 | – | 7 | – | – | 46 (25%)
# Explanations of that provide... | – | – | – | – | – | – | 33 | 1 | 34 (18%) |
Types Sum: | 51 | 13 | 64 | 10 | 1 | 7 | 36 | 2 | 184 |
These explanation type properties are listed in the row and column headers, respectively, and may be used to identify the number of explanations that provide a given utility to a user
For example, the literature review identified 51 explanations that explained the importance of input dimensions. The total number of explanations for each information type is summed per row, and per utility type u across each group of information types and the total set of information types
Please note the scope of u is defined experimentally, and fully delineating its extent remains an open challenge
Counterfactual (C.F.): Inspectors may comprehend instances from an information set that notably change F(x) outputs, such as a classification score
Example (E.G.): Inspectors are given an element from an information set as an example, such as predictions from some F(x). Inspectors comprehend or predict this element because they are given it directly, acting as a kind of identity function
Importance (Imp.): Inspectors may comprehend the rank or relative importance of elements in an information set. For example, additive index values rank the importance of inputs in predicting outputs from F(x)
Justification (Just.): Inspectors may comprehend what causal variables produce elements in an information set, helping them justify F(x) outputs. For example, a linear model’s outputs are comprehensible because inspectors can justify model outputs based on the meaning and influence of model inputs
Logical clarity (Logic): Inspectors may comprehend logical interpretations of elements from an information set. If the set represents functions, inspectors may comprehend what logical operations a function performs; if an element is an integer, inspectors may comprehend numbers as truth values
Reduction (Red.): Inspectors may comprehend more elements in an information set because the number of elements is reduced, as when reducing the number of input or latent space parameters
Representation (Rep.): Inspectors may comprehend interpretable corollaries to elements in an information set, as when inputs are mapped to points in the latent space to reveal the meaning of latent variables
Surrogate (Surr.): Inspectors may comprehend surrogate representations of elements in an information set, like a given F(x). Surrogates often describe complex model behaviors as interpretable models or rule sets
Table 6 counts how many explanation types from the review explained certain types of information, and how this information is useful to model inspectors. Notably, the vocabulary of information subsets from Table 2 can also be used to describe the information explained. Of the 184 explanations identified, 104 (57%) explained a model’s spatial information, 46 (25%) explained function components, and 34 (18%) explained models by deriving secondary information W (Sect. 4.1.3). The bottom row in Table 6 reveals which information types are most commonly explained. The most common explanations either explained the importance of input dimensions or revealed the representation of a given information type. While examples of secondary information (W) are third-most common, these explanations are often used as inputs to subsequent explanations related to information importance. These examples show that diverse explanation types can be expressed in common notation, and Tables 4 and 7 demonstrate that the types of information being explained can be further decomposed.
Table 4 reveals which function components from Fig. 8 are frequently explained. Of the 46 explanation types helping inspectors comprehend function components, 24 focus on explaining individual function subcomponents, ten consider larger function subsets, and only eight explanations of entire function systems were identified. But how are these explanation types useful to model inspectors? Cross-referencing Table 6 shows that most explanation types reveal the representation (e.g., via semantic labels or correlated inputs) or importance (e.g., via uniqueness or completeness scores) of function components.
Table 7 specifies the types of information commonly explained by existing methods. Here, 49 explanation types help inspectors comprehend full sets of input information (Fu et al. 2020; Sturmfels et al. 2020), output instances are explained 16 times (Koh et al. 2020; Wickramanayake et al. 2021), and individual output dimensions are explained nine times (Chen et al. 2019; Zhang et al. 2017). While these explanations focus on explaining either full sets or individual elements from an information set, several methods explain the importance of concept clusters in the latent space (Kim et al. 2018) or explain limited sets of model parameters in the latent space (Wang et al. 2021; Singh and Yow 2021a).
In summary, these results demonstrate how Eq. 1 can be used to summarize the diverse set of information explained by XAI methods in a succinct, tabular manner. Likewise, the collection of methods for producing explanations can also be evaluated.
Table 7. Summary of the specific types of information that were explained by the explanation types identified in the literature review
# Instances of observed explanation types
Information type | Information set (column headers: information-type IDs from Table 2) | Sum
Name | ID
Instance | 1 | 3 | 2 | 2 | 2 | 1 | 2 | 5 | – | 16 | 9 | – | – | 42 |
Perturbed instance | 2 | – | – | – | – | – | – | – | – | – | – | – | – | – |
Instance comparison | 3 | – | – | – | – | – | – | – | – | – | – | – | – | – |
Inclusion set | 4 | – | 1 | – | – | – | 2 | 1 | – | – | – | – | – | 4 |
Perturbed set | 5 | – | – | – | – | – | – | – | – | – | – | – | – | – |
Clusterset | 6 | – | – | – | – | – | – | 2 | – | – | – | – | – | 2 |
Included range | 7 | – | – | – | – | – | – | – | – | – | – | – | – | – |
Exclusion set | 8 | – | – | – | – | – | – | – | – | – | – | – | – | – |
Full set | 9 | – | 49 | – | – | – | 4 | 1 | – | – | – | – | – | 54 |
Projected set | a | – | – | – | – | – | – | – | – | – | – | – | – | – |
Comparison set | b | – | – | – | – | – | – | – | – | – | – | – | – | – |
Permutations | c | – | – | – | – | – | – | – | – | – | – | – | – | – |
Trajectory | d | – | – | – | – | – | – | – | – | – | – | – | – | – |
Power set | e | – | – | – | – | – | – | – | – | – | – | – | – | – |
Labels | f | – | – | – | – | – | 1 | – | – | 1 | – | – | – | 2 |
Sum: | 3 | 52 | 2 | 2 | 1 | 9 | 9 | – | 17 | 9 | – | – | ||
The information types and IDs correspond to Table 2 and help specify the number of explanations corresponding to each type
For example, two explanations of cluster sets in subsets of latent space dimensions were identified in the white-space analysis
The total number of explanations is summed with respect to the information set type per row, and with respect to the information type per column
Formal and primitive methods of explanation
Many XAI methods are used to explain information from a model’s life cycle, though two method types were identified empirically during review: (1) primitive explainability methods and (2) formal XAI methods. Both method types are examples of explanation methods in the Qi-Framework. Primitive methods are basic approaches to presenting or deriving explainable information, including approaches for presenting basic figures. For example, input examples may be associated with the function components they activate (Chen et al. 2020), or compared to understand distinctions in latent space clusters (Ghorbani et al. 2019). While publications sometimes include these approaches only in passing, they represent distinct approaches to explaining model behavior. Eight primitives were identified via the literature review, which are described beneath Table 8. Future research could identify a more complete set of such primitives. Formal XAI methods detail more extensive algorithms for creating explanations, such as Layer-wise Relevance Propagation (LRP), MAPLE, and LIVE (Bach et al. 2015; Plumb et al. 2018; Staniak and Biecek 2018).
Our analyses identified 42 formal explanation methods and 142 primitive explanation methods. Often, formal methods derive secondary information types (introduced in Sect. 4.1.3) that are associated with familiar information types to aid comprehension, as in saliency methods produced by GradCAM (Fu et al. 2020) and Guided Integrated Gradients (Kapishnikov et al. 2021). Most explanations were produced by associating interpretable information types with non-interpretable ones. This observation is reasonable, as inspectors often learn new information by starting with something they know. For example, researchers often associate models’ latent space representations and predictions with example inputs or reconstructed samples that are human-interpretable. The second-most common primitive is composition, which is regularly employed when designing model architectures. For example, interpretable models are often composed of explainable sub-units. Other primitive XAI methods are less common but include optimization, which can be used to produce counterfactuals of model inputs (Wachter et al. 2017), and reduction, which can be used to compress the number of representations in the latent space (Rymarczyk et al. 2020). While many more primitive and formal XAI methods may be envisioned (Sect. 5), the value of the Qi-Framework is seen in expressing the diversity of these existing methods in a single variable to aid comprehension and discussion across XAI research.
Table 8. Summary of explanation methods identified in the literature review that produce explanations of various information types
Explanation Method | Association | Comparison | Composition | Optimization | Reduction | Reconstruction | Similarity | Training | Formal | Row Sum |
|---|---|---|---|---|---|---|---|---|---|---|
# of Explanations Produced Via... | 2 | 1 | – | – | – | – | – | – | – | 3 |
# of Explanations Produced Via... | 35 | 1 | – | – | – | – | – | – | 16 | 52 |
# of Explanations Produced Via... | – | – | – | 1 | – | – | – | – | 1 | 2 |
# of Explanations Produced Via... | 2 | – | – | – | – | – | – | – | – | 2 |
# of Explanations Produced Via... | 1 | – | – | – | – | – | – | – | – | 1 |
# of Explanations Produced Via... | 5 | 2 | – | – | 1 | – | – | 1 | – | 9 |
# of Explanations Produced Via... | 6 | – | – | – | – | 2 | – | 1 | – | 9 |
# of Explanations Produced Via... | – | – | – | – | – | – | – | – | – | – |
# of Explanations Produced Via... | 17 | – | – | – | – | – | – | – | – | 17 |
# of Explanations Produced Via... | 9 | – | – | – | – | – | – | – | – | 9 |
# of Explanations Produced Via... | – | – | – | – | – | – | – | – | – | – |
# of Explanations Produced Via... | – | – | – | – | – | – | – | – | – | – |
Sum Across Types: | 77 | 4 | – | 1 | 1 | 2 | – | 2 | 17 | 104 (57%) |
# of Explanations Produced Via... | 2 | – | 2 | – | 2 | – | – | – | 2 | 8 |
# of Explanations Produced Via... | 7 | 1 | 1 | – | 2 | – | – | – | – | 11 |
# of Explanations Produced Via... | 20 | – | 3 | – | – | – | – | 1 | – | 24 |
# of Explanations Produced Via... | 1 | – | 2 | – | – | – | – | – | – | 3 |
Sum Across Types: | 30 | 1 | 8 | – | 4 | – | – | 1 | 2 | 46 (25%) |
# of Explanations Produced Via... | 1 | – | – | – | – | – | 9 | 1 | 23 | 34 (18%) |
Total Sum: | 108 | 5 | 8 | 1 | 5 | 2 | 9 | 4 | 42 | 184
The column headers represent XAI method types (either formal or primitive)
The table values represent the number of times an XAI method type was used to produce explanations of the information types listed as row headers
For example, 52 explanation types explained input dimensions via an association with alternate information type(s)
The total number of explanations for each information type is summed per row, and per method type across each group of information types and the total set of information types
Association: Association presents two or more elements from different information types, often assuming inspectors can distinguish corollary patterns
Comparison: Comparison presents two or more elements of the same information type, often assuming that inspectors can distinguish underlying differences
Composition: Composition aggregates elements of multiple information types into explanations built as larger sets of information
Optimization: Optimization methods are used to generate representative elements of some information type, which are then presented as examples or used in subsequent explanations
Reduction: Reduction decreases the number of elements in an information set to aid inspectors’ global comprehension of that set
Reconstruction: Reconstruction uses a generator head to produce an example of one information type (often the input space) using an element from an alternate information type. These examples are often shown to inspectors to represent patterns captured in the latent space
Similarity: Similarity captures the likeness of two elements in an information set, which is often presented as a quantifiable similarity metric like the Euclidean distance, cosine similarity, etc.
Training: Training simply represents the process of training a machine learning model; it is included as a primitive because some XAI methods produce models using different loss functions or datasets to evaluate changes in model behaviors and representations
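To make the association and similarity primitives above concrete, the minimal sketch below pairs learned prototype vectors with their nearest training inputs so an inspector can view human-interpretable examples of what each prototype captures. The encoder, dataset, and prototype matrix are hypothetical stand-ins for illustration, not components of any cited method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a frozen encoder, a small dataset, and learned prototypes.
def encoder(x):
    """Toy encoder mapping 8-d inputs to a 3-d latent space."""
    W = np.linspace(-1.0, 1.0, 24).reshape(8, 3)
    return np.tanh(x @ W)

X_train = rng.normal(size=(200, 8))      # dataset samples (interpretable inputs)
prototypes = rng.normal(size=(5, 3))     # learned latent prototype vectors

# Association primitive: pair each prototype with its nearest training inputs,
# assuming inspectors can recognize the pattern shared by the retrieved examples.
Z_train = encoder(X_train)               # latent representations of the dataset
for p_idx, p in enumerate(prototypes):
    dists = np.linalg.norm(Z_train - p, axis=1)   # similarity primitive (Euclidean)
    nearest = np.argsort(dists)[:3]               # three closest dataset samples
    print(f"prototype {p_idx}: associated sample indices {nearest.tolist()}")
```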
Discussion: generation and numerical comparison
While the prior sections demonstrate one way the Qi-Framework helps researchers and practitioners summarize and describe diverse explanation types, there is additional value in generating new XAI methods (Sect. 5.1) and in building paths toward quantitative comparisons across explanation types (Sects. 5.2–5.6).
Generating novel XAI methods using the Qi-framework
In addition to showing common trends in XAI, the tables in Sect. 4 act as tools for discovering novel XAI methods. For example, researchers may identify unique combinations of unused information from Table 3 as a starting place for inquiry: (1) Few methods produce explanations by comparing data trajectories in Z to provide human-interpretable insights. (2) Few of the methods found perturb model inputs and form explanations by comparing permutations of model outputs. Similar gaps are observed in Table 6: (3) Relatively few of the methods explored provide counterfactual explanations. (4) No methods in this review attempted to extract logical interpretations embedded in data clusters in X, Z, or Y. While the process of creating individual methods is left to the reader, these four examples depict how the Qi-Framework and subsequent analysis reveal deficits in the kinds of information and the utility of explanations that are available in current literature. Not every blank space in the tables from Sect. 4 represents a meaningful research direction, though they give researchers a starting place when considering new methods.8 In this way, the Qi-Framework presents a common lens for exploring limitations in the field of XAI, which extends to the pursuit of quantitative method comparisons.
Introduction to quantitative comparisons
Quantitative comparisons have remained an outstanding challenge in XAI (Burkart and Huber 2021), though the abstracted syntax in the Qi-Framework lets us explore quantitative methods more generally. Existing quantitative comparisons may be lifted from prior works, and often relate to the following categories:
Evaluating user comprehension: Some criteria evaluate how individual explanations aid user comprehension, as in work by Ribeiro et al. (2018) when comparing LIME against the ANCHORS method. This comparison performs a user study to evaluate how different explanation types help users predict the same behavior of a classifier. Both human and simulated users are considered.
Properties of individual explanations: Quantitative studies sometimes evaluate the properties of individual methods, as when Miró-Nicolau et al. (2024) compare diverse saliency maps using similarity metrics and Earth Mover’s Distance. Similarly, work by Plumb et al. (2018) provides metrics to evaluate both causal-local explanations and global explanations.
Performance characteristics of XAI methods: In some scenarios, studies consider the relative time complexity of different models, which provides practical comparisons for practitioners (Li et al. 2023).
Quantifying the effect explanations have on inspectors
To quantify the effect of an explanation, model inspectors are first assumed to be a function I(x) that can predict information about a model, F(x), provided the model input x. A given I(x) may predict many kinds of model information, including model outputs, relevant inputs, latent space representations, or activation and decision traces. However, inspectors may fail to predict model behaviors, thereby revealing shortcomings in inspectors’ comprehension, predictive ability, and resulting uncertainty. While Ribeiro et al. (2018) quantify inspectors’ comprehension, this section uses Fig. 9 to illustrate and extend basic premises. Figure 9a shows a model F(x) trained on a dataset, where the inspector is depicted as a second function. Here, I(x) predicts some model trends better than others, illustrating how the predictive error $\varepsilon(x)$ depends on x. Many metrics may be used for $\varepsilon$ depending on the type of data and inspectors, though Fig. 9b shows distributions of the norm computed between I(x) and F(x). Assuming normality, an inspector’s prediction error may be summarized as its expected value (denoted using $\mathbb{E}$), $\mu_\varepsilon$, and related uncertainty, $\sigma_\varepsilon$:
$$\varepsilon(x) = \lVert I(x) - F(x) \rVert \tag{14}$$
$$\mu_\varepsilon = \mathbb{E}_x\!\left[\varepsilon(x)\right] \tag{15}$$
$$\sigma_\varepsilon^2 = \mathbb{E}_x\!\left[\left(\varepsilon(x) - \mu_\varepsilon\right)^2\right] \tag{16}$$
While Fig. 9b defines $\varepsilon$ as the norm between I(x) and F(x) to reveal an inspector’s understanding of the model, $\varepsilon$ may also be calculated between I(x) and information from the dataset to characterize inspectors’ comprehension of that dataset.
Fig. 9
Overview of an explanation’s effect on an inspector, where (a) before receiving an explanation, an inspector may be thought of as having (b) uncertainty about a model’s behavior. (c) Explanations may be derived to describe the behavior of a model, which (d) helps an inspector better predict and comprehend a model’s behavior, (e) which is reflected in a change in their uncertainty. (f) The impact of an explanation may be quantified as the change in the inspector’s uncertainty and predictive capability. Explanations may have a positive or negative effect on an inspector’s comprehension, but may also have more nuanced effects like leading toward uncertain comprehension or false certainty
This notation helps formalize the impact of an explanation on inspectors’ comprehension. Provided an explanation method that produces an explanation e, we can evaluate the change in an inspector’s mean predictive ability ($\Delta\mu_\varepsilon$) and uncertainty ($\Delta\sigma_\varepsilon$) after giving e to an inspector (I(x|e)). Here, $\mu_{\varepsilon|e}$ and $\sigma_{\varepsilon|e}$ represent an inspector’s comprehension after receiving an explanation.
$$\Delta\mu_\varepsilon = \mu_\varepsilon - \mu_{\varepsilon|e} \tag{17}$$
$$\Delta\sigma_\varepsilon = \sigma_\varepsilon - \sigma_{\varepsilon|e} \tag{18}$$
Figure 9c illustrates a local explanation of F(x) at a given input, where a linear surrogate is produced that mimics aspects of LIME and MAPLE (Ribeiro et al. 2016; Plumb et al. 2018). An inspector’s comprehension of model behavior may improve when given this explanation (Fig. 9d), leading to reduced predictive error and uncertainty (Fig. 9e). This formalism is given without implying any constraints on the explanation type, explanation format, explanation method, or inspector characteristics. Granted, in practice there are many dependencies between F(x), M, and e that limit what kinds of explanations can be produced for a given F(x), but the approach to evaluating an explanation’s effect is general. Furthermore, this approach assumes that the explanation effect, or $\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$, results from the given explanation rather than external influences.

Framing explanations as affecting inspectors’ comprehension provides a natural approach for quantifying an explanation’s effect. The type of explanation effect is determined by the sign and magnitude of $\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$. Figure 9f shows how an explanation can improve or degrade an inspector’s understanding, but may also lead toward false confidence or real but uncertain comprehension. Theoretically, these explanation effects can be measured in human studies (see Sect. 6). Provided these metrics can be measured, they give a straightforward method for ranking explanations by the effect they have across many inspectors, where larger, positive values indicate improved inspector comprehension.
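As a concrete illustration of this formalism, the minimal sketch below simulates an inspector I(x), measures its prediction error and uncertainty against a model F(x), and recomputes both after a hypothetical local linear explanation updates the inspector near a point of interest. Every function, the blending rule, and the surrogate form are illustrative assumptions rather than the paper's prescribed procedure.

```python
import numpy as np

xs = np.linspace(-3, 3, 200)

F = lambda x: np.sin(2 * x) + 0.5 * x            # model behavior to be explained
I_before = lambda x: 0.5 * x                     # inspector's initial mental model

def comprehension(I, F, xs):
    """Summarize an inspector's predictive error and uncertainty (Eqs. 14-16)."""
    eps = np.abs(I(xs) - F(xs))                  # per-point prediction error
    return eps.mean(), eps.std()                 # mu_eps, sigma_eps

# Hypothetical local explanation: a linear surrogate of F around x0 (LIME-like),
# which the inspector blends into their prediction near x0.
x0, width = 0.0, 0.75
slope = (F(x0 + 1e-3) - F(x0 - 1e-3)) / 2e-3
surrogate = lambda x: F(x0) + slope * (x - x0)

def I_after(x):
    w = np.exp(-((x - x0) ** 2) / (2 * width ** 2))   # trust the surrogate locally
    return w * surrogate(x) + (1 - w) * I_before(x)

mu0, sig0 = comprehension(I_before, F, xs)
mu1, sig1 = comprehension(I_after, F, xs)
print(f"delta_mu = {mu0 - mu1:.3f}, delta_sigma = {sig0 - sig1:.3f}")  # Eqs. 17-18
```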
Formalizing the explanation effect provides one approach to understanding whether a given e has a local effect, a global effect, or an effect somewhere in between. Practically, an inspector’s understanding of a model varies along F(x). This basic intuition reveals how $\mu_\varepsilon$ and $\sigma_\varepsilon$ can vary across the input space, inspiring the conditional forms of Eqs. 15 through 18, which are produced below:
$$\mu_\varepsilon(x) = \mathbb{E}\!\left[\varepsilon \mid x\right] \tag{19}$$
$$\sigma_\varepsilon^2(x) = \mathbb{E}\!\left[\left(\varepsilon - \mu_\varepsilon(x)\right)^2 \mid x\right] \tag{20}$$
$$\Delta\mu_\varepsilon(x) = \mu_\varepsilon(x) - \mu_{\varepsilon|e}(x) \tag{21}$$
$$\Delta\sigma_\varepsilon(x) = \sigma_\varepsilon(x) - \sigma_{\varepsilon|e}(x) \tag{22}$$
Figure 10 shows a notional representation of the conditional error distributions (Fig. 10b), where an I(x) comprehends the general trend of a model but its understanding is offset by some bias (Fig. 10a). The inspector may be given a local explanation or a global explanation, where the latter may be given as a collection of local explanations. Depending on how I(x) learns from an explanation to update its internal model, its comprehension of F(x) could improve locally or globally. For example, provided a local explanation, an inspector may better comprehend a single point along F(x), as in Fig. 10c, but could alternately learn to correct its bias, leading to global comprehension, as in Fig. 10e. Similarly, given a global explanation, an inspector may depend too much on it, leading to overfitting (Fig. 10g), or may appropriately balance it against prior knowledge, resulting in global comprehension (Fig. 10i). Note how local and global inspector comprehension is distinct from (though likely related to) measures of model fidelity, which measure whether explanations faithfully represent a parent model’s behaviors (Ribeiro et al. 2016).

Determining whether an explanation has local or global effects on comprehension depends on sampling, as illustrated in Fig. 10f, h, and j, which show how the conditional explanation effect may appear identical for explanations with different effects depending on where I(x) and F(x) are sampled. These observations represent several scenarios that may be encountered when determining the scope of an explanation. Whereas Plumb et al. (2018) define global and local explanations by comparing an explanation’s fidelity against a parent model, this work instead evaluates an inspector’s comprehension of model behaviors, which provides a more general path to quantitatively compare diverse explanation types.
Fig. 10
Types of effects explanations may have on an inspector. Note that (a) an inspector’s comprehension of a model (b) may change for different inputs to that model. While explanations are commonly called local or global, this figure shows that local explanations may have either a local effect, as in (c) and (d), or a global effect, as in (e) and (f). Similarly, global explanations can have either a local effect, as in (g) and (h), or a global effect, as in (i) and (j). This effect depends on how inspectors learn from explanations. As shown in (h), an explanation’s effect may appear global despite being local, which depends on function sampling
Quantifying the local and global effects of explanations
The Qi-Framework helps quantify whether an explanation produces local or global effects on inspectors, which is formalized in this section. Given a set of information from the machine learning life cycle that is related to a model F(x), and an explanation e that helps an inspector predict that information at a point x sampled from an input space, we may formally define the degree to which an explanation’s effect is local or global. However, the Qi-Framework naturally extends to additional types of explanation effects. These effects are regarded as spatial, applying to every point in a space, or transversal, applying across each information type. Here, we consider that I(x) can realistically predict many kinds of information, where an inspector focused on predicting a single information type may be denoted accordingly. While the framework defines the full set of XAI-relevant information, in practice XAI methods may specify the subset that is being explained:
Definition 1
A local explanation effect is a measure of how much an explanation (e) impacts an inspector’s comprehension ($\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$) about a single kind of information, for individual points x in the space.
Definition 2
A spatial explanation effect is a measure of how much an explanation (e) impacts an inspector’s comprehension ($\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$) about a single kind of information, for any point x in the space.
Definition 3
A transversal explanation effect is a measure of how much an explanation (e) impacts an inspector’s comprehension ($\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$) about all kinds of information, for individual points x in the space.
Definition 4
A global explanation effect is a measure of how much an explanation (e) impacts an inspector’s comprehension ($\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$) about all kinds of information, for all points x in the space.
In practice, many metrics could formalize these definitions, and we refer to them collectively as a family of comprehension metrics. As one such measure, we define the Local Comprehension Score (LCS) (Eq. 23) as the sum of the scaled changes in comprehension. The LCS considers changes in predictive error and uncertainty, which are scaled by the inspector’s initial comprehension to capture the relative change in comprehension and to make these values unitless. This definition of LCS lets us define the Spatial Comprehension Score (SCS), Transversal Comprehension Score (TCS), and Global Comprehension Score (GCS) as follows. Practically, these metrics represent four instances of a general class of explanation metrics.
$$\mathrm{LCS}_i(x) = \frac{\Delta\mu_\varepsilon(x)}{\mu_\varepsilon(x)} + \frac{\Delta\sigma_\varepsilon(x)}{\sigma_\varepsilon(x)} \tag{23}$$
$$\mathrm{SCS}_i = \mathbb{E}_x\!\left[\mathrm{LCS}_i(x)\right] \tag{24}$$
$$\mathrm{TCS}(x) = \sum_{i} \mathrm{LCS}_i(x) \tag{25}$$
$$\mathrm{GCS} = \mathbb{E}_x\!\left[\sum_{i} \mathrm{LCS}_i(x)\right] \tag{26}$$
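A minimal sketch of how these scores might be computed is given below; the scaling and aggregation choices follow the prose definitions of the LCS and SCS, while the per-point comprehension estimates are assumed inputs that would come from an explainability study.

```python
import numpy as np

def local_comprehension_score(mu_before, sig_before, mu_after, sig_after, eps=1e-9):
    """LCS at one point x: scaled change in predictive error and uncertainty (Eq. 23)."""
    d_mu = (mu_before - mu_after) / (mu_before + eps)
    d_sig = (sig_before - sig_after) / (sig_before + eps)
    return d_mu + d_sig

def spatial_comprehension_score(lcs_values):
    """SCS: aggregate the LCS over sampled points in the input space (Eq. 24)."""
    return float(np.mean(lcs_values))

# Hypothetical per-point comprehension estimates for one information type,
# e.g. obtained by querying an inspector repeatedly at each sampled x.
xs = np.linspace(-3, 3, 50)
mu_b, sig_b = 1.0 + 0.2 * np.abs(xs), 0.5 * np.ones_like(xs)   # before explanation
mu_a, sig_a = 0.6 + 0.2 * np.abs(xs), 0.4 * np.ones_like(xs)   # after explanation

lcs = local_comprehension_score(mu_b, sig_b, mu_a, sig_a)
print(f"SCS = {spatial_comprehension_score(lcs):.3f}")
```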
Note that these definitions vary in their scope. Unique TCS and SCS scores would need to be produced for individual points and information types, respectively, while the GCS spans every set of information and every point in a given space. Note how these metrics may be used to rank the value of individual explanations. This method provides an explanation-agnostic approach to quantifying aspects of explainability, which has been highlighted as a challenge in XAI (Burkart and Huber 2021). This is one example where the Qi-Framework helps attach technical definitions to well-established but ill-defined terminology prevalent in XAI.
The cumulative effect of explanation sequences
Given that the Qi-Framework may be applied to quantify the value of single explanations, the approach in Sect. 5.4 naturally extends to quantifying the effect of explanation trajectories. Given a trajectory of explanations, the Cumulative Explanation Effect (CEE) can be computed as the sum of any metric that evaluates an explanation’s effect on an I(x), which may depend on x and the sequence of explanations:
$$\mathrm{CEE} = \sum_{j=1}^{n} M(e_j), \quad M \in \{\mathrm{LCS}, \mathrm{SCS}, \mathrm{TCS}, \mathrm{GCS}, \ldots\} \tag{27}$$
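The sketch below shows one way the CEE might be tallied and used to compare how two inspectors learn from the same explanation sequence; the per-explanation effect values are hypothetical measurements, not results from any study.

```python
def cumulative_explanation_effect(effects):
    """CEE: sum of a chosen effect metric over an explanation trajectory (Eq. 27)."""
    return sum(effects)

# Hypothetical per-explanation effects measured for two inspectors given the same
# trajectory of five explanations (positive values indicate improved comprehension).
effects_inspector_a = [0.30, 0.22, 0.10, 0.05, 0.02]
effects_inspector_b = [0.05, 0.06, 0.04, 0.05, 0.05]

print("CEE (inspector A):", cumulative_explanation_effect(effects_inspector_a))
print("CEE (inspector B):", cumulative_explanation_effect(effects_inspector_b))
```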
Note how the CEE provides one approach for ranking inspectors’ learning ability. For example, if several I(x) were given the same explanation sequence, the CEE for each I(x) would reveal the cumulative comprehension that each inspector gained from that sequence. Note that the CEE depends heavily on the inspector and the given sequence. These observations reveal a connection between the Qi-Framework and tangential fields in machine learning. For example, prompt-based learning characterizes the number of examples needed to teach LLMs (Li et al. 2024); here, the CEE may characterize an LLM’s learning ability. From another perspective, curriculum learning seeks to identify input sequences that expedite model learning (Zhu et al. 2022); in this field, the CEE may help rank curricula by their relative training value, which may depend on the inspector or learning agent observing an explanation sequence.
Summary and extensions to quantitative comparisons
This discussion demonstrates how the notation of the Qi-Framework helps researchers consider and define quantitative comparisons of diverse explanations. The proven method of characterizing how well explanations aid user comprehension (Ribeiro et al. 2018) may be extended more generally across explanations that explain the same type of information. Furthermore, while this paper determines explanation types empirically, these metrics could help future works quantify the type of information explanations help users comprehend. If two types of model behavior are described using information types A and B, a model inspector could be given an explanation and asked to predict these information types. If inspectors improve their comprehension of information type A rather than B, the given explanation type may be an explanation of type A. Practically, explanations may be ranked and grouped into explanation types by their ability to help inspectors comprehend multiple information types. This example summarizes the value of the Qi-Framework: defining syntax that describes explanations with respect to a certain information type aids more general application of existing quantitative metrics while considering paths toward quantitative explainability studies based on user comprehension.
Opportunities and future directions
Beyond quantitative comparisons, the Qi-Framework and its embedded syntax highlight many opportunities and future directions for XAI research. These directions include extending the Qi-Framework into a more robust science for XAI (Sect. 6.1) and applying this framework to build explanation databases (Sect. 6.2). Using these databases in human and computational explainability studies (Sect. 6.3) could benchmark existing XAI methods to evaluate whether they explain enough information to justify a model’s deployment (Sect. 6.4).
Qi-framework extensions: toward rigorous XAI
Several extensions to the Qi-Framework could progress toward a "rigorous science for interpretable machine learning," a concept discussed by Doshi-Velez and Kim (2017). These extensions include formalizing: (1) What information XAI should explain; (2) The understandability and utility of general syntax; (3) The number of ways the same information can be used by model inspectors; (4) The mathematical characteristics of varying I(x) types; (5) The set of XAI method primitives used to produce explanations; and (6) Studies for quantitatively determining the information explained by diverse explanation types. These extensions are discussed below:
Scope of XAI-relevant information: There is a need to formally define the scope of XAI-relevant information. In other words, what information should XAI explain? For example, this paper focuses on information related to model training, but information relevant to model design, deployment, and monitoring could aid the development of standardized procedures for V&V. Outlining this information could help determine when "enough" of a model is or can be explained.
Qi-framework interpretability and efficacy: The understandability and usability of the proposed syntax are important to its adoption. Much of the work in this paper is theoretical and grounded in empirical observation, and more work is needed to establish the receptivity of real-world inspectors to the Qi-Framework. Practically, user studies could identify formats in which the Qi-Framework is comprehensible to both researchers and practitioners. For example, the Qi-Framework could be translated into worksheets or checklists to evaluate which formats help inspectors describe their explanation needs for certain use cases, and whether inspectors can determine when given sets of XAI methods meet their needs.
Utility of explanation types: Sect. 4.2 describes how explanations that explain a certain information type could help inspectors understand the importance, justification, and representation of elements in an information set. However, this work determines the set of u empirically. While the Qi-Framework remains extensible by letting users define additional instances of u, mathematical formalisms could help bound the type of explanations that can reasonably be pursued in XAI.
Variation in inspector types: The same explanation may be helpful to one model inspector while leaving other inspectors confused. In other words, certain "inspector families" may prefer different explanation types and formats. Existing works discuss human-centric factors related to XAI (Chromik and Schuessler 2020) but do not define fundamental characteristics of model inspectors. Extending the Qi-Framework to mathematically define certain inspector characteristics could help in determining which explanations are best suited for certain inspector types. Such pursuits could leverage concepts from educational psychology, machine learning, and related fields.
Scope of XAI-method primitives: The Qi-Framework describes explanation methods as formal or primitive empirically (see Sect. 4.3), though no mathematical definition of primacy is provided. This approach leaves room for researchers to define new primitives, but does not reveal the number of primitives that may be used to produce explanations. Future research could more formally define the extent of explanation primitives to aid method development and categorization.
Quantifying explanation types: The Qi-Framework notation is abstracted empirically from observations. While this paper represents the authors’ best effort to classify explanation types, empirical processes are prone to bias. To remove this bias, there is value in quantifying the types of information inspectors can better understand after receiving certain explanations. For example, an explanation type may be presented to users in explainability studies. If an inspector can better predict information Type-A after receiving a certain explanation type, the explanation may be called an explanation of Type-A.
Explanation databases for studying explanation effects
Few databases of XAI explanations are available in the literature, though such databases could aid the development of quantitative comparisons and provide benchmarks across XAI. Existing XAI research often depends on traditional datasets like ImageNet (Deng et al. 2009), MNIST (Cohen et al. 2017), and other text-based datasets for classification or regression. These datasets are often used to train XAI methods or show what kinds of explanations a model can produce, rather than demonstrate the effect and value of individual explanations.
While some papers provide sets of explanations as images (Chen et al. 2020), and others release databases that can be used to evaluate the credibility of explanations (e.g., the CLEVR-XAI dataset by Arras et al. (2022)), there is a need to develop explanation databases of model behaviors. Here, explanation databases are defined as sets of explanations produced by XAI methods and intended to help inspectors comprehend model behaviors. These databases differ from standard datasets, which are used to train machine learning models. Explanation databases should include a set of explanations produced by an XAI method, and instances of information that an inspector should be able to predict given the explanations and a set of model inputs. Many explanation types may be included, such as saliency maps, counterfactuals, or textual descriptions. Developing explanation databases for individual models could help determine the relative value of different explanation types in explaining model behaviors. Such studies could be performed in explainability studies using XAI dashboards (Sect. 6.3).
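One possible schema for such an explanation database is sketched below as a simple record type that pairs each explanation with the inputs it covers and the information an inspector should be able to predict; the field names and example values are illustrative assumptions rather than a proposed standard.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class ExplanationRecord:
    """One entry of a hypothetical explanation database for a single model."""
    method: str                      # XAI method that produced the explanation
    explained_info: str              # information type being explained (i)
    utility: str                     # intended utility to inspectors (u)
    explanation: Any                 # the artifact itself: saliency map, text, rules, ...
    model_inputs: List[Any] = field(default_factory=list)   # inputs the explanation covers
    target_info: List[Any] = field(default_factory=list)    # what inspectors should predict

db = [
    ExplanationRecord(
        method="saliency", explained_info="input dimensions", utility="importance",
        explanation=[[0.1, 0.7], [0.0, 0.2]], model_inputs=[[1.0, 2.0]],
        target_info=["class: 1"],
    ),
]
print(len(db), "explanation record(s) stored")
```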
Common XAI dashboards for explainability studies
XAI dashboards are interfaces that produce model explanations and are becoming more popular in open-source communities (Dijk et al. 2023; Manca et al. 2023; Spinner et al. 2019), but they often neglect to integrate software features that help evaluate the effect explanations have on model inspectors. Explainability interfaces sometimes quantify these effects using human studies for XAI (Dijk et al. 2023; Manca et al. 2023; Spinner et al. 2019) but are infrequently released and even less frequently evaluate the impact of diverse explanation types. Sharing interfaces that quantify the effects of many explanation types could help create benchmarks for XAI and streamline competition in the field, just as accuracy measures aided CNN development in computer vision (Russakovsky et al. 2015). However, explainability studies present several challenges.
Quantifying the explanation effect for an I(x) can be challenging. Human comprehension cannot easily be determined across all points x for a model F(x), and evaluating human comprehension even for a single x proves difficult. While human studies are prevalent in XAI literature (Zhou et al. 2018; Zhang et al. 2018; Fu et al. 2020), they depend on many factors (Doshi-Velez and Kim 2017). Sampling every point in I(x) is infeasible, I(x) may change continuously while observing examples, and there are no guarantees that changes in predictive error and uncertainty result from the given explanation rather than from other factors like an inspector’s focus and attention. These complications present barriers that limit the value of evaluating the effect explanations have on human inspectors.
Several approaches could mitigate the challenges above. To address the sampling issue, studies may be limited to sets or ranges of information that are important for safety or indicative of general model performance. Crowd-sourced studies provide more samples from I(x), but simultaneously vary the characteristics of each I(x). Parameterized models of I(x) could simplify computation provided there are grounded assumptions about I(x). To surmount the nondeterministic impact of explanations on $\Delta\mu_\varepsilon$ and $\Delta\sigma_\varepsilon$, computational studies may specify I(x) as a machine learning model, such as an LLM. This approach mirrors prompt-based learning approaches (Madotto et al. 2021), but focuses on evaluating the quality of an explanation rather than an LLM’s capabilities. Note that while human inspectors are influenced by the explanations they receive, and the impact of future explanations depends on prior explanations, LLMs can be reinitialized to their initial state.
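A minimal sketch of such a computational study follows, using a small reinitializable regression model as a stand-in inspector that is refit on the examples each explanation provides; the linear inspector, the explanation format, and the comparison of a local versus a spread-out explanation are assumptions for illustration only.

```python
import numpy as np

F = lambda x: np.sin(x)                       # model behavior under study
xs_eval = np.linspace(-3, 3, 100)             # evaluation points

def fresh_inspector():
    """Reinitialized model-based inspector: starts with a constant guess of 0."""
    return {"coeffs": np.zeros(2)}            # linear mental model a*x + b

def give_explanation(inspector, xs_expl):
    """Explanation as labeled examples (x, F(x)); the inspector refits its model."""
    ys = F(xs_expl)
    inspector["coeffs"] = np.polyfit(xs_expl, ys, deg=1)
    return inspector

def error(inspector):
    """Predictive error and uncertainty of the inspector over the evaluation points."""
    preds = np.polyval(inspector["coeffs"], xs_eval)
    eps = np.abs(preds - F(xs_eval))
    return eps.mean(), eps.std()

# Compare two explanation sets using identical, reinitialized inspectors.
for name, xs_expl in {"local": np.linspace(-0.5, 0.5, 5),
                      "spread": np.linspace(-3, 3, 5)}.items():
    insp = give_explanation(fresh_inspector(), xs_expl)
    mu, sig = error(insp)
    print(f"{name} explanation: mu_eps={mu:.3f}, sigma_eps={sig:.3f}")
```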
These factors demonstrate the value of defining standard explainability studies for evaluating explanations’ effects on both human and model-based inspectors. Making these studies accessible in open-source XAI dashboards would streamline competition in XAI as a path toward standardized model deployment (Sect. 6.4).
Toward comprehension thresholds for model deployment
As the Qi-Framework provides some grounds for quantitatively comparing diverse explanation types, it may be reasonable to quantify how much of a model an inspector can comprehend. This assumption is used to present a notional use case for determining whether models are understood well enough to be deployed, which provides value as a thought experiment for high-stakes decisions. As an example paralleling A-Basis testing (Nettles 2004), healthcare applications could gauge whether doctors understand how their model-based tools are working, perhaps requiring (notionally) that 95% of practitioners using a model comprehend 90% of specified model behaviors. In contrast, deploying generative models for art generation likely requires no user comprehension (0%). This example questions whether explainability thresholds could act as a helpful criterion for model deployment, which presents several challenges. Firstly, this approach assumes that model comprehension can be expressed as a percentage of an information space, which may be feasible if inspectors define the types of information that need to be understood. For example, inspectors could decompose a model into a set of XAI-relevant information, prepare an explanation database with relevant information (Sect. 6.2), and evaluate whether a set of inspectors comprehends model behaviors. Provided an adequate number of inspectors comprehend a sufficient level of model behaviors, we could describe a model as being sufficiently well explained. This approach depends on the application space, the explainability thresholds required for individual models, the model size, and the space of information that needs to be comprehended. Secondly, it is unclear how to define "sufficient levels of comprehension." Thirdly, many different kinds of inspectors could be tasked with evaluating model behaviors, which likely depends on the expected application and impact of a model. While explainability thresholds alone seem insufficient for model deployment (i.e., model accuracy and confidence remain important, as in work by Petkovic (2023)), this challenge seeks to inspire discussion regarding model deployment from the perspective of inspectors’ comprehension.
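The notional deployment criterion could be checked with a few lines like the sketch below, which compares per-inspector comprehension fractions against the notional 90%/95% thresholds from the text; both the thresholds and the example fractions are illustrative assumptions, not recommended standards.

```python
def meets_deployment_threshold(comprehension_fractions, per_inspector_req=0.90,
                               population_req=0.95):
    """Notional check: do enough inspectors comprehend enough specified behaviors?"""
    passing = [c >= per_inspector_req for c in comprehension_fractions]
    return sum(passing) / len(passing) >= population_req

# Hypothetical study: fraction of specified model behaviors each inspector comprehended.
fractions = [0.97, 0.92, 0.88, 0.95, 0.99, 0.93, 0.91, 0.96, 0.94, 0.98]
print("deploy?", meets_deployment_threshold(fractions))
```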
Conclusion
This work introduces the Qi-Framework, which provides a common syntax for expressing, describing, and comparing diverse sets of explainability needs. The resulting notation is more generally applicable, mathematically grounded, and relevant to future explainability approaches, which makes the Qi-Framework useful for researchers and practitioners looking to discover novel research directions for XAI and to communicate which model behaviors need to be better explained. While prior taxonomies, frameworks, and literature reviews compare XAI methods and explanations using static terminology that does not provide quantitative means for general method comparisons, this work starts from existing mathematical notation for explanations and abstracts a general syntax (Eq. 1) for describing diverse explanation types that helps researchers apply existing quantitative methods more generally. Here, the Qi-Framework describes explanation types in terms of: (1) The information explained by an explanation; (2) How an explanation is useful to model inspectors; (3) The method used to produce an explanation; and (4) The information needed to produce an explanation. This syntax helps define a path toward quantitatively evaluating an explanation’s effects, scope, and shortcomings as compared to other methods. The potential impact is in helping researchers describe, compare, and discover novel works using a general syntax that simultaneously helps practitioners describe their explainability needs and identify relevant methods. The value of the work is supported by describing 184 diverse explanations in the syntax of the Qi-Framework, presenting tabular analyses to help discover gaps in existing work, and showing a path toward generally applying quantitative comparative methods in XAI. This work also explores several research directions. First, explanation types are determined empirically in this work but could be determined quantitatively. Secondly, rich explanation databases and open-source dashboards for human studies could help benchmark and rank the explainability of existing approaches. Thirdly, developing standards and thresholds describing sufficient levels of explainability could provide goals that direct XAI research. In aggregate, the Qi-Framework inspires future progress in XAI while immediately helping researchers and practitioners find and compare XAI methods that are relevant across finance, defense, and healthcare, where interpretability is crucial. As stones may extend a longer roadway, so too this work furthers paths toward a rigorous science of explainability, where human comprehension directs the development of AI systems rather than a dependence on incomprehensible systems.
Acknowledgements
Special thanks to Daniel Capecci at the Florida Institute for National Security for the insightful question: “What makes an explanation sufficient?"
Author contributions
Stephen Wormald: Conceptualization, data curation, formal analysis, investigation, methodology, visualization, validation, writing-original draft, writing-review and editing. Matheus Maldaner: Data curation, writing-review and editing. Kristian O’Connor: Data curation, writing-review and editing. Olivia Dizon-Paradis: writing-review and editing. Damon Woodard: Supervision and review.
Funding
The author(s) received no specific funding for this work.
Declarations
Conflict of interest
The authors declare no Conflict of interest.
Informed consent
As no individual participants were involved in the study, no informed consent was required.
Research involving human and animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Note these terms and criteria are qualitative in nature, and represent the authors’ conscientious effort to fairly interpret and represent both the current and prior works.
2. Note there is some ambiguity in selecting function components that are responsible for producing an explanation. For example, an XAI method may input a full neural network but give special attention to a single network node when producing an explanation (Simonyan et al. 2019). In these scenarios, the "smallest responsible function component" was listed as the function subcomponent used to produce an explanation. For example, while the ProtoPNet learns a variety of prototypical patterns, the smallest responsible function component is the "prototype" or "concept" vector (Chen et al. 2019).
3. Practically, "question mapping" could be performed by a domain expert, a program or explanation interface, or by a machine learning model like a Large Language Model (LLM) or symbolic program (Mao et al. 2019). A formal definition of user comprehension is provided in Sect. 5.
4. Note how explanations may either be unavailable or infeasible to produce, as illustrated beneath the question mapping in Fig. 6d.
5. For example, saliency maps may be used to rank inputs by their relative importance, but can also be used to show the representation of certain function elements. Please see Sect. 4.2 for further discussion regarding an explanation’s utility.
6. Examples include the additive index score (Staniak and Biecek 2018), detector uniqueness (Zhou et al. 2018), and TCAV scores (Ghorbani et al. 2019). Numerous other examples can be found in Fig. 7.
7. Note that information regarding model training is excluded; reference is instead made to Fig. 7.
8. Note that readers may recognize methods that address the research gaps identified in this analysis. In such cases, the syntax provided by the Qi-Framework will have proven effective as a descriptive language for existing XAI methods.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Adadi, A; Berrada, M. Peeking inside the black-box: a survey on explainable artificial intelligence (xai). IEEE Access; 2018; 6, pp. 52138-52160. [DOI: https://dx.doi.org/10.1109/ACCESS.2018.2870052]
Altmann, A; Toloşi, L; Sander, O et al. Permutation importance: a corrected feature importance measure. Bioinformatics; 2010; 26,
Arras, L; Osman, A; Samek, W. Clevr-xai: a benchmark dataset for the ground truth evaluation of neural network explanations. Inform Fusion; 2022; 81, pp. 14-40. [DOI: https://dx.doi.org/10.1016/j.inffus.2021.11.008]
Arya V, Bellamy RK, Chen PY, et al (2019) One explanation does not fit all: a toolkit and taxonomy of ai explainability techniques. Preprint at arXiv:1909.03012
Bach, S; Binder, A; Montavon, G et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE; 2015; 10,
Badreddine, S; Garcez, A; Serafini, L et al. Logic tensor networks. Artif Intell; 2022; 303, 4353218 [DOI: https://dx.doi.org/10.1016/j.artint.2021.103649] 103649.
Barnett AJ, Schwartz FR, Tao C, et al (2021) Iaia-bl: a case-based interpretable deep learning model for classification of mass lesions in digital mammography. Preprint at arXiv:2103.12308
Belle, V; Papantonis, I. Principles and practice of explainable machine learning. Front Big Data; 2021; 39, [DOI: https://dx.doi.org/10.3389/fdata.2021.688969] 688969.
Bommer, PL; Kretschmer, M; Hedström, A et al. Finding the right xai method-a guide for the evaluation and ranking of explainable ai methods in climate science. Artif Intell Earth Syst; 2024; 3,
Burkart, N; Huber, MF. A survey on the explainability of supervised machine learning. J Artif Intell Res; 2021; 70, pp. 245-317.4224661 [DOI: https://dx.doi.org/10.1613/jair.1.12228]
Carmichael Z (2024) Explainable ai for high-stakes decision-making. PhD thesis, University of Notre Dame
Chattopadhay A, Sarkar A, Howlader P, et al (2018) Grad-cam++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 839–847
Chau, SL; Hu, R; Gonzalez, J et al. Rkhs-shap: shapley values for kernel methods. Adv Neural Inf Process Syst; 2022; 35, pp. 13050-13063.
Chen J, Song L, Wainwright M, et al (2018) Learning to explain: an information-theoretic perspective on model interpretation. In: International Conference on Machine Learning, PMLR, pp 883–892
Chen C, Li O, Tao D, et al (2019) This looks like that: deep learning for interpretable image recognition. Adv Neural Inform Process Syst, 32
Chen, Z; Bei, Y; Rudin, C. Concept whitening for interpretable image recognition. Nat Mach Intell; 2020; 2,
Chen L, Qiu Y, Zhao J et al (2021) Cpkd: Concepts-prober-guided knowledge distillation for fine-grained cnn explanation. 2021 2nd International Conference on Electronics. IEEE, Communications and Information Technology (CECIT), pp 421–426
Chromik M, Schuessler M (2020) A taxonomy for human subject evaluation of black-box explanations in xai. Exss-atec@ iui 1
Cohen G, Afshar S, Tapson J, et al (2017) Emnist: Extending mnist to handwritten letters. In: 2017 International Joint Conference on neural networks (IJCNN), IEEE, pp 2921–2926
Covert, I; Lundberg, SM; Lee, SI. Understanding global feature contributions with additive importance measures. Adv Neural Inf Process Syst; 2020; 33, pp. 17212-17223.
Das S, Agarwal N, Venugopal D, et al (2020) Taxonomy and survey of interpretable machine learning method. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, pp 670–677
Datta A, Sen S, Zick Y (2016) Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 598–617
Deng J, Dong W, Socher R, et al (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, Ieee, pp 248–255
Dijk O, oegesam, Bell R, et al (2023) oegedijk/explainerdashboard: v0.4.5: drop numpy<1.25 restriction. Zenodo, https://doi.org/10.5281/ZENODO.6407091
Dong H, Mao J, Lin T, et al (2019) Neural logic machines. Preprint at arXiv:1904.11694
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. Preprint at arXiv:1702.08608
Dwivedi, R; Dave, D; Naik, H et al. Explainable ai (xai): core ideas, techniques, and solutions. ACM Comput Surv; 2023; 55,
Emamirad E, Omran PG, Haller A, et al (2023) A system’s approach taxonomy for user-centred xai: a survey. Preprint at arXiv:2303.02810
Fu R, Hu Q, Dong X, et al (2020) Axiom-based grad-cam: towards accurate visualization and explanation of cnns. Preprint at arXiv:2008.02312
Ghojogh B, Ghodsi A, Karray F, et al (2022) Spectral, probabilistic, and deep metric learning: tutorial and survey. Preprint at arXiv:2201.09267
Ghorbani A, Zou J (2019) Data shapley: Equitable valuation of data for machine learning. In: International Conference on Machine Learning, PMLR, pp 2242–2251
Ghorbani A, Wexler J, Zou JY, et al (2019) Towards automatic concept-based explanations. Adv Neural Inform Process Syst. 32
Hanif A, Zhang X, Wood S (2021) A survey on explainable artificial intelligence techniques and challenges. In: 2021 IEEE 25th International Enterprise Distributed Object Computing Workshop (EDOCW), IEEE, pp 81–89
Ibrahim, R; Shafiq, MO. Explainable convolutional neural networks: a taxonomy, review, and future directions. ACM Comput Surv; 2023; 55,
Jacovi A, Swayamdipta S, Ravfogel S, et al (2021) Contrastive explanations for model interpretability. Preprint at arXiv:2103.01378
Jiang, PT; Zhang, CB; Hou, Q et al. Layercam: exploring hierarchical class activation maps for localization. IEEE Trans Image Process; 2021; 30, pp. 5875-5888. [DOI: https://dx.doi.org/10.1109/TIP.2021.3089943]
Kapishnikov A, Bolukbasi T, Viégas F, et al (2019) Xrai: better attributions through regions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 4948–4957
Kapishnikov A, Venugopalan S, Avci B, et al (2021) Guided integrated gradients: an adaptive path method for removing noise. in 2021 ieee. In: CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5048–5056
Khan, A; Sohail, A; Zahoora, U et al. A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev; 2020; 53, pp. 5455-5516. [DOI: https://dx.doi.org/10.1007/s10462-020-09825-6]
Kim B, Wattenberg M, Gilmer J, et al (2018) Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In: International Conference on Machine Learning, PMLR, pp 2668–2677
Kim E, Kim S, Seo M, et al (2021) Xprotonet: diagnosis in chest radiography with global and local explanations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15719–15728
Kingma, DP; Welling, M et al. An introduction to variational autoencoders. Found Trends ® Mach Learn; 2019; 12,
Koh PW, Nguyen T, Tang YS, et al (2020) Concept bottleneck models. In: International Conference on Machine Learning, PMLR, pp 5338–5348
Li O, Liu H, Chen C, et al (2018) Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: Proceedings of the AAAI Conference on Artificial Intelligence
Li D, Liu Y, Huang J, et al (2023) A trustworthy view on xai method evaluation. Authorea Preprints
Li Z, Fan S, Gu Y, et al (2024) Flexkbqa: a flexible llm-powered framework for few-shot knowledge base question answering. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 18608–18616
Loh HW, Ooi CP, Seoni S, et al (2022) Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011–2022). Comput Methods Programs Biomed 226:107161. https://dx.doi.org/10.1016/j.cmpb.2022.107161
Losch M, Fritz M, Schiele B (2021) Semantic bottlenecks: quantifying and improving inspectability of deep representations. Int J Comput Vision 129:3136–3153. https://dx.doi.org/10.1007/s11263-021-01498-0
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 30
Lundberg SM, Erion GG, Lee SI (2018) Consistent individualized feature attribution for tree ensembles. Preprint at arXiv:1802.03888
Madotto A, Lin Z, Winata GI, et al (2021) Few-shot bot: Prompt-based learning for dialogue systems. Preprint at arXiv:2110.08118
Manca G, Bhattacharya N, Maczey S, et al (2023) Xaiprocesslens: a counterfactual-based dashboard for explainable ai in process industries. In: HHAI, pp 401–403
Mao J, Gan C, Kohli P, et al (2019) The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision. Preprint at arXiv:1904.12584
Martins T, De Almeida AM, Cardoso E, et al (2023) Explainable artificial intelligence (xai): a systematic literature review on taxonomies and applications in finance. IEEE Access
Minh D, Wang HX, Li YF, et al (2022) Explainable artificial intelligence: a comprehensive review. Artif Intell Rev. pp 1–66
Miró-Nicolau M, Jaume-i Capó A, Moyà-Alcover G (2024) Assessing fidelity in xai post-hoc techniques: a comparative study with ground truth explanations datasets. Artif Intell. 104179
Montavon G, Lapuschkin S, Binder A, et al (2017) Explaining nonlinear classification decisions with deep taylor decomposition. Pattern Recogn 65:211–222. https://dx.doi.org/10.1016/j.patcog.2016.11.008
Morris MD (1991) Factorial sampling plans for preliminary computational experiments. Technometrics 33
Muhammad MB, Yeasin M (2020) Eigen-cam: Class activation map using principal components. In: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–7
Nauta M, Van Bree R, Seifert C (2021) Neural prototype trees for interpretable fine-grained image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14933–14943
Nettles AT (2004) Allowables for structural composites. In: International Conference on Composites Engineering, NASA Marshall Space Flight Center, Hilton Head, SC, United States, https://ntrs.nasa.gov/citations/20040111395
Nomm S (2023) Towards the linear algebra based taxonomy of xai explanations. Preprint at arXiv:2301.13138
Perin R, Telefont M, Markram H (2013) Computing the size and number of neuronal clusters in local circuits. Front Neuroanat 7:1. https://dx.doi.org/10.3389/fnana.2013.00001
Petkovic D (2023) It is not “accuracy vs. explainability” - we need both for trustworthy ai systems. IEEE Trans Technol Soc 4
Petsiuk V, Das A, Saenko K (2018) Rise: Randomized input sampling for explanation of black-box models. Preprint at arXiv:1806.07421
Petsiuk V, Jain R, Manjunatha V, et al (2021) Black-box explanation of object detectors via saliency maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11443–11452
Plumb G, Molitor D, Talwalkar AS (2018) Model agnostic supervised local explanations. Adv Neural Inf Process Syst 31
Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144
Ribeiro MT, Singh S, Guestrin C (2018) Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence
Riegel R, Gray A, Luus F, et al (2020) Logical neural networks. Preprint at arXiv:2006.13155
Russakovsky O, Deng J, Su H, et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115:211–252. https://dx.doi.org/10.1007/s11263-015-0816-y
Rymarczyk D, Struski Ł, Tabor J, et al (2020) Protopshare: Prototype sharing for interpretable image classification and similarity discovery. Preprint at arXiv:2011.14340
Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. Adv Neural Inf Process Syst 30
Schwalbe G, Finzel B (2023) A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining Knowl Discov. pp 1–59
Selvaraju RR, Cogswell M, Das A, et al (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp 618–626
Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: International Conference on Machine Learning, PMLR, pp 3145–3153
Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at arXiv:1312.6034
Simonyan K, Vedaldi A, Zisserman A (2019) Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at arXiv:1312.6034
Singh G (2022) Think positive: an interpretable neural network for image recognition. Neural Netw 151:178–189. https://dx.doi.org/10.1016/j.neunet.2022.03.034
Singh G, Yow KC (2021) An interpretable deep learning model for covid-19 detection with chest x-ray images. IEEE Access 9:85198–85208. https://dx.doi.org/10.1109/ACCESS.2021.3087583
Singh G, Yow KC (2021) Object or background: an interpretable deep learning model for covid-19 detection from ct-scan images. Diagnostics 11
Singh G, Yow KC (2021) These do not look like those: an interpretable deep learning model for image recognition. IEEE Access 9:41482–41493. https://dx.doi.org/10.1109/ACCESS.2021.3064838
Smilkov D, Thorat N, Kim B, et al (2017) Smoothgrad: removing noise by adding noise. Preprint at arXiv:1706.03825
Speith T (2022) A review of taxonomies of explainable artificial intelligence (xai) methods. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp 2239–2250
Spinner T, Schlegel U, Schäfer H, et al (2019) explAIner: a visual analytics framework for interactive and explainable machine learning. IEEE Trans Visual Comput Graphics 26
Springenberg JT, Dosovitskiy A, Brox T, et al (2014) Striving for simplicity: the all convolutional net. Preprint at arXiv:1412.6806
Staniak M, Biecek P (2018) Explanations of model predictions with live and breakdown packages. Preprint at arXiv:1804.01955
Štrumbelj E, Kononenko I (2014) Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst 41:647–665. https://dx.doi.org/10.1007/s10115-013-0679-x
Sturmfels P, Lundberg S, Lee SI (2020) Visualizing the impact of feature attribution baselines. Distill 5
Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: International Conference on Machine Learning, PMLR, pp 3319–3328
Szepannek G, Lübke K (2023) How much do we see? On the explainability of partial dependence plots for credit risk scoring. Argumenta Oeconomica. 1(50)
Wachter S, Mittelstadt B, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the gdpr. Harv JL & Tech 31:841
Wang H, Wang Z, Du M, et al (2020) Score-cam: Score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 24–25
Wang J, Liu H, Wang X, et al (2021) Interpretable image recognition by constructing transparent embedding space. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 875–884
Wickramanayake S, Hsu W, Lee ML (2021) Comprehensible convolutional neural networks via guided concept learning. In: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
Yeh CK, Kim B, Arik S, et al (2020) On completeness-aware concept-based explanations in deep neural networks. Adv Neural Inf Process Syst 33:20554–20565
Yuan H, Yu H, Gui S, et al (2022) Explainability in graph neural networks: a taxonomic survey. IEEE Trans Pattern Anal Mach Intell 45
Zhang Q, Cao R, Wu YN, et al (2017) Growing interpretable part graphs on convnets via multi-shot learning. In: Proceedings of the AAAI Conference on Artificial Intelligence
Zhang Q, Cao R, Shi F, et al (2018) Interpreting cnn knowledge via an explanatory graph. In: Proceedings of the AAAI Conference on Artificial Intelligence
Zhang Q, Yang Y, Ma H, et al (2019) Interpreting cnns via decision trees. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6261–6270
Zhou B, Khosla A, Lapedriza A, et al (2016) Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2921–2929
Zhou B, Bau D, Oliva A, et al (2018) Interpreting deep visual representations via network dissection. IEEE Trans Pattern Anal Mach Intell 41
Zhu Y, Nie JY, Su Y, et al (2022) From easy to hard: a dual curriculum learning framework for context-aware document ranking. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp 2784–2794