Malware continues to pose a critical threat to computing systems, with modern techniques often bypassing traditional signature-based defenses. Ensemble-boosting classifiers, including GBC, XGBoost, AdaBoost, LightGBM, and CatBoost, have shown strong predictive performance for malware detection, yet their “black-box” nature limits transparency, interpretability, and trust, all of which are essential for deployment in high-stakes cybersecurity environments. This paper proposes a unified explainable AI (XAI) framework to address these challenges by improving the interpretability, fairness, transparency, and efficiency of ensemble-boosting models in malware and intrusion detection tasks. The framework integrates SHAP for global feature importance and complex interaction analysis; LIME for local, instance-level explanations; and DALEX for fairness auditing across sensitive attributes, ensuring that predictions remain both equitable and meaningful across diverse user populations. We rigorously evaluate the framework on a large-scale, balanced dataset derived from Microsoft Windows Defender telemetry, covering various types of malware. Experimental results demonstrate that the unified XAI approach not only achieves high malware detection accuracy but also uncovers complex feature interactions, such as the combined effects of system configuration and security states. To establish generalization, we further validate the framework on the CICIDS-2017 intrusion detection dataset, where it successfully adapts to different network threat patterns, highlighting its robustness across distinct cybersecurity domains. Comparative experiments against state-of-the-art XAI tools, including AnchorTabular (rule-based explanations) and Fairlearn (fairness-focused analysis), reveal that the proposed framework consistently delivers deeper insights into model behavior, achieves better fairness metrics, and reduces explanation overhead. By combining global and local interpretability, fairness assurance, and computational optimizations, this unified XAI framework offers a scalable, human-understandable, and trustworthy solution for deploying ensemble-boosting models in real-world malware detection and intrusion prevention systems.
Introduction
Malware attacks, which involve malicious software designed to compromise, disrupt, or damage computer systems, are an ever-evolving threat in cybersecurity [1, 2, 3, 4–5]. As computing systems become more integral to various sectors, the sophistication and frequency of these attacks have increased, posing significant risks to the integrity, confidentiality, and availability of systems and data. The increasing complexity of malware demands advanced detection systems capable not only of identifying but also effectively mitigating these threats [6, 7–8]. Ensemble-boosting classifiers, including widely used models such as the Gradient Boosting Classifier (GBC), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), have emerged as powerful tools against malware due to their high accuracy and robustness [9]. These classifiers aggregate predictions from multiple base models, which is particularly advantageous when dealing with noisy or incomplete data [10]. Their capacity to manage large and complex datasets makes them well suited for the intricate task of malware detection, which is critical to safeguarding the reliability and security of computer systems [11].
However, despite their strong predictive performance, ensemble-boosting classifiers are often criticized for their complex and opaque nature, which limits transparency, interpretability, and fairness [12]. In the cybersecurity domain, understanding model decisions is essential for building trust, debugging systems, and adhering to regulatory standards. Without interpretability, it becomes difficult to ensure effective threat response and the development of resilient, secure systems.
To address these challenges, explainability methods, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), have been developed. SHAP provides detailed breakdowns of each feature’s contribution to a prediction, offering deep insights into the model’s decision-making process [13]. LIME approximates complex models with simpler, interpretable models to provide local explanations [14]. Additionally, Descriptive mAchine Learning EXplanations (DALEX) offers a comprehensive suite of tools to visualize model behavior [15]. Integrating these explainability techniques into the analysis of ensemble-boosting classifiers significantly enhances our understanding of how these models make predictions for malware detection. For example, SHAP values can reveal the most influential features in detecting specific types of malware, guiding targeted improvements in feature engineering and model refinement. LIME can identify and help correct local prediction anomalies, ensuring consistent model performance across diverse scenarios. DALEX further enables thorough evaluations of model stability and feature interactions, which are critical for developing robust and transparent malware detection systems [16, 17]. Moreover, ensuring fairness in model predictions is essential to avoid biased outcomes that could undermine the effectiveness of malware detection systems [18, 19]. Fairness analysis plays a crucial role in identifying and mitigating potential biases that may cause uneven performance across malware types, user groups [20, 21], or machine environments. By rigorously assessing and ensuring fairness, we can prevent scenarios where certain threats are disproportionately misclassified, leaving critical systems vulnerable.
The strength of the proposed approach lies in its innovative integration of cutting-edge XAI techniques, i.e., SHAP, LIME, and DALEX, to comprehensively address both interpretability and fairness in ensemble-boosting classifiers for malware detection. While traditional XAI methods have primarily focused on interpretability, this research extends their application by systematically incorporating fairness analysis, an underexplored but critical dimension in machine learning-based cybersecurity solutions. This dual focus represents a significant advancement, addressing the black-box nature of ensemble classifiers and ensuring equitable operation across diverse scenarios, such as detecting various malware types, serving different user groups, or adapting to distinct computing environments.
This work also includes a comparison of the proposed unified XAI framework with well-established XAI approaches, namely AnchorTabular and Fairlearn. AnchorTabular is a rule-based method that offers intuitive, interpretable explanations [22], while Fairlearn is an XAI tool designed to evaluate and mitigate fairness issues in machine learning models [23]. Our comparison reveals the complementary strengths of these methods. SHAPASH, a core component of our framework, excels at providing detailed, fine-grained feature contribution analyses, whereas AnchorTabular offers rule-based explanations suitable for real-time decision-making. Similarly, our framework’s fairness evaluation, through DALEX, provides a more comprehensive understanding of fairness trade-offs compared to Fairlearn’s metric-driven approach.
A key novelty of this work lies in the explicit use of the game-theoretic foundation of SHAP to conduct a formal, rigorous analysis of feature attribution and fairness in ensemble-boosting models. While prior research has used SHAP mainly for interpretability, we leverage its cooperative game theory basis to systematically ensure that feature contributions are fairly and consistently measured, aligning model behavior with formal fairness principles. This integration of game-theoretic insights with explainable artificial intelligence for malware detection represents an important and underexplored advancement not currently available in the literature. This paper makes the following major contributions:
A unified XAI framework is introduced, integrating SHAP, LIME, and DALEX to combine global interpretability, localized explanations, and fairness analysis.
A comparative analysis of XAI techniques is conducted, demonstrating their effectiveness in improving the transparency, interpretability, and fairness of ensemble models for malware detection at both local and global levels, based on Microsoft’s endpoint protection solution against various types of malware [24].
A game-theoretic analysis of the Shapley values in ensemble-boosting models is provided. This theoretical foundation enables a rigorous understanding of feature attribution and fairness, ensuring that the contributions of each feature are fairly and consistently measured within the context of malware detection models.
The proposed unified XAI framework is compared with established approaches, such as AnchorTabular and Fairlearn, highlighting its superior interpretability and fairness evaluation capabilities.
Evaluation of the proposed framework is extended to a different dataset (CICIDS-2017 IDS dataset), demonstrating the generalization of the unified XAI framework [25]. SHAP, LIME, and DALEX-based fairness evaluations are applied in intrusion detection scenarios, showcasing how the framework effectively generalizes beyond malware prediction to address broader cybersecurity challenges.
Computational concerns of SHAP in real-time applications are addressed by introducing the optimized TreeSHAP algorithm. This variant significantly reduces computational overhead, especially for high-dimensional datasets, making it suitable for real-time scenarios.
Dataset and Threat Model
The dataset for this study is sourced from Microsoft Windows Defender telemetry, encompassing approximately 9 million records across 72 key attributes (selected from 167). It includes categorical, numerical, and binary features that describe system configurations, capacities, and security states, such as OS version, processor type, disk space, memory, and firewall and antivirus status, providing a robust foundation for modeling and detecting malware patterns and anomalies [24]. The target variable, ‘HasDetections,’ indicates whether malware was detected on a system, which is essential for model training and evaluation. With 51.2% of instances showing malware detection and 48.8% with no detection, the dataset is nearly balanced, ensuring unbiased model predictions. The data are split into 85% for training and 15% for testing, providing a solid framework for performance evaluation. However, the dataset’s complexities require careful consideration.
The threat model focuses on detecting malware across various devices in the Microsoft ecosystem, primarily targeting Windows-based systems. The malware types considered include viruses, worms, ransomware, spyware, adware, and other malicious software that threaten system integrity and data security. The dataset reflects a range of system configurations that may be vulnerable to such threats. A key scenario involves a Windows system where a user unknowingly installs malware, bypassing traditional security measures. For example, the ‘IsProtected’ attribute indicates whether antivirus software is active, while the ‘Firewall’ attribute shows the status of the firewall. Systems lacking active firewalls or antivirus protection are more vulnerable, highlighting the need for models that can predict at-risk configurations based on these features.
The dataset’s attributes are specifically designed to address multiple attack vectors. For instance, outdated OS patches, captured by the ‘OsVer’ and ‘OsBuild’ attributes, may expose systems to known exploits. Malware can also target vulnerabilities in widely used applications, with attributes like ‘ProductName’ and ‘EngineVersion’ offering insights into the security of these applications. By analyzing these features, predictive models can identify correlations between specific system configurations and increased malware infection risks.
A practical threat scenario involves an employee connecting a personal device with outdated OS versions (‘OsVer’, ‘OsBuild’) and a disabled firewall (‘Firewall’) to a corporate network. Such a device could serve as an entry point for malware, which could then spread to other systems with similar vulnerabilities. Predictive models built using this dataset help identify these at-risk systems, prompting interventions such as applying patches or enhancing security measures, thereby mitigating the risk of malware propagation across enterprise networks.
Ensemble-Boosting Classifiers
This section explains the ensemble-boosting classifiers selected for explanation and fairness analysis [26, 27].
Gradient Boosting Classifier (GBC): builds an ensemble sequentially, where each new model attempts to correct the errors of its predecessor (see the sketch after this list). It operates by minimizing the difference between the target and predicted values [26].
XGBoost: improves GBC by introducing additional features, such as regularization to control ensemble complexity and improve generalization. It builds decision trees based on pseudo-residuals, which are computed as the difference between target and predicted values [28].
AdaBoost: sequentially trains weak learners, typically decision stumps, and adjusts their weights based on performance. Each weak learner’s performance is assessed and used to update the distribution of weights for training the next model [29].
LightGBM: improves GBC by keeping instances with more significant gradients. It ranks training instances by absolute gradient value, prioritizing those with larger gradients. The combined subset of high-gradient and resampled low-gradient instances is then used to build a decision tree, which updates the model [30].
CatBoost: is designed to handle categorical features efficiently and uses a permutation-based approach to improve the robustness of the model. The algorithm’s permutation-based training helps in handling categorical data effectively [31].
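To make the sequential error-correction idea behind these boosters concrete, the following minimal sketch implements the core gradient-boosting loop for binary classification using scikit-learn regression trees; the learning rate, tree depth, and number of rounds are illustrative defaults rather than the settings used in the experiments reported below.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for binary classification (log-loss).

    Each round fits a regression tree to the pseudo-residuals
    (label minus current predicted probability) and adds it to the
    ensemble, mirroring the sequential error correction described above.
    """
    # Start from the log-odds of the base rate.
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p0 / (1 - p0))
    raw = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        prob = 1.0 / (1.0 + np.exp(-raw))       # current probability estimates
        residuals = y - prob                    # pseudo-residuals of the log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                  # fit the next weak learner to the errors
        raw += learning_rate * tree.predict(X)  # update the additive model
        trees.append(tree)
    return f0, trees

def gradient_boost_predict_proba(X, f0, trees, learning_rate=0.1):
    """Probability of the positive class under the fitted additive model."""
    raw = np.full(X.shape[0], f0)
    for tree in trees:
        raw += learning_rate * tree.predict(X)
    return 1.0 / (1.0 + np.exp(-raw))
```

The library implementations listed above (XGBoost, LightGBM, CatBoost, AdaBoost) refine this basic loop with regularization, gradient-based sampling, categorical handling, and reweighting, respectively.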
As summarized in Table 1, LightGBM delivers the strongest overall performance, with an accuracy of 0.662 and a ROC AUC of 0.724. GBC shows comparable results to XGBoost but slightly lags it in most metrics, with an accuracy of 0.648 and an F1 score of 0.616, likely due to less sophisticated optimization. CatBoost performs competitively, achieving an accuracy of 0.656, precision of 0.666, and ROC AUC of 0.717, highlighting its strength in handling categorical features and mitigating overfitting. AdaBoost ranks lowest across most metrics, with an accuracy of 0.642 and precision of 0.632. Despite its higher recall (0.641), its reliance on shallow decision stumps limits its overall effectiveness.
Table 1. Performance for ensemble-boosting classifiers
| Metric | GBC | LightGBM | XGBoost | AdaBoost | CatBoost |
|---|---|---|---|---|---|
| Accuracy | 0.648 | 0.662 | 0.648 | 0.642 | 0.656 |
| Precision | 0.655 | 0.677 | 0.663 | 0.632 | 0.666 |
| Recall | 0.574 | 0.597 | 0.579 | 0.641 | 0.603 |
| F1 score | 0.616 | 0.635 | 0.618 | 0.636 | 0.633 |
| ROC AUC | 0.708 | 0.724 | 0.707 | 0.691 | 0.717 |
| PR AUC | 0.713 | 0.727 | 0.712 | 0.684 | 0.721 |
Proposed Unified XAI Framework for Malware Detection
The proposed unified XAI framework aims to address critical challenges in malware detection by integrating state-of-the-art techniques that enhance the transparency and fairness of machine learning approaches. As illustrated in Fig. 1, this multi-tier framework combines ensemble-boosting classifiers with advanced explainability methods, including SHAP, LIME, and DALEX. These methods are systematically organized into three layers—local explanation, global explanation, and fairness and bias analysis—enhancing the interpretability and reliability of malware prediction models. Results and insights derived from each layer provide a comprehensive approach to explainability. Specifically, SHAP enables global interpretation, LIME focuses on local explanations, and DALEX conducts fairness and bias analysis, thereby ensuring the robustness of these models against various malware attacks that threaten computer systems.
Malware represents a significant and continually evolving threat to modern computer systems, driven by a highly organized and well-funded industry intent on evading traditional security measures. Once malware infiltrates a system, it can lead to devastating consequences, including the theft of sensitive data, disruption of critical services, and severe financial losses for both consumers and enterprises. The rapid evolution of malware techniques, such as obfuscation and polymorphism, further complicates the detection process, necessitating predictive models capable of adapting to sophisticated and dynamic threats.
The proposed XAI framework addresses these challenges by integrating the predictive power of ensemble-boosting classifiers with explainability methods that offer actionable insights into the detection process. Local explanation methods, such as LIME, empower security analysts to understand why a particular sample is flagged as malicious, enabling targeted and effective responses. Global explanations through SHAP provide a comprehensive overview of model behavior, identifying key features that influence predictions and uncovering systemic patterns exploited by malware. DALEX’s fairness and bias analysis ensures that the detection models operate equitably across diverse datasets, mitigating the risk of biased decision-making that could undermine security or disadvantage specific user groups.
By characterizing malware behaviors and their underlying features through this explainable framework, security professionals can gain deeper insights into the attack vectors and tactics employed by adversaries. This enhanced understanding not only improves the accuracy and reliability of detection models but also fortifies cybersecurity defenses. The integration of explainable AI into predictive analytics ensures that the proposed framework achieves high detection accuracy while maintaining essential principles of trust, accountability, and fairness. This approach addresses both the technical and ethical dimensions of malware analysis, offering a robust and holistic solution to the challenges posed by advanced malware threats.
The organization of the proposed framework into three explainable layers—local explanation, global explanation, and fairness and bias analysis—is further detailed below:
Global Explanation Layer:
Global Interpretability: SHAP is used to quantify the contributions of individual features to the model’s predictions across the entire dataset. This provides a comprehensive understanding of feature importance and interaction effects, revealing the global factors that influence malware classification. Visualizations in this layer, such as SHAP summary plots, highlight global feature importance, interactions, and dependencies between features throughout the model.
Local Explanation Layer:
Instance-level Explanations: LIME is employed to provide local, instance-specific explanations, focusing on how the feature values of a specific instance contribute to its prediction. LIME generates interpretable surrogate models that approximate the complex ensemble model locally, offering clear insights into the decision-making process for individual instances. Visualizations within this layer display these local contributions, allowing stakeholders to understand the factors influencing predictions for each instance.
Fairness and Bias Analysis Layer:
Bias and Fairness Assessment: DALEX is utilized to evaluate the fairness and potential biases in the model’s predictions. It identifies any performance discrepancies, such as misclassifying specific types of malware or disproportionate errors across user groups. This analysis ensures that the model’s behavior is fair and equitable across all scenarios. Visualizations in this layer, such as fairness plots, highlight any disparities in model behavior and assist stakeholders in identifying areas for further improvement.
Fig. 1: Proposed unified XAI framework for malware detection. The framework integrates SHAP for global interpretation, LIME for local explanations, and DALEX for fairness and bias analysis.
Global Explanation Layer
SHAP (Shapley Additive exPlanations) is a method from cooperative game theory that enhances the interpretability of machine learning models. It quantifies the contribution of each feature to a model’s prediction for a specific instance by evaluating the impact of including or excluding the feature across all possible permutations.
For a permutation $P$ of the feature indices $\{1, 2, \dots, n\}$, where $n$ is the total number of features, let $\mathrm{Pre}^{P}(p)$ represent the indices of the features that appear before the $p$-th feature in $P$. The SHAP value is computed by assessing how the feature’s presence or absence influences the prediction. Mathematically, the SHAP value for feature $p$ is given by Eq. 1:

$$\phi_p = \frac{1}{n!} \sum_{P} \Big[ f\big(x_{\mathrm{Pre}^{P}(p) \cup \{p\}}\big) - f\big(x_{\mathrm{Pre}^{P}(p)}\big) \Big] \tag{1}$$
where the bracketed difference represents the contribution of the $p$-th feature considering the permutation $P$ of the other features. This ensures that SHAP values comprehensively reflect the impact of each feature on the model’s prediction, as given in Algorithm 1. First, the algorithm initializes with the number of iterations $Z$, the instance of interest $x$, the feature index $j$, the ensemble machine learning classifier $f$, and the data matrix $X$.
In each iteration, a random instance $m$ is sampled from the dataset $X$ and a random permutation of the feature indices is generated.
The algorithm then constructs two versions of the instance $x$: one including the feature $j$ and one excluding it, with the remaining feature values drawn from $x$ and $m$ according to the permutation.
The marginal contribution of feature $j$ is calculated as the difference in model predictions for the instances with and without the feature, and averaging these contributions over the $Z$ iterations yields the approximate Shapley value.
Algorithm 1: Algorithm for SHAP.
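The following sketch shows one common Monte Carlo formulation of the sampling procedure described in Algorithm 1: sample a background instance and a random permutation, build the two hybrid instances with and without feature $j$, and average the prediction differences over $Z$ iterations. The function and parameter names are placeholders, and the sketch is an illustrative approximation rather than the authors’ exact pseudocode.

```python
import numpy as np

def approx_shap_value(f, x, j, X, Z=1000, rng=None):
    """Approximate the Shapley value of feature j for instance x.

    f : callable mapping a 2-D array of instances to predictions
    x : 1-D array, the instance of interest
    j : index of the feature being explained
    X : 2-D background data matrix used for sampling
    Z : number of Monte Carlo iterations
    """
    rng = np.random.default_rng(rng)
    n = x.shape[0]
    contributions = np.empty(Z)
    for z in range(Z):
        m = X[rng.integers(len(X))]            # sample a random background instance
        perm = rng.permutation(n)              # random feature ordering
        pos = np.where(perm == j)[0][0]
        take_from_x = set(perm[:pos + 1].tolist())   # features up to and including j come from x
        x_plus = np.array([x[k] if k in take_from_x else m[k] for k in range(n)])
        x_minus = x_plus.copy()
        x_minus[j] = m[j]                      # same hybrid instance, but feature j replaced
        preds = f(np.vstack([x_plus, x_minus]))
        contributions[z] = preds[0] - preds[1]  # marginal contribution of feature j
    return contributions.mean()

# Example (hypothetical names): approx_shap_value(
#     lambda A: model.predict_proba(A)[:, 1], X_test[0], j=3, X=X_train)
```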
Theorem 4.1
Let $n$ be the number of features, $Z$ be the number of sampling iterations, and $T_f(n)$ be the time complexity of evaluating the predictive model $f$ on a single input instance. Then, the approximate SHAP algorithm for computing the Shapley value of feature $j$ on instance $x$ has an overall time complexity of $O\big(Z \cdot (n + T_f(n))\big)$.
Proof
In each of the $Z$ iterations, the approximate SHAP algorithm performs three main operations: (i) sampling an instance from the dataset, which takes $O(1)$ time; (ii) generating a random permutation and constructing the modified instances, which takes $O(n)$ time; and (iii) evaluating the predictive model twice, which takes $O(T_f(n))$ time. Therefore, each iteration costs $O(n + T_f(n))$, and over $Z$ iterations the total time complexity is $O\big(Z \cdot (n + T_f(n))\big)$.
For ensemble models, $T_f(n)$ is typically logarithmic or linear in $n$. Compared to exact Shapley value computation, which requires $O\big(2^{n} \cdot T_f(n)\big)$ time, the approximate algorithm avoids the exponential dependence on $n$. Hence, the claimed asymptotic bound holds.
While the general SHAP algorithm offers strong interpretability by computing feature attributions, its approximate computation can still incur substantial time complexity, especially on high-dimensional data, as formalized in Theorem 4.1. To overcome this limitation specifically for tree-based ensemble models, such as GBC, LightGBM, and XGBoost, the TreeSHAP algorithm can be adopted as an optimized variant of SHAP [32]. TreeSHAP leverages the recursive structure of decision trees to avoid exhaustive enumeration over all feature coalitions, dramatically reducing the computational burden. This efficiency makes TreeSHAP particularly suitable for near real-time applications where rapid interpretability is required. We formalize this improvement in Theorem 4.2, which establishes the optimized time complexity bound.
Theorem 4.2
Let $n$ be the number of features, $T$ the number of trees in the ensemble, and $D$ the maximum depth of any tree. Then, the TreeSHAP algorithm computes the exact SHAP values for all features on a single instance in $O(T \cdot D^{2})$ time, independent of the number of features $n$.
Proof
TreeSHAP computes the exact marginal contributions by efficiently traversing the tree paths and aggregating feature attributions.
Traversing each tree takes $O(D^{2})$ time, because the number of unique feature paths is bounded by $D$. Computing SHAP values for all features requires only a single pass through the ensemble, avoiding repeated sampling or permutation over the $n$ features. Therefore, for $T$ trees, the total time is $O(T \cdot D^{2})$. This represents a significant improvement over the exact case $O\big(2^{n} \cdot T_f(n)\big)$ or the approximate case $O\big(Z \cdot (n + T_f(n))\big)$, particularly when $n$ is large but the tree depth $D$ is modest, as is typically the case in boosted trees.
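In practice, this optimized variant is exposed through the shap library’s TreeExplainer, which implements the TreeSHAP path traversal for tree ensembles. The snippet below is a usage sketch on a LightGBM model trained on synthetic placeholder data; it illustrates the API rather than the paper’s experimental pipeline.

```python
import numpy as np
import shap
import lightgbm as lgb

# Placeholder data standing in for the Windows Defender telemetry features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 20))
y_train = (X_train[:, 0] + X_train[:, 3] > 0).astype(int)

model = lgb.LGBMClassifier(n_estimators=100, max_depth=10, learning_rate=0.05)
model.fit(X_train, y_train)

# TreeExplainer exploits the tree structure, so explaining an instance scales
# with the number of trees and their depth rather than with 2^n coalitions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train[:100])   # SHAP values for 100 instances
```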
Local Explanation Layer
LIME generates interpretable models that approximate the behavior of complex models locally around a specific prediction. For a data point x, LIME aims to find a simpler model that mimics the complex model m in the vicinity of x.
Given a complex model $p$, with $p(x)$ denoting the probability that $x$ belongs to a class, LIME identifies an interpretable model $m \in M$ that aligns with $p(x)$ locally. Locality is defined using an adjacency (proximity) measure $\pi_x(y)$, which quantifies how close another point $y$ is to $x$. The accuracy of the interpretable model $m$ in approximating the complex model $p$ is evaluated with a loss function $L(p, m, \pi_x)$, measuring how well $m$ captures $p$’s behavior in the local region. LIME also prefers less complex models, with complexity denoted as $\Omega(m)$, for better interpretability.
LIME generates an explanation $E(x)$ for the model $p$ by minimizing the sum of inaccuracy and complexity over all models in $M$:

$$E(x) = \arg\min_{m \in M} \; L(p, m, \pi_x) + \Omega(m)$$

The process is summarized in Algorithm 2. It starts by defining the adjacency measure $\pi_x$ to establish locality around $x$ and then calculates the inaccuracy $L(p, m, \pi_x)$ for each candidate $m$. The algorithm selects the model $m$ that minimizes the combined inaccuracy and complexity to produce the final explanation $E(x)$.
Algorithm 2: Algorithm for LIME.
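As a concrete counterpart to Algorithm 2, the sketch below applies the lime package’s LimeTabularExplainer to a single instance of a fitted boosting model; the synthetic data, feature names, and the number of displayed features are illustrative placeholders.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder data and model standing in for the telemetry features and GBC.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 10))
y_train = (X_train[:, 0] - X_train[:, 2] > 0).astype(int)
model = GradientBoostingClassifier().fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=[f"f{i}" for i in range(X_train.shape[1])],
    class_names=["No Malware", "Malware"],
    mode="classification",
)

# Fit a local surrogate around one instance and list the top weighted features,
# i.e. the explanation E(x) minimizing local inaccuracy plus complexity.
exp = explainer.explain_instance(X_train[0], model.predict_proba, num_features=5)
print(exp.as_list())   # (feature condition, weight) pairs for this prediction
```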
Fairness and Bias Analysis Layer
DALEX is an XAI method for evaluating machine learning models with respect to interpretability and fairness. It facilitates the comparison of performance metrics between privileged and unprivileged subgroups to detect biases that could compromise the fairness of the model. This work specifically addresses fairness with respect to the machine’s location, identified by the CountryIdentifier attribute, in relation to malware detection. Ensuring fairness is crucial, because biases can lead to inaccurate predictions and inconsistencies during scaling and deployment. The fairness of the models is evaluated using metrics such as the accuracy equality ratio, statistical parity ratio, positive predictive value, true-positive rate, and false-positive rate.
The evaluation of residuals is an essential step in assessing the model’s performance, as outlined in Part I of Algorithm 3. For an optimal model, residuals should deviate randomly from zero and remain close to zero. For a continuous dependent variable $Z$, the residual for the $y$-th observation is defined as the difference between the observed value $z_y$ and the predicted value generated by the model $f$, as shown in Eq. 2:

$$r_y = z_y - f(x_y) \tag{2}$$

The standardized form of these residuals can be calculated using Eq. 3, where $\sigma^{2}$ represents the variance of the residuals:

$$\tilde{r}_y = \frac{r_y}{\sqrt{\sigma^{2}}} \tag{3}$$
In Part II of Algorithm 3, the Ceteris Paribus profiles illustrate how a model’s predictions change with variations in an explanatory variable. These profiles depict the dependency of the target variable on specific features in the dataset. Mathematically, for a vector $x_*$ of values, not necessarily drawn from the dataset, let $x_*^{z}$ denote the element at position $z$ in $x_*$. The corresponding Ceteris Paribus profile for model $f$ and the explanatory variable at the $z$-th position is evaluated as shown in Eq. 4:

$$h_{f, x_*}^{z}(s) = f\big(x_*^{\,z \leftarrow s}\big) \tag{4}$$

Here, $x_*^{\,z \leftarrow s}$ represents the vector generated by altering the value of the element at the $z$-th position in $x_*$ to a scalar $s$. This step is essential for understanding how the model’s prediction depends on changes in a particular explanatory variable. Partial dependence is another key tool for comparing models, as described in Part III of Algorithm 3. If the Ceteris Paribus profiles of different models align, it suggests that the models are not overfitting and exhibit a similar relationship between variables. Conversely, if the profiles diverge, the model may require further refinement. The partial dependence function, which can be evaluated by averaging the Ceteris Paribus profiles over the observations, is described by Eq. 5:

$$g_{f}^{j}(s) = \frac{1}{N}\sum_{i=1}^{N} f\big(x_i^{\,j \leftarrow s}\big) \tag{5}$$
This function averages the effect of the explanatory variable $j$ across all $N$ observations. Finally, in Part IV of Algorithm 3, variable importance in DALEX is calculated using a model-agnostic approach, meaning that it does not assume any specific structure for the model. To determine variable importance, suppose the dataset contains a set of observations, $B$ is the target variable, and $X$ is the matrix of all explanatory variables. If $b$ is the vector of observed values for $B$, then $\hat{b} = f(X)$ is the set of predictions for $b$ from the model $f$. A loss function $\mathcal{L}(\hat{b}, b)$ is used to evaluate the model’s performance. The initial loss $L^{0}$ is defined as the loss function for the original dataset, as given in Eq. 6:

$$L^{0} = \mathcal{L}\big(f(X), b\big) \tag{6}$$
To assess the importance of each explanatory variable $x^{i}$, a matrix $X^{(i)}$ is created by permuting the values in the $i$-th column of $X$. The model predictions for this altered dataset are then evaluated, and the loss function for the altered dataset is calculated as $L^{i}$, as in Eq. 7:

$$L^{i} = \mathcal{L}\big(f(X^{(i)}), b\big) \tag{7}$$
Finally, the importance of $x^{i}$ is determined by the difference between the altered and original loss functions, as described in Eq. 8:

$$\mathrm{VI}^{i} = L^{i} - L^{0} \tag{8}$$
Algorithm 3: Algorithm for DALEX.
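To show how the four parts of Algorithm 3 map onto the dalex package, the following sketch builds an Explainer for a fitted classifier and requests residual diagnostics, a Ceteris Paribus profile, partial dependence, permutation-based variable importance, and a group-fairness check on a protected attribute. The synthetic data, the CountryIdentifier column values, and the choice of privileged group are placeholders for illustration only.

```python
import dalex as dx
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder frame standing in for the telemetry data; CountryIdentifier
# plays the role of the protected attribute used in the fairness analysis.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "EngineVersion": rng.integers(0, 50, 5000),
    "AVProductStatesIdentifier": rng.integers(0, 100, 5000),
    "CountryIdentifier": rng.integers(0, 5, 5000),
})
y = (X["EngineVersion"] + rng.normal(scale=10, size=5000) > 25).astype(int)
model = GradientBoostingClassifier().fit(X, y)

exp = dx.Explainer(model, X, y, label="GBC")

residuals = exp.model_diagnostics()              # Part I: residual analysis
cp = exp.predict_profile(X.iloc[[0]])            # Part II: Ceteris Paribus profile
pdp = exp.model_profile(type="partial")          # Part III: partial dependence
importance = exp.model_parts()                   # Part IV: permutation importance (Eq. 8)

# Fairness check across CountryIdentifier subgroups (ACC, STP, TPR, FPR ratios).
fairness = exp.model_fairness(protected=X["CountryIdentifier"].astype(str),
                              privileged="0")
fairness.fairness_check()
```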
Game-Theoretic Analysis of Explainable AI for Ensemble-Boosting Models
Game-Theoretic Analysis of Shapley Values
The Shapley Value, a concept from cooperative game theory, provides a fair and optimal method for distributing a total payoff (in this case, the ensemble model’s prediction) among the features (considered as players) based on their contribution to the overall outcome. In this section, we define SHAP in the context of malware detection and prove its theoretical properties, ensuring a robust understanding of its role in model interpretation.
Problem Formulation
Let $N = \{1, 2, \dots, n\}$ represent the set of features in the malware detection model, which include attributes such as EngineVersion, SmartScreen, AVProductStatesIdentifier, etc. We treat the decision-making process as a cooperative game, where each feature acts as a player contributing to the final prediction.
The malware detection model $M$ is a machine learning classifier trained to predict whether a given Windows machine is infected with malware based on its system and security attributes. Each machine $i$ in the dataset is described by a feature vector $x_i = (x_{i1}, x_{i2}, \dots, x_{in})$, where $n$ represents the number of features in the input vector.
The model’s prediction for machine $i$ is $\hat{y}_i = M(x_i)$, representing the probability that the machine is infected, with $\hat{y}_i \in [0, 1]$. The goal is to explain how each feature $x_{ij}$ (for $j = 1, \dots, n$) contributes to the prediction using Shapley values.
Shapley Values in Ensemble-Boosting Models
Shapley values provide a fair distribution of a prediction’s value across the features. They were originally defined in cooperative game theory, where the goal was to allocate the total value of a coalition (in this case, the prediction) among the players (features). The Shapley value for feature $j$ of machine $i$, denoted as $\phi_j^{(i)}$, is given by the following expression:

$$\phi_j^{(i)} = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\big[f(S \cup \{j\}) - f(S)\big]$$

where $N$ is the set of all features, $S$ is a subset of features excluding $j$, and $f(S)$ is the prediction made by the model when only the features in $S$ are used.
This formula calculates the average marginal contribution of feature j over all possible subsets of features.
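As a brief illustration with hypothetical prediction values (not taken from the experiments), consider a model with two features and coalition values $f(\emptyset) = 0.5$, $f(\{1\}) = 0.7$, $f(\{2\}) = 0.6$, and $f(\{1,2\}) = 0.9$. Then

$$\phi_1 = \tfrac{1}{2}\big[(0.7 - 0.5) + (0.9 - 0.6)\big] = 0.25, \qquad \phi_2 = \tfrac{1}{2}\big[(0.6 - 0.5) + (0.9 - 0.7)\big] = 0.15,$$

and $\phi_1 + \phi_2 = 0.40 = f(\{1,2\}) - f(\emptyset)$, so the attributions exactly account for the change in the prediction.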
To interpret the factors influencing the prediction, we use the Shapley value, which quantifies the contribution of each feature to the model’s prediction. This is formalized in Theorem 5.1.
Theorem 5.1
Let $M$ be a malware detection model trained on a set of features $N = \{1, \dots, n\}$. The Shapley value for feature $j$ of machine $i$, denoted as $\phi_j^{(i)}$, provides the expected contribution of feature $x_{ij}$ to the model’s prediction and can be used to interpret the factors that influence the prediction of whether a machine is infected with malware.
Proof
To prove this, we start by recognizing that malware detection can be viewed as a supervised learning problem, where the model’s task is to predict the binary outcome $y_i \in \{0, 1\}$ (representing infected or not infected) based on a set of input features $x_i = (x_{i1}, \dots, x_{in})$. The model $M$ takes these features and outputs a probability $\hat{y}_i = M(x_i)$, which corresponds to the likelihood of infection.
The Shapley value $\phi_j^{(i)}$ quantifies the contribution of feature $x_{ij}$ to the prediction $\hat{y}_i$. By computing the Shapley values for all features in the input vector $x_i$, we obtain a set of values $\{\phi_1^{(i)}, \dots, \phi_n^{(i)}\}$, where each $\phi_j^{(i)}$ represents how much feature $x_{ij}$ influenced the model’s decision. This approach is particularly useful in malware detection, as it allows us to identify which aspects of a machine’s configuration (e.g., OS version, firewall settings, and CPU architecture) are most indicative of malware infection.
Since Shapley values are derived from all possible combinations of feature subsets, they provide a fair and comprehensive attribution of the prediction. This ensures that the contributions of individual features are not exaggerated or overlooked due to their interactions with other features.
Fair Attribution of Features in Malware Detection Ensemble Models
Prediction Function
Let $S \subseteq N$ be a subset of features. The prediction value $f(S)$ represents the model’s output when only the features in $S$ are considered. The prediction function $f : 2^{N} \to \mathbb{R}$ maps subsets of features $S$ to real-valued outputs, where $2^{N}$ denotes the power set of $N$.
Properties of the Shapley Value
The Shapley value satisfies several key axioms that guarantee fairness and consistency in attribution. These axioms are as follows:
Efficiency: The sum of the Shapley values for all features must equal the total prediction value, ensuring that all contributions are fully accounted for: $\sum_{j \in N} \phi_j = f(N) - f(\emptyset)$.
Symmetry: If two features contribute equally to the model’s prediction, their Shapley values must be identical: if $f(S \cup \{j\}) = f(S \cup \{k\})$ for every $S \subseteq N \setminus \{j, k\}$, then $\phi_j = \phi_k$.
Dummy: If a feature does not contribute to the model’s prediction, its Shapley value is zero: if $f(S \cup \{j\}) = f(S)$ for every $S \subseteq N \setminus \{j\}$, then $\phi_j = 0$.
Additivity: For two models $f_1$ and $f_2$, the Shapley value for their combined model $f_1 + f_2$ is the sum of the individual Shapley values: $\phi_j(f_1 + f_2) = \phi_j(f_1) + \phi_j(f_2)$. This property allows SHAP to be applied to ensemble models, i.e., GBC, XGBoost, and LightGBM.
Theorem 5.2
Consider a malware detection model that predicts the likelihood of malware infection for a given machine $i$, based on its feature values from the telemetry dataset. Let SHAP be applied to explain the contribution of each feature to the model’s prediction for this machine. Then:
Each feature’s contribution (via Shapley values) is fairly attributed and does not depend on the order in which features are considered, as long as the marginal contributions are measured correctly.
The Shapley value-based attribution of features, such as EngineVersion, AVProductStatesIdentifier, or SmartScreen, is independent of feature correlations, ensuring an interpretable understanding of the model’s decision-making process.
Proof
Let $f(S)$ be the model’s prediction given a subset $S \subseteq N$, and let $f(S \cup \{j\}) - f(S)$ denote the marginal contribution of feature $j$. From the efficiency axiom, we know that the total contributions from all features sum to the model’s prediction:

$$\sum_{j \in N} \phi_j^{(i)} = f(N) - f(\emptyset)$$

The symmetry axiom ensures that if two features contribute identically to the prediction, they will receive equal attribution. The dummy axiom guarantees that features with no impact on the prediction have a Shapley value of zero. Finally, the additivity axiom ensures that SHAP can be applied to ensemble models for malware detection: the sum of Shapley values from the individual models yields the total attribution for the combined model.
Impact of Encoding Methods and Statistically Significant Features on Ensemble-Boosting Models for Malware Detection
In this benchmark study, we employ a four-step feature selection process to evaluate XAI techniques for assessing feature importance in malware detection. First, features with undesirable distributions (e.g., unique values across rows or excessive missing data) are removed. Statistical significance tests (chi-squared for categorical and F tests for continuous features) are then applied, discarding features with p values above 0.05. Finally, highly correlated features are eliminated, retaining the one with the higher F-test score. The remaining features are ranked by significance and used for model training. To limit the feature set for ensemble-boosting models, we focus on the most statistically significant features, enhancing model efficiency and interpretability. For categorical feature encoding, we evaluated several methods. Ordinal encoding was excluded due to the lack of inherent order in some features (e.g., CityIdentifier), and hashing was discarded for its arbitrary output. One-hot and binary encoding were also rejected due to their potential to increase dimensionality. We selected encoding techniques that preserve the informational value without expanding the feature space: Base Target Encoding (mapping categories to the probability of belonging to the positive class), James–Stein Encoding (shrinking base target estimates to reduce overfitting), and Frequency Encoding (replacing categories with their frequency, suitable for test sets without labels). These methods ensure efficient, interpretable, and overfitting-resistant model training.
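To make the selected encoders concrete, the following sketch implements base target, James–Stein-style shrinkage, and frequency encoding for a single categorical column using pandas; the shrinkage weight and the toy column names are illustrative simplifications rather than the exact encoders used in the experiments.

```python
import pandas as pd

def base_target_encode(col: pd.Series, target: pd.Series) -> pd.Series:
    """Map each category to the empirical probability of the positive class."""
    means = target.groupby(col).mean()
    return col.map(means)

def james_stein_encode(col: pd.Series, target: pd.Series) -> pd.Series:
    """Shrink per-category target means toward the global mean to curb overfitting."""
    global_mean = target.mean()
    stats = target.groupby(col).agg(["mean", "count"])
    # Simple count-based shrinkage weight: rare categories lean on the global mean.
    weight = stats["count"] / (stats["count"] + 10.0)
    encoded = weight * stats["mean"] + (1.0 - weight) * global_mean
    return col.map(encoded)

def frequency_encode(col: pd.Series) -> pd.Series:
    """Replace each category with its relative frequency (usable without labels)."""
    freq = col.value_counts(normalize=True)
    return col.map(freq)

# Example usage on a toy frame with a categorical column and binary label.
df = pd.DataFrame({"SmartScreen": ["On", "Off", "On", "Warn", "Off", "On"],
                   "HasDetections": [1, 0, 1, 1, 0, 0]})
df["SmartScreen_te"] = base_target_encode(df["SmartScreen"], df["HasDetections"])
df["SmartScreen_js"] = james_stein_encode(df["SmartScreen"], df["HasDetections"])
df["SmartScreen_fe"] = frequency_encode(df["SmartScreen"])
```

None of these encoders enlarge the feature space, which is the property that motivated their selection over one-hot and binary encoding.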
Table 2. Top five significant features for each encoding technique
| Encoding technique | Top 5 features |
|---|---|
| Base target encoding | 1. Census_OEMModelIdentifier, 2. CityIdentifier, 3. Census_FirmwareVersionIdentifier, 4. AvSigVersion, 5. SmartScreen |
| James–Stein encoding | 1. Census_OEMModelIdentifier, 2. CityIdentifier, 3. Census_FirmwareVersionIdentifier, 4. AvSigVersion, 5. SmartScreen |
| Frequency encoding | 1. Census_OEMModelIdentifier, 2. CityIdentifier, 3. Census_FirmwareVersionIdentifier, 4. AvSigVersion, 5. SmartScreen |
Table 2 reveals that Census_OEMModelIdentifier, CityIdentifier, and Census_FirmwareVersionIdentifier are consistently the top 3 most important features across all encoding techniques, highlighting their critical role in malware prediction. These features suggest that device-specific characteristics and regional factors play a significant part in distinguishing between malware and non-malware. AvSigVersion and SmartScreen also emerge as crucial features, indicating that antivirus signature versions and security measures are key in detecting malicious activity.
Model Bias and Fairness Analysis of Ensemble-Boosting Classifiers Using DALEX
Experimental Design
This section presents a rigorous experimental setup to train, tune, explain, and evaluate ensemble classifiers, balancing predictive performance and algorithmic fairness. The dataset, sourced from Windows Defender telemetry, includes diverse machine configurations and malware detection statuses, with key features such as AVProductStatesIdentifier, EngineVersion, AppVersion, and hardware/software details. Stratified sampling ensured balanced protected attribute distributions across the training, validation, and test sets.
Hyperparameter optimization was performed via grid search with fivefold cross-validation, focusing on maximizing balanced accuracy while minimizing fairness gaps (e.g., disparate impact and equal opportunity difference), as assessed by DALEX fairness diagnostics. Final model configurations included LightGBM (learning rate 0.05, 100 estimators, depth 10), XGBoost (learning rate 0.05, 100 estimators, depth 6), CatBoost (learning rate 0.03, 500 iterations, depth 6, L2 regularization 3, GPU-accelerated), and AdaBoost (depth-1 trees, 50 estimators, learning rate 1.0). All experiments were conducted on consistent hardware (Intel i9 CPU, 32 GB RAM, NVIDIA RTX GPU for CatBoost) with fixed random seeds for reproducibility.
The evaluation focused on both predictive bias and fairness across models, using DALEX to enhance local and global explainability. Residual distribution analysis and fairness metrics evaluated whether prediction errors and performance were equitably distributed across demographic and feature-based subgroups. Explainability tools, including residual plots, Ceteris Paribus profiles, partial dependence plots, variable importance, and breakdown profiles, provided insights into model decision-making. Fairness was further quantified using metrics such as the Accuracy Equality Ratio (ACC), Statistical Parity Ratio (STP), True-Positive Rate (TPR), and False-Positive Rate (FPR), supported by visualizations such as radar plots and stacked parity loss plots to highlight disparities. This comprehensive setup enabled the identification of models exhibiting the highest bias, guiding the selection of the fairest and most effective solution for malware detection in real-world applications.
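For reference, the final configurations listed above correspond to instantiations along the following lines (a sketch using the standard Python APIs of each library; the seed value and any parameter not stated in the text are illustrative):

```python
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # fixed seed for reproducibility; the actual seed value is not reported

models = {
    "LightGBM": LGBMClassifier(learning_rate=0.05, n_estimators=100,
                               max_depth=10, random_state=SEED),
    "XGBoost": XGBClassifier(learning_rate=0.05, n_estimators=100,
                             max_depth=6, random_state=SEED),
    "CatBoost": CatBoostClassifier(learning_rate=0.03, iterations=500,
                                   depth=6, l2_leaf_reg=3,
                                   task_type="GPU", random_seed=SEED,
                                   verbose=False),
    # `estimator=` requires scikit-learn >= 1.2 (older releases use base_estimator=).
    "AdaBoost": AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                   n_estimators=50, learning_rate=1.0,
                                   random_state=SEED),
}
```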
Bias and Fairness Analysis
Fig. 2: Comparison of residual plots for ensemble models.
Fig. 3: Ceteris Paribus profiles illustrating the impact of key features on model predictions while holding other variables constant: (a) AVProductStatesIdentifier shows how different product states influence the model’s output; (b) EngineVersion reveals the effect of engine versions on predictions; (c) AVProductsInstalled_SmartScreen highlights how the presence of SmartScreen affects outcomes; (d) AppVersion displays the relationship between app versions and prediction changes.
Figure 2 presents the residual distributions for each ensemble model. In an ideal scenario, residuals should exhibit a mean close to zero and be symmetrically centered around this point, reflecting minimal and unbiased prediction errors. Most models show a consistent and gradual drop in residuals, indicative of stable and reliable performance. However, one notable exception is observed in the AdaBoost model, which shows a sharp drop to 0%. This abrupt decline suggests that AdaBoost struggles to capture underlying patterns in the dataset, resulting in suboptimal generalization and reduced predictive accuracy.
The Ceteris Paribus profiles, illustrated in Fig. 3, analyze the effect of individual features on model predictions while keeping other variables constant. This approach provides critical insights into the sensitivity of the models to key features, including AVProductStatesIdentifier, EngineVersion, AppVersion, and AVProductsInstalled_SmartScreen. Across the ensemble models, varying levels of sensitivity are observed for these features, reflecting their differing capacities to capture feature–target relationships.
AdaBoost exhibits notably low sensitivity across all analyzed features, which is correlated with its lower overall accuracy. This decreased responsiveness indicates that AdaBoost fails to effectively capture the interactions between these features and the target variable. In contrast, the other models demonstrate clear sensitivity trends, particularly for the two most influential features, AVProductStatesIdentifier and EngineVersion. For these features, the prediction values generally decrease as the feature values increase, indicating that changes in these variables play a substantial role in influencing predictions.
For the remaining features, AppVersion and AVProductsInstalled_SmartScreen, a similar trend of decreasing prediction values with increasing feature values is observed. This pattern suggests the relevance of these variables in the modeling process and is consistent with their high feature importance scores observed in subsequent analyses. These observations illustrate differences in how ensemble models respond to key features, offering insights into their predictive mechanisms and their ability to capture feature–target relationships.
Fig. 4: Partial dependence plots for ensemble-boosting models.
The partial dependence (PD) profiles depicted in Fig. 4 illustrate how the expected output of a model varies with respect to specific explanatory variables. These profiles provide insights into the response patterns of the ensemble models to changes in key features.
For most of the models analyzed, GBC, LightGBM, XGBoost, and CatBoost, a consistent pattern is observed in their PD profiles. This alignment suggests that these models capture the relationships between the explanatory variables and the target variable, indicating their ability to generalize effectively to the feature space. Their profiles reveal a clear and interpretable relationship, showing sensitivity to changes in feature values that influence predictions.
In contrast, AdaBoost exhibits a different PD profile. Its output remains largely static regardless of changes in feature values, which suggests that it may not adequately capture variables that influence the predictions in other models. This insensitivity to feature changes could limit AdaBoost’s ability to capture complex feature–target interactions, which may contribute to its comparatively weaker performance.
In particular, the four high-performing models show distinct behaviors at the boundaries of feature ranges, with a pronounced shift observed near the average value for AppVersion. This shift indicates their capacity to adapt predictions based on feature values, which is important for maintaining accuracy and robustness. AdaBoost, however, does not exhibit such adaptive behavior. Its static response suggests a critical limitation in utilizing important feature variations, further supporting its relatively lower efficacy for malware detection tasks.
Fig. 5: Variable importance for ensemble-boosting models.
The analysis of variable importance across the five ensemble models reveals key insights into the interplay between hardware and software features in malware detection. Using ten permutations, the relative importance of features was evaluated, with the findings presented in Fig. 5. For GBC, XGBoost, and AdaBoost, the three most critical features—SmartScreen_AVProductsInstalled, AVProductStatesIdentifier, and EngineVersion—are predominantly hardware-related, underscoring the models’ reliance on hardware attributes for predicting malware presence. However, some variability in feature importance is observed for other models: AppVersion supplants EngineVersion in CatBoost, while AVProductsInstalled_SmartScreen replaces AVProductStatesIdentifier in LightGBM. The consistent prominence of SmartScreen_AVProductsInstalled across all models is particularly noteworthy. As the most utilized feature, it serves as a cornerstone for prediction generation. This observation is corroborated by the PD plots in Fig. 4, which highlight its significant influence on model outputs. These results align with Breiman’s conceptual framework [33], which postulates that the importance of a feature is directly tied to the degradation of a model’s predictive capability when its values are permuted. Figure 5 substantiates this principle, showing a substantial decline in model performance when the values of SmartScreen_AVProductsInstalled are permuted. This finding emphasizes the feature’s critical role in capturing malware detection patterns, making it an indispensable input across all evaluated ensemble models. The observed variability in feature importance across models, such as the shift in rankings for AppVersion and AVProductsInstalled_SmartScreen, suggests subtle differences in how these ensemble methods prioritize specific attributes. Such differences also influence the bias and fairness of the models, as variations in features can disproportionately affect predictions for certain subgroups.
Fig. 6: Breakdown profile plot for ensemble-boosting models.
The breakdown profile presented in Fig. 6 provides a detailed examination of how individual features influence the mean prediction values across the models. The baseline average prediction value is 0.488, with features contributing either positively (represented by green bars) or negatively (red bars) to this value. A consistent trend is observed for the feature AVProductStatesIdentifier, which reduces the mean prediction value across all models except AdaBoost, with decreases ranging from 0.037 to 0.064. However, in AdaBoost, this feature exerts no measurable influence, leaving its predictions effectively static at the baseline value of 0.5 and resulting in a final prediction of 0.495. This limited responsiveness to a critical feature shows AdaBoost’s inability to fully capture the relationships between explanatory variables and the target, reflecting a key limitation in its modeling capacity. In contrast, the other ensemble models, such as GBC, LightGBM, XGBoost, and CatBoost, demonstrate a more dynamic response to features, with clear patterns where specific attributes significantly shift predictions above the baseline value of 0.5. This adaptability reflects these models’ stronger capacity to incorporate feature-level variations into their predictive behavior. By comparison, AdaBoost’s predictions remain constrained within a narrow 1% margin around the baseline, highlighting its limited sensitivity to influential features and raising concerns about its suitability in contexts like cybersecurity, where predictive adaptability is essential.
Further insights emerge when examining residual distributions, as shown in Fig. 2. Most models exhibit a steady decline in residuals, suggesting a more reliable minimization of prediction errors across the dataset. However, AdaBoost’s residuals show a pronounced sharp drop to 0%, indicating difficulties in capturing the underlying data patterns. This inconsistency reflects its limitations in addressing complex relationships inherent in malware detection tasks.
Ceteris Paribus profiles, illustrated in Fig. 3, further reinforce AdaBoost’s deficiencies. While the other models demonstrate sensitivity to key features, adapting their predictions dynamically in response to variations, AdaBoost remains largely unresponsive. This lack of feature sensitivity suggests that AdaBoost underutilizes critical attributes, such as AVProductStatesIdentifier, EngineVersion, and SmartScreen_AVProductsInstalled, which can negatively impact its predictive accuracy.
In contrast, the PD profiles in Fig. 4 reveal that GBC, LightGBM, XGBoost, and CatBoost capture consistent output behaviors across feature ranges. These models appear to leverage the relationships between features and the target variable, with noticeable shifts in prediction values corresponding to attribute variations. AdaBoost’s PD profiles, however, remain static, further reinforcing concerns about its limited adaptability.
The variable importance analysis shown in Fig. 5 corroborates these findings, identifying SmartScreen_AVProductsInstalled, AVProductStatesIdentifier, and EngineVersion as pivotal features across most models. The impact of permuting these variables on model performance aligns with Breiman’s principle of feature importance, where the predictive performance of a model degrades when key features are randomized.
While the other models consistently show strong dependencies on these variables, AdaBoost demonstrates reduced sensitivity, further substantiating its limited reliance on critical features.
Overall, the breakdown profiles complement the observations from residual analysis, Ceteris Paribus, and partial dependence plots, offering a comprehensive view of AdaBoost’s limitations. Its fixed predictions and insensitivity to important attributes stand in contrast to the more adaptable behavior of GBC, LightGBM, XGBoost, and CatBoost. These findings underscore AdaBoost’s relative shortcomings and the importance of leveraging feature-level insights to achieve fair, accurate, and responsive malware detection models.
Fig. 7: Performance versus fairness.
Fig. 8: Stacked parity loss.
Fig. 9: Fairness radar plot.
Fig. 10: Fairness check.
Figure 7 depicts the reversed TPR parity loss relative to model performance. Ideally, the most effective models are located in the top right corner, representing both high accuracy and minimal TPR parity loss. Among the analyzed models, LightGBM achieves the highest accuracy but occupies a less favorable position due to its elevated TPR parity loss. This observation shows a critical trade-off between model accuracy and fairness, emphasizing that high accuracy alone may come at the cost of equitable performance across subgroups. Parity loss serves as a measure of bias, capturing disparities in predictions across subgroups within the dataset. For this analysis, CountryIdentifier is used to evaluate fairness, focusing on how models perform across different countries. This approach helps uncover and address potential geographic biases, ensuring that models provide equitable treatment for diverse subgroups. Figure 8 offers a comparative view of parity loss across models, revealing that AdaBoost suffers from the highest parity loss, signaling significant bias and lower accuracy. In contrast, GBC exhibits the lowest parity loss while maintaining competitive accuracy, demonstrating its better ability to balance fairness with performance.
Figure 9 visualizes parity loss metrics for each model using a radar plot with a polar coordinate system, capped at a maximum range of 0.06. The area covered by each model’s plot indicates the extent of bias: larger areas correspond to higher levels of bias. AdaBoost and LightGBM display the most extensive areas, identifying them as the most biased models in this analysis. Conversely, GBC and CatBoost exhibit smaller plot areas, reflecting better fairness performance relative to their counterparts. The radar plot effectively highlights the varying degrees of bias across models, offering an intuitive way to assess fairness disparities. The Fairness Check plot in Fig. 10 further examines model biases. Bars extending into the red region signal bias, with longer bars indicating greater inequity. While none of the models exceed a bias value of 0.05, AdaBoost stands out as the most biased, reaffirming findings from other metrics. Despite this, the overall levels of bias across models remain low, which is encouraging for practical applications, though improvements are still necessary for certain models.
Fairness analysis reveals distinct patterns and critical differences among the ensemble models. GBC emerges as the most equitable model, with the lowest parity loss and competitive accuracy, indicating its ability to provide balanced and fair predictions. AdaBoost, on the other hand, consistently shows the highest levels of bias across multiple metrics, highlighting its need for substantial refinement to improve fairness. LightGBM, while achieving the highest accuracy, displays elevated TPR parity loss, reinforcing the inherent tension between optimizing for accuracy and ensuring equitable performance. These findings highlight the need for models that balance fairness and accuracy, especially in sensitive domains like cybersecurity. GBC and CatBoost demonstrate strong potential, while AdaBoost and LightGBM require further tuning to address biases.
These fairness considerations are not merely ethical add-ons but have direct implications for security performance. A model that achieves high overall accuracy but exhibits bias across subgroups may leave certain populations underprotected, creating blind spots where malware can evade detection. For example, elevated parity loss linked to geographic features like CountryIdentifier indicates inconsistent detection performance across regions, which can undermine the reliability of a security system in real-world deployment. By contrast, models that balance fairness and accuracy, such as GBC and CatBoost, demonstrate a more robust capacity to generalize across diverse environments, ensuring that malware detection remains consistently effective regardless of subgroup characteristics.
Local Interpretation of Ensemble-Boosting Models Using LIME
Fig. 11: Local interpretations of class 1 by GBC, LightGBM, and XGBoost using LIME.
Fig. 12: Local interpretations of class 1 by CatBoost and AdaBoost using LIME.
LIME provides a localized understanding of model predictions by approximating complex model behavior around specific instances. By constructing a surrogate model that mirrors the original model’s behavior near a given prediction, LIME assigns positive weights to features supporting the prediction and negative weights to those opposing it, offering clear insights into their respective influences.
Figure 11a illustrates the LIME analysis for class 1 (‘Malware’) using the GBC. Features with positive weights, represented by green bars, support the malware classification by increasing its probability, while red bars denote features with negative weights that oppose the classification, steering the prediction toward a ‘No Malware’ outcome. The observed balance in feature contributions highlights the nuanced decision-making capability of GBC, which effectively leverages key features to differentiate between malware and non-malware instances. In Fig. 11b, the analysis of LightGBM reveals six features opposing the malware classification and five features supporting it. This granular view shows the intricate interactions among features, where opposing and supporting contributions create a dynamic interplay.
Figure 11c shows a scenario in which the number of features supporting and opposing the malware classification is equal, yet the opposing features have higher magnitudes. This stronger influence of opposing features prevents the classification of malware, raising concerns about the model’s consistency and reliability. The analysis of CatBoost, presented in Fig. 12a, demonstrates that the majority of features predominantly oppose malware classification, with only three features supporting it. This imbalance points to a limitation in the model’s sensitivity to malware detection, as it fails to effectively leverage supportive features. Compared to other models, this behavior highlights a significant weakness, reducing the model’s reliability in critical scenarios.
In Fig. 12b, the feature contributions for AdaBoost reveal a near-balance between supporting and opposing features, apart from one dominant feature. While this balance might suggest a cautious approach to prediction, it also indicates a lack of decisiveness, reducing the model’s robustness. AdaBoost’s inability to commit to a clear classification outcome could impair its performance in applications requiring high-confidence malware detection.
The above findings demonstrate the value of LIME in elucidating how hardware, software, and other features contribute to model predictions. Its ability to isolate and interpret feature influences offers actionable insights for model refinement and enhances interpretability. However, the variations in feature contributions across models highlight potential inconsistencies that may hinder their reliability.
Explainability of Ensemble-Boosting Models Using SHAPASH
SHAPASH leverages Shapley values, as discussed in Sect. 3, to evaluate feature contributions to malware predictions. Figure 13a highlights the SHAP values for features influencing the malware classification (class 1). Key contributors, including SmartScreen_AVProductsInstalled, EngineVersion, AVProductStatesIdentifier, and AVProductsInstalled_SmartScreen, dominate the predictions for ‘Malware’ versus ‘No Malware’ (class 0). The remaining features progressively contribute less to the prediction. This alignment with LIME’s explanations shows the consistency and reliability of these features across different models, with SHAPASH enhancing interpretability by offering a comprehensive visual representation of their impact.
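The Shapash views discussed in this section (global importance, per-feature contributions, local and comparative explanations, and pairwise interactions) can be generated with a few SmartExplainer calls. The sketch below is indicative rather than definitive: the model and data handles (gbc_model, X_val) and the selected row indices are assumptions, and plot signatures may vary slightly across Shapash versions.

```python
from shapash import SmartExplainer

# Compile Shapley-value contributions for the validation set
xpl = SmartExplainer(model=gbc_model)
xpl.compile(x=X_val)

xpl.plot.features_importance()                                  # global ranking (Fig. 13a style)
xpl.plot.contribution_plot("SmartScreen_AVProductsInstalled")   # contribution vs. feature value
xpl.plot.local_plot(index=50)                                   # local explanation for one row
xpl.plot.compare_plot(index=[50, 51])                           # X_val[50] vs. X_val[51]
xpl.plot.interactions_plot("SmartScreen_AVProductsInstalled",
                           "EngineVersion")                     # pairwise SHAP interactions
```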
The SHAP values for SmartScreen_AVProductsInstalled, shown in Fig. 13b, reveal an increase in its contribution to malware classification when the feature’s value is below 15. Beyond this threshold, its contribution predominantly opposes the classification. This behavior illustrates how specific feature values influence model predictions, highlighting the importance of understanding these transitions. Exploring SHAP interactions further reveals how combinations of variables impact predictions, offering deeper insights into the GBC’s behavior and providing an additional layer of explainability. Figure 13c presents local explanations for malware classification in the test data using SHAP values. Supporting features, depicted in gold, include Census_TotalPhysicalRAM, which emerges as the most influential positive contributor. SHAP uniquely identifies latent contributions—both positive and negative—not captured by other XAI methods. Opposing features are shown in gray, reflecting their role in steering predictions away from malware classification. This detailed local interpretation complements global insights, enhancing the understanding of the gradient-boosting model’s decision-making process.
A comparison of local explanations for predictions X_val[50] and X_val[51], illustrated in Fig. 13d, shows varying impacts of features on prediction outcomes. For X_val[50], a 57% probability of malware classification is supported by marginal positive contributions from features depicted in gold. Conversely, X_val[51], with an 84% probability, exhibits stronger feature contributions, with opposing features represented in gray exerting minimal influence. This comparison reveals the dynamic and context-dependent nature of feature contributions.
The analysis of feature interactions highlights significant insights into model behavior. Figure 13e demonstrates the interaction between SmartScreen_AVProductsInstalled and EngineVersion, where higher values of SmartScreen_AVProductsInstalled interact positively with EngineVersion, while lower values interact negatively. This interplay is pivotal in shaping malware classification outcomes, emphasizing the importance of capturing such relationships to improve model robustness. Similarly, Fig. 13f shows interactions between AppVersion and EngineVersion, with AppVersion exhibiting strong interactions across its value range, whereas EngineVersion’s influence is confined to its higher range. The SHAP interaction values indicate a slight tendency toward negative contributions, further reflecting the nuanced relationships between features. The feature importance analysis for LightGBM aligns closely with that of GBC, as depicted in Fig. 14a. The top features, including SmartScreen_AVProductsInstalled, contribute similarly to malware predictions. Figure 14b specifically examines the SHAP contributions of SmartScreen_AVProductsInstalled, where lower feature values play a significant role in driving the classification, while higher values exhibit diminished impact. This consistency between the models highlights their shared reliance on this critical feature.
Local explanations for LightGBM, illustrated in Fig. 14c, reveal that features supporting malware classification (gold) drive a prediction probability of 66% for X_val[50], as their contributions outweigh those of opposing features (gray). For X_val[51], depicted in Fig. 14d, the local contributions show a similar balance between supporting and opposing features, leading to an 83% probability of malware classification. The comparative analysis emphasizes the role of feature dynamics in influencing prediction outcomes across instances. Finally, Fig. 14e captures SHAP interaction values for OsBuildLab_SmartScreen and SmartScreen_AVProductsInstalled. Lower values of OsBuildLab_SmartScreen interact with SmartScreen_AVProductsInstalled, producing predominantly positive contributions to prediction probabilities. These interactions show the necessity of accounting for feature interdependencies to achieve more accurate and interpretable predictions.
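The pairwise interaction effects summarized in Figs. 13e–f and 14e–f can also be computed directly with SHAP's tree explainer, which returns a per-instance interaction matrix. A brief sketch is shown below, assuming a fitted tree model (lgbm_model) with a single raw output and a subsample of the validation data to keep the computation, which is quadratic in the number of features, tractable.

```python
import shap

# Exact SHAP interaction values for a tree ensemble (subsampled for cost)
explainer = shap.TreeExplainer(lgbm_model)
X_sub = X_val.iloc[:500]
inter_vals = explainer.shap_interaction_values(X_sub)

# Visualize one feature pair; the off-diagonal entries capture the joint effect
shap.dependence_plot(
    ("OsBuildLab_SmartScreen", "SmartScreen_AVProductsInstalled"),
    inter_vals,
    X_sub,
)
```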
[See PDF for image]
Fig. 13
Shapash: Feature importance (a), local explanations (b), comparison of local explanations (c), feature interaction values (d), and SHAP interaction values for combinations 1 (e) and 2 (f) for the GBC
[See PDF for image]
Fig. 14
Shapash: Feature importance (a), local explanations (b), comparison of local explanations (c), feature interaction values (d), and SHAP interaction values for combinations 1 (e) and 2 (f) for the LightGBM model
[See PDF for image]
Fig. 15
Shapash: Feature importance (a), local explanations (b), comparison of local explanations (c), feature interaction values (d), and SHAP interaction values for combinations 1 (e) and 2 (f) for the XGBoost model
The interactions between AVProductStatesIdentifier and EngineVersion are shown in Fig. 14f, where most of the high values of EngineVersion interact with the full range of AVProductStatesIdentifier values. The interaction is predominantly positive, although a number of notable negative interactions between the two features are also present.
[See PDF for image]
Fig. 16
Shapash: Feature importance (a), local explanations (b), comparison of local explanations (c), and SHAP interaction values (d) for the CatBoost model. Each subfigure illustrates different aspects of model interpretability, such as how features influence predictions, the rationale behind individual predictions, and feature interactions that affect model decisions
The feature importance analysis for XGBoost, depicted in Fig. 15a, confirms the consistency of key contributors across previously evaluated models, including GBC and LightGBM. These top features demonstrate comparable contribution magnitudes, with XGBoost showing slightly higher importance than LightGBM. This consistency shows the robustness and reliability of these features for malware prediction across diverse modeling techniques. Figure 15b presents SHAP values for XGBoost, revealing a shift in the dominance of features compared to earlier models. AppVersion emerges as a significant contributor, with its impact varying across its value spectrum. At lower and medium value ranges, AppVersion predominantly reduces prediction probability, whereas its influence becomes marginally positive as values surpass the mean. This variability highlights the nuanced role of AppVersion and suggests areas for further exploration and potential model optimization.
Local explanations provided by Shapash, illustrated in Fig. 15c, offer detailed insights into feature contributions for malware classification. For X_val[50], although a larger number of features support the classification, the opposing features exert greater influence, resulting in a 56% probability of predicting class 1 (Malware). This finding shows the necessity of evaluating both the quantity and magnitude of feature contributions, particularly when models produce borderline predictions. A comparative analysis of local feature contributions for X_val[50] and X_val[51], shown in Fig. 15d, highlights the variability of feature impacts on prediction outcomes. For X_val[50], marginal contributions lead to a prediction probability of 57% for class 1. In contrast, X_val[51] exhibits significantly stronger supporting contributions, resulting in an 85% probability of malware classification. This difference demonstrates the context-dependent nature of feature influence and reinforces the value of instance-specific explanations for understanding model behavior.
Figure 15e examines interactions between AVProductsInstalled_SmartScreen and OsBuildLab_SmartScreen, revealing that only the lower values of OsBuildLab_SmartScreen interact with other feature values. These interactions are generally positive but subtle, suggesting a strong relationship that contributes incrementally to the prediction. Such interactions, while limited in scope, provide important insights into the intricate dependencies between features. Interactions between AppVersion and EngineVersion, depicted in Fig. 15f, reveal that most occur at higher values of EngineVersion across the entire range of AppVersion. The contributions are nearly balanced, with a slight dominance of negative interactions, indicating that specific feature combinations can reduce the likelihood of malware classification. Understanding and adjusting these interactions could enhance model sensitivity and overall predictive performance.
The feature importance analysis for CatBoost, shown in Fig. 16a, reaffirms the dominance of the same subset of features identified in the previous models. Their contribution magnitudes are comparable to those observed in LightGBM, reinforcing the consistency of these features in malware prediction. Figure 16b focuses on the CountryIdentifier feature, which demonstrates consistent influence across its value spectrum. Unlike features with varying contributions, CountryIdentifier exhibits neutrality, with no discernible bias. These findings align with DALEX analyses, confirming its unbiased role in prediction while maintaining relevance as an influential feature. Local explanations of CatBoost predictions, presented in Fig. 16c, reveal significant positive contributions for features supporting malware classification. Hidden positive contributions uncovered by Shapash further enhance understanding of the model’s decision-making process. These contributions result in a 67% probability of predicting class 1 (Malware), demonstrating Shapash’s efficacy in visualizing critical feature impacts.
Comparative analysis of X_val[50] and X_val[51], depicted in Fig. 16d, highlights the variability in feature contributions. For X_val[50], marginal contributions result in a 68% probability of class 1, while X_val[51] shows stronger support, leading to a 72% probability. These differences emphasize how minor variations in feature contributions can significantly impact predictions, underscoring the need for instance-specific analysis.
The detailed analysis provided by Shapash verifies that the models capture both hidden positive and negative contributions effectively. These insights enhance understanding of the relationships between features and predictions, contributing to the refinement of ensemble models for malware detection. The analysis highlights the robustness and consistency of key features—such as SmartScreen_AVProductsInstalled, EngineVersion, and AppVersion—across ensemble models like GBC, LightGBM, XGBoost, and CatBoost in malware prediction. While feature importance remains largely stable across models, nuances emerge in their contributions and interactions. AppVersion, for instance, demonstrates varying influence on predictions, shifting from negative to positive across its value spectrum, indicating potential areas for further refinement. Interaction analyses reveal subtle but meaningful dependencies, such as between AVProductsInstalled_SmartScreen and OsBuildLab_SmartScreen, which contribute incrementally to predictions. Additionally, local explanations provided by Shapash demonstrate the variability in feature contributions across individual instances, with models showing differing sensitivities to supporting and opposing features. These insights emphasize the importance of both global and local explainability methods in understanding and improving model behavior, ensuring reliable and equitable malware detection outcomes.
XAI techniques provide interpretative insights that highlight the genuinely significant features, as confirmed by the statistical feature importance analysis presented in Sect. 6 and summarized in Table 2. Both approaches consistently identify key features, such as Census_OEMModelIdentifier, CityIdentifier, and Census_FirmwareVersionIdentifier, validating their critical role in malware detection. Features like AvSigVersion and SmartScreen, which are highlighted by the XAI results, are also statistically significant, further supporting their relevance.
Comparison of the Unified XAI Framework with XAI Approaches on Explainability and Fairness
This section presents a comparison of the proposed unified XAI framework with AnchorTabular (a rule-based method) [22] and Fairlearn (a fairness-focused XAI tool) [23] to highlight the strengths of our framework over these established alternatives.
Interpretability Comparison with AnchorTabular XAI
In this section, we compare the interpretability of Shapash, a core component of our framework, with AnchorTabular, an advanced rule-based method, for two ensemble tree models: GradientBoosting and LightGBM.
Shapash, utilizing Shapley values, provides a detailed, fine-grained analysis of feature contributions. For GBC, key features like SmartScreen_AVProductsInstalled and EngineVersion dominate predictions (Fig. 13a), with Shapley values illustrating how their contributions vary across feature ranges (Fig. 13b). This method highlights non-linear interactions between features, as shown in Fig. 13e and f, offering deep insights into model behavior. A similar pattern emerges for LightGBM, where SmartScreen_AVProductsInstalled again plays a critical role (Fig. 14b), and SHAP values explain how specific feature values influence classification outcomes (Fig. 14c and d).
AnchorTabular, in contrast, provides interpretable, rule-based explanations [22]. For GBC, it generates if–then rules, such as “if AVProductsInstalled_SmartScreen then predict Malware,” offering clear and actionable insights (Fig. 17). This rule-based approach ensures high precision and coverage, making it suitable for quick decision-making in real-time applications. A similar approach is seen with LightGBM (Fig. 18), where AnchorTabular generates easily understandable rules, improving interpretability and facilitating rapid decision-making.
The comparison reveals that Shapash performs better at providing a detailed understanding of feature contributions and interactions, making it invaluable for users seeking in-depth insights into model behavior. Its ability to highlight complex relationships between features (Fig. 13e and f) improves model transparency. On the other hand, AnchorTabular provides intuitive, rule-based explanations that are easier to interpret and directly applicable, ideal for scenarios requiring fast, clear insights.
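For reference, anchor rules of the kind reported in Figs. 17 and 18, together with their precision and coverage, can be obtained with the Alibi implementation of AnchorTabular roughly as follows; the prediction function, feature names, discretization percentiles, and instance index shown here are assumptions for illustration rather than the exact configuration used in the experiments.

```python
from alibi.explainers import AnchorTabular

# Predictor returning class labels for a batch of rows (model handle is assumed)
predict_fn = lambda x: gbc_model.predict(x)

explainer = AnchorTabular(predict_fn, feature_names=list(X_train.columns))
explainer.fit(X_train.values, disc_perc=(25, 50, 75))   # discretize numeric features

# Rule-based explanation for one instance, at 95% target precision
explanation = explainer.explain(X_val.iloc[50].values, threshold=0.95)
print("Anchor   :", " AND ".join(explanation.anchor))
print("Precision:", explanation.precision)
print("Coverage :", explanation.coverage)
```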
[See PDF for image]
Fig. 17
Comparison of precision and coverage for unique anchors based on GBC model predictions
[See PDF for image]
Fig. 18
Comparison of precision and coverage for unique anchors based on LightGBM model predictions
Fairness Comparison with Fairlearn XAI
This section presents a comparison of the DALEX module of the proposed unified XAI framework with Fairlearn, an XAI tool designed to assess and mitigate fairness issues in machine learning models. Fairlearn offers metrics and visualizations to evaluate model performance across sensitive features, providing a deep dive into demographic parity, equalized odds, and other fairness criteria [23]. This comparison focuses on the performance of the GBC and LightGBM models, evaluated across sensitive features, such as AVProductStatesIdentifier, EngineVersion, AppVersion, and CountryIdentifier, as highlighted by DALEX.
FairnessRadar and Fairness Check plots based on DALEX provide valuable insights into the trade-off between accuracy and fairness. In the FairnessRadar (Fig. 9), LightGBM occupies a larger area, indicating a higher level of bias. Despite achieving the highest accuracy, LightGBM is positioned less favorably due to its elevated TPR parity loss, signaling a tension between high accuracy and fairness. On the other hand, GBC appears in a better position in the radar plot, occupying a smaller area and demonstrating lower parity loss. This aligns with the notion that GBC strikes a better balance, maintaining competitive accuracy while mitigating fairness disparities.
In contrast, Fairlearn provides a more granular, metric-driven analysis of the same models, as shown in Figs. 19 and 20. These fairness metrics reflect significant performance disparities across sensitive features. For both GBC and LightGBM, the Demographic Parity Difference and Equalized Odds Difference are both reported as 1.0, indicating a high level of disparity in predictions across the features being evaluated. This result suggests that both models exhibit substantial fairness issues, though these disparities are not always reflected in traditional performance metrics alone. Additionally, GBC shows a higher False-Positive Rate (0.3835) and False-Negative Rate (0.3545) compared to LightGBM (0.3616 and 0.3500), suggesting that while LightGBM has a slight advantage in error rates, its higher fairness discrepancies persist when considering sensitive attributes.
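The Fairlearn results referenced above rest on a handful of metric calls. A minimal sketch is shown below, assuming binary predictions from a fitted model (gbc_model) and one of the sensitive columns (here CountryIdentifier) as the grouping variable; the exact sensitive-feature encoding behind Figs. 19 and 20 is not reproduced here.

```python
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
    false_negative_rate,
    false_positive_rate,
)

y_pred = gbc_model.predict(X_val)            # model and data handles are assumptions
sensitive = X_val["CountryIdentifier"]

print("Demographic parity difference:",
      demographic_parity_difference(y_val, y_pred, sensitive_features=sensitive))
print("Equalized odds difference    :",
      equalized_odds_difference(y_val, y_pred, sensitive_features=sensitive))

# Per-group error rates and the largest between-group gap for each metric
mf = MetricFrame(
    metrics={"FPR": false_positive_rate, "FNR": false_negative_rate},
    y_true=y_val,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(mf.by_group)
print(mf.difference())
```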
The Fairness Check plot (Fig. 10) from DALEX also highlights LightGBM’s higher levels of bias, with bars extending further into the red region compared to GBC, which shows more balanced fairness levels. Although none of the models exceed a bias value of 0.05, LightGBM consistently performs with higher inequity compared to GBC, further corroborating the findings of Fairlearn.
The comparison between DALEX and Fairlearn emphasizes the inherent trade-offs between model accuracy and fairness. Both frameworks reveal that while LightGBM offers better accuracy, it exhibits significant fairness challenges, particularly in terms of TPR parity loss and disparities across sensitive features. In contrast, GBC shows a more equitable balance, with lower levels of bias and more consistent performance across all fairness metrics.
[See PDF for image]
Fig. 19
Fairness metrics for predictions made by the GBC model across sensitive features. The plot highlights performance disparities for features like AVProductStatesIdentifier, EngineVersion, AppVersion, and CountryIdentifier
[See PDF for image]
Fig. 20
Fairness metrics for predictions made by the LightGBM model across sensitive features
Interpretability and Fairness Analysis of Intrusion Detection Systems Based on Unified XAI Framework
This section applies the proposed unified XAI framework to interpret the predictions of the GBC and LightGBM models on the CICIDS-2017 dataset [25]. The framework assesses both interpretability and fairness, with a particular focus on fairness analysis based on the Destination Port feature, which represents various network protocols. The CICIDS-2017 dataset, widely used in intrusion detection research, consists of 79 features that describe network traffic, including packet counts, flow duration, and statistical measures. The Destination Port feature indicates the network protocol, while the Label identifies the type of traffic (either benign or an attack).
For interpretability, the problem was initially framed as a multi-class classification task covering all 12 traffic classes, namely BENIGN, DoS Hulk, PortScan, DoS GoldenEye, SSH-Patator, Bot, DoS Slowhttptest, Web Attack-Brute Force, DDoS, FTP-Patator, DoS Slowloris, and Web Attack-XSS. However, for compatibility with DALEX, which primarily supports binary classification, the dataset was converted into a binary classification problem, with BENIGN labeled as 0 and all other attack types labeled as 1.
The Destination Port feature was used as the protected attribute for the fairness analysis using DALEX, allowing an evaluation of potential biases in the model’s predictions based on different protocols, e.g., Web, FTP, SSH. Both the GBC and LightGBM models were assessed using fairness metrics, providing insights into whether the models exhibit bias toward specific protocols.
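A sketch of this preprocessing and fairness setup is shown below. The port-to-protocol grouping, the file name, and the model handle (lgbm_model) are illustrative assumptions; only the BENIGN-versus-attack binarization and the use of Destination Port as the protected attribute follow the description above.

```python
import pandas as pd
import dalex as dx

df = pd.read_csv("cicids2017.csv")                     # pre-merged CICIDS-2017 CSV (assumed)
df["target"] = (df["Label"] != "BENIGN").astype(int)   # BENIGN -> 0, any attack -> 1

# Coarse protocol groups derived from Destination Port (mapping is illustrative)
port_groups = {80: "Web", 443: "Web", 21: "FTP", 22: "SSH"}
protected = df["Destination Port"].map(port_groups).fillna("Other")

X = df.drop(columns=["Label", "target"])
y = df["target"]

# Fairness audit of a fitted model across protocol groups, with 'Web' as reference
exp = dx.Explainer(lgbm_model, X, y, label="LightGBM")
mf = exp.model_fairness(protected=protected, privileged="Web")
mf.fairness_check()
```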
Figure 21 illustrates the top 15 features ranked by the total mean absolute SHAP values across the predictions of the LightGBM, covering all 12 attack categories. Destination Port emerges as the most important feature overall, consistently ranking high across all attack categories. This indicates the heavy reliance of the model on the port number to differentiate between benign and malicious traffic. Specifically, it plays a key role in attacks, such as DoS Hulk and DDoS, where protocol-level traffic analysis is critical for detection. Similarly, Init_Win_bytes_backward ranks highly, particularly influencing attacks like DoS GoldenEye and Web Attack-Brute Force. This feature captures flow control dynamics and is significant in identifying network anomalies tied to specific attack behaviors. Fwd IAT Min, or forward inter-arrival time, is another crucial feature, especially for attacks like SSH-Patator and Bot. It captures the temporal aspects of network traffic, highlighting irregular intervals between packets that are vital for distinguishing these attacks from normal traffic. While these three features dominate the overall importance ranking, other features like Fwd Header Length and Fwd IAT Max also contribute to identifying specific attack types. Fwd Header Length stands out in detecting DoS GoldenEye, where the attack’s unique packet header characteristics become crucial, while Fwd IAT Max is significant for SSH-Patator, revealing irregular intervals in packet streams used in the attack. This analysis emphasizes how the LightGBM model adapts to various attack types by capturing both low-level packet characteristics and high-level traffic flow behaviors, enabling robust intrusion detection across diverse network attack scenarios.
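The ranking in Fig. 21 corresponds to a mean-absolute-SHAP bar summary over the multi-class LightGBM predictions. A compact sketch is given below, assuming a fitted multi-class model (lgbm_multi), a test frame X_test with the 79 CICIDS-2017 features, and a class_names list covering the 12 traffic classes; these handles are assumptions for illustration.

```python
import shap

# TreeExplainer returns one SHAP array per attack class for a multi-class model
explainer = shap.TreeExplainer(lgbm_multi)
shap_values = explainer.shap_values(X_test)

# Stacked bar of mean |SHAP| per class, limited to the 15 highest-ranked features
shap.summary_plot(
    shap_values,
    X_test,
    plot_type="bar",
    max_display=15,
    class_names=class_names,   # list of the 12 traffic classes (assumed)
)
```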
[See PDF for image]
Fig. 21
Top 15 features ranked by mean absolute SHAP values, illustrating the contribution of key features in the LightGBM model’s prediction for intrusion detection
[See PDF for image]
Fig. 22
Interpretability results of LIME for intrusion detection using the LightGBM model. a Explanation for sample 1, b explanation for sample 0, and c explanation for sample 8
[See PDF for image]
Fig. 23
Fairness analysis results using DALEX for GBC and LightGBM models on the CICIDS-2017 dataset. a Combined fairness plot for LGBM and GBC, b Fairness check for GBC, and c Fairness check for LGBM
The LIME results presented in Fig. 22 showcase local feature contributions for three representative samples from the CICIDS-2017 dataset, with each sample corresponding to predictions made by the LightGBM model across various attack categories. As part of the unified XAI framework, LIME complements the global SHAP analysis by providing localized, interpretable explanations, revealing how specific feature values influence individual classification outcomes.
For example, in sample 0 (associated with a predicted class such as DoS Hulk or PortScan), PSH Flag Count and Destination Port exhibit negative contributions, subtly reducing the probability of the predicted attack class, while Init_Win_bytes_backward and Total Length of Bwd Packets show small but positive influences, supporting the prediction. This points to how even small shifts in window sizes or backward packet characteristics can locally push the model toward an attack decision, underscoring the model’s sensitivity to protocol and flow nuances.
In contrast, sample 8 reveals that features like Fwd PSH Flags and Bwd IAT Max have nearly negligible impacts, suggesting that for this case (potentially a Web Attack or Bot instance), the model relies more on aggregated or less obvious traffic signatures, hinting at the complexity and subtlety of distinguishing advanced or stealthy attacks. Sample 1’s explanation highlights stronger local effects, where Total Fwd Packets and Flow Packets/s push positively toward the attack classification, while ACK Flag Count and Active Min dampen it. This mixed contribution pattern emphasizes that the model’s local decision boundary can be shaped by both burst-related flow measures and control flag behaviors, which are characteristic of fast, aggressive intrusions like DDoS or SSH-Patator.
The DALEX fairness module, integrated into the proposed unified XAI framework, offers critical insights into the ethical and equitable performance of gradient-boosting models. Figure 23 presents fairness radar visualizations and subgroup-specific fairness check metrics for both the LightGBM and GBC models. The fairness radar clearly reveals that LightGBM, while strong in overall predictive accuracy, suffers from significant subgroup imbalances. Its disproportionately high false-negative rates for minority groups, such as FTP and SSH, indicate a bias toward underdetecting attacks in these categories, a potentially critical failure mode in intrusion detection systems.
In contrast, the GBC model demonstrates more balanced fairness metrics, showing stability across true-positive rates, false-positive rates, and subgroup accuracies. Notably, both models perform reliably on the majority ‘Web’ group, but LightGBM’s fairness deterioration on less represented categories shows why relying solely on global performance is insufficient. Here, the unified XAI framework’s multi-component design proves essential: by combining DALEX’s fairness evaluation with local and global explanation methods, it creates a holistic understanding of model behavior, enabling practitioners to identify not just why a model makes decisions, but for whom those decisions are fair or biased.
Conclusion
Malware continues to pose a significant and evolving threat to modern computing environments, with increasingly sophisticated obfuscation techniques complicating traditional detection methods. Ensemble-boosting classifiers, such as GBC, XGBoost, LightGBM, and AdaBoost, have demonstrated notable improvements in detection accuracy. However, their black-box nature limits their applicability in sensitive domains like cybersecurity, where model transparency, accountability, and fairness are essential. This study introduces a unified XAI framework that integrates LIME for localized explanations, SHAP for global insights, and DALEX for fairness evaluation, delivering interpretable and ethically aligned malware detection solutions. Beyond malware detection, the framework has been evaluated on intrusion detection tasks using the CICIDS-2017 dataset, demonstrating strong generalization across cybersecurity domains. Analyses using SHAP, LIME, and DALEX revealed consistent patterns: while models such as LightGBM deliver high predictive accuracy, they exhibit fairness disparities, particularly for underrepresented or minority subgroups. Comparative evaluations with established XAI tools, such as AnchorTabular and Fairlearn, highlight the complementary strengths of our proposed framework. While AnchorTabular offers high-precision, rule-based explanations, SHAPASH, as part of our framework, provides detailed and fine-grained feature attribution. Fairlearn further emphasizes the trade-off between accuracy and fairness, where GBC often strikes a better balance across sensitive features, while LightGBM tends to favor accuracy at the expense of subgroup equity. The experimental findings confirm that the proposed framework not only improves detection accuracy but also enhances understanding of model behavior, revealing the influence of critical features such as SmartScreen_AVProductsInstalled, EngineVersion, and AppVersion across multiple models. Fairness evaluations uncover hidden biases, reinforcing the need for balanced optimization. Moreover, by incorporating game-theoretic Shapley value analysis, the framework strengthens interpretability and supports user trust, making it particularly suitable for deployment in high-stakes environments where transparency, fairness, and accountability are paramount.
Author Contributions
Shagufta Henna conceived the study, conducted the theoretical analysis, designed the algorithm, performed the experiments with LIME, AnchorTabular, Fairlearn, and Feature Importance, carried out the evaluations on the CICIDS-2017 dataset as well as the game-theoretic and complexity analyses, and wrote the main manuscript. Mallikharjuna Rao assisted with the evaluation of AnchorTabular and Fairlearn and with the evaluation on the CICIDS-2017 dataset. Lakshya Gourav Moitra contributed to the experimental setup and implementation for LIME and DALEX. Upaka Rathnayake assisted in reviewing and refining the manuscript.
Funding
No funding information is applicable.
Availability of Data and Materials
The datasets used in this study are the Microsoft Malware Prediction dataset and the CICIDS-2017 intrusion detection dataset, which are publicly available at https://kaggle.com/competitions/microsoft-malware-prediction and https://www.unb.ca/cic/datasets/ids-2017.html, respectively. Code is available at https://github.com/NeuralNomad25/XAImalware.
Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Feng, R; Chen, S; Xie, X; Meng, G; Lin, S-W; Liu, Y. A performance-sensitive malware detection system using deep learning on mobile devices. IEEE Trans. Inf. Forensics Secur.; 2021; 16, pp. 1563-1578. [DOI: https://dx.doi.org/10.1109/TIFS.2020.3025436]
2. Zhou, Y; Cheng, G; Yu, S; Chen, Z; Hu, Y. MTDroid: a moving target defense-based android malware detector against evasion attacks. IEEE Trans. Inf. Forensics Secur.; 2024; 19, pp. 6377-6392. [DOI: https://dx.doi.org/10.1109/TIFS.2024.3414339]
3. Jeon, J; Jeong, B; Baek, S; Jeong, Y-S. Static multi feature-based malware detection using multi SPP-net in smart IoT environments. IEEE Trans. Inf. Forensics Secur.; 2024; 19, pp. 2487-2500. [DOI: https://dx.doi.org/10.1109/TIFS.2024.3350379]
4. Zhang, X et al. Slowing down the aging of learning-based malware detectors with API knowledge. IEEE Trans. Dependable Secure Comput.; 2023; 20,
5. Li, D; Cui, S; Li, Y; Xu, J; Xiao, F; Xu, S. PAD: towards principled adversarial malware detection against evasion attacks. IEEE Trans. Dependable Secure Comput.; 2024; 21,
6. Jia, J., Salem, A., Backes, M.: MemGuard: defending against black-box membership inference attacks via adversarial examples. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 259–274. ACM, London (2019)
7. Naderi-Afooshteh, A., Kwon, Y., Nguyen-Tuong, A.: MalMax: multi-aspect execution for automated dynamic web server malware analysis. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1849–1866. ACM, London (2019)
8. Xie, W., Chen, N., Chen, B.: Incorporating malware detection into the flash translation layer. In: 2020 IEEE Symposium on Security and Privacy Poster Session. IEEE, San Francisco (online), CA (2020)
9. Khan, AA; Chaudhari, O; Chandra, R. A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation. Expert Syst. Appl.; 2024; 244, [DOI: https://dx.doi.org/10.1016/j.eswa.2023.122778] 122778.
10. Wang, F; Li, Z; He, F; Wang, R; Yu, W; Nie, F. Feature learning viewpoint of AdaBoost and a new algorithm. IEEE Access; 2019; 7, pp. 149890-149899. [DOI: https://dx.doi.org/10.1109/ACCESS.2019.2947359]
11. Zhai, Y., Hao, Y., Zhang, Z.: Progressive scrutiny: incremental detection of ubi bugs in the Linux kernel. In: Proceedings of the Network and Distributed Systems Security (NDSS) Symposium 2022. Internet Society, Reston, VA (2022). https://doi.org/10.14722/ndss.2022.24380
12. Hou, B., O’Connor, J., Andreas, J., Chang, S., Zhang, Y.: PromptBoosting: black-box text classification with ten forward passes. In: Proceedings of the 40th International Conference on Machine Learning (ICML 2023). PMLR, pp. 13309–13324 (2023)
13. Roshan, K; Zafar, A. Utilizing XAI technique to improve autoencoder based model for computer network anomaly detection with Shapley additive explanation (SHAP). Int. J. Comput. Netw. Commun.; 2021; 13,
14. Houda, ZA; Brik, B; Khoukhi, L. “Why should i trust your ids?": an explainable deep learning framework for intrusion detection systems in internet of things networks. IEEE Commun. Mag.; 2022; 60,
15. Baniecki, H; Kretowicz, W; Piatyzsek, P; Wisniewski, J; Biecek, P. Dalex: responsible machine learning with interactive explainability and fairness in Python. J. Mach. Learn. Res.; 2021; 22,
16. Vimbi, V; Shaffi, N; Mahmud, M. Interpreting artificial intelligence models: a systematic review on the application of lime and SHAP in Alzheimer’s disease detection. Brain Inform.; 2024; 11,
17. Nambiar, A., H, S., S, S.: Model-agnostic explainable artificial intelligence tools for severity prediction and symptom analysis on Indian COVID-19 data. Front. Artif. Intell. 6, 1272506 (2023)
18. González-Sendino, R; Serrano, E; Bajo, J. Mitigating bias in artificial intelligence: fair data generation via causal models for transparent and explainable decision-making. Future Gener. Comput. Syst.; 2024; 155, pp. 384-401. [DOI: https://dx.doi.org/10.1016/j.future.2024.02.023]
19. Gurmessa, DK; Jimma, W. Explainable machine learning for breast cancer diagnosis from mammography and ultrasound images: a systematic review. BMJ Health Care Inform.; 2024; 31,
20. Jui, TD; Rivas, P. Fairness issues, current approaches, and challenges in machine learning models. Int. J. Mach. Learn. Cybern.; 2024; 15, pp. 3095-3125. [DOI: https://dx.doi.org/10.1007/s13042-023-02083-2]
21. Dang, VN; Cascarano, A; Mulder, RH et al. Fairness and bias correction in machine learning for depression prediction across four study populations. Sci. Rep.; 2024; 14, 7848. [DOI: https://dx.doi.org/10.1038/s41598-024-58427-7]
22. Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, Palo Alto, CA, pp. 1527–1535 (2018).
23. Weerts, H., Dudík, M., Edgar, R., Jalali, A., Lutz, R., Madaio, M.: Fairlearn: assessing and improving fairness of AI systems. J. Mach. Learn. Res. 24, 1–8 (2023). Submitted 3/23; Published 7/23
24. Howard, A., Hope, B., Saltaformaggio, B., Avena, E., Ahmadi, M., Duncan, M., n_30, McCann, R., Cukierski, W.: Microsoft Malware Prediction. Kaggle (2018). https://kaggle.com/competitions/microsoft-malware-prediction
25. Canadian Institute for Cybersecurity: CICIDS 2017 Dataset (2017). https://www.unb.ca/cic/datasets/ids-2017.html. Accessed 04 May 2025
26. Omari, K. Phishing detection using gradient boosting classifier. Procedia Comput. Sci.; 2023; 230, pp. 120-127. [DOI: https://dx.doi.org/10.1016/j.procs.2023.12.067]
27. Ping, G. Detection of power data tampering attack based on gradient boosting decision tree. J. Phys: Conf. Ser.; 2021; 1846,
28. Delgado-Panadero, A; Hernandez-Lorca, B; Garcia, MT; Benitez-Andrades, JA. Implementing local-explainability in gradient boosting trees: feature contribution. Inf. Sci.; 2022; 589, pp. 199-212. [DOI: https://dx.doi.org/10.1016/j.ins.2021.12.111]
29. Hatwell, J; Gaber, MM; Azad, RMA. Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences. BMC Med. Inform. Decis. Mak.; 2020; 20,
30. Okolie, C., Mills, J., Adeleke, A., Smit, J., Maduako, I.: The explainability of gradient-boosted decision trees for Digital Elevation Model (DEM) error prediction. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XLVIII-M-3-2023, 161–168 (2023)
31. Saied, M; Guirguis, S; Madbouly, MA. A comparative study of using boosting-based machine learning algorithms for IoT network intrusion detection. Int. J. Comput. Intell. Syst.; 2023; 16, 177. [DOI: https://dx.doi.org/10.1007/s44196-023-00355-x]
32. Lundberg, SM; Erion, G; Chen, H; DeGrave, P; Prutkin, J; Nair, B; Katz, R; Himmelfarb, J; Bansal, N; Lee, S. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell.; 2020; 2, pp. 56-67. [DOI: https://dx.doi.org/10.1038/s42256-019-0138-9]
33. Breiman, L. Random forests. Mach. Learn.; 2001; 45,
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”).