Optimization of EEG-based wheelchair control: machine learning, feature selection, outlier management, and explainable AI

Abstract

Classifying Electroencephalogram (EEG) signals for wheelchair navigation presents significant challenges due to high dimensionality, noise, outliers, and class imbalances. This study proposes an optimized classification framework that evaluates ten machine learning (ML) models, emphasizing ensemble methods, feature selection (FS), and outlier utilization. The dataset, comprising 2869 samples and 141 features, was processed using Recursive Feature Elimination (RFE) and correlation thresholds (CTs), achieving a peak accuracy of 69% with Extra Trees after FS. Notably, training on outlier-only data yielded even higher accuracy (Extra Trees: 82%), underscoring the value of outliers in enhancing class separability. Receiver Operating Characteristic–Precision Recall (ROC-PR) curve analysis confirmed that Extra Trees achieved a ROC AUC (Area Under Curve) of 0.92 and PR AUC of 0.82 for the best-classified movement command, while other models exhibited lower precision-recall (PR) balance. This approach, complemented by explainability techniques, offers a robust solution for EEG-based wheelchair control systems and paves the way for interpretable brain-computer interfaces (BCIs).

Introduction

Brain-Computer Interfaces (BCIs) are revolutionary technologies that enable direct communication between the brain and external devices. They are especially helpful for people with mobility impairments. Among BCI types, EEG-based systems have gained much attention because they are non-invasive, portable, and operate in real time. These systems decode brain activity to control assistive devices like wheelchairs [1].

EEG records brain activity by detecting electrical signals produced by neuronal firing in the cerebral cortex through electrodes placed on the scalp. These electrodes capture voltage fluctuations reflecting brain waves across various frequency bands (e.g., delta, theta, alpha, beta, and gamma) [2].

BCIs face problems like noisy data, large feature sets, and class imbalance. The solution lies in treating BCI design as an optimization problem. The objective is to find the best combination of feature extraction, modeling, and preprocessing to minimize error and improve accuracy [3].

Recent research highlights that viewing BCIs as optimization problems helps balance model complexity and performance. This approach makes systems more robust against noise and imbalanced data. As a result, BCIs become more reliable and practical for real-world use.

However, EEG data is challenging to handle. It has high dimensionality, changes over time, and contains substantial noise and artifacts, which makes classification difficult [1]. This study seeks to overcome these issues by building a robust framework for classifying EEG data in order to enable precise and interpretable wheelchair navigation via mental task execution.

Recent improvements in EEG signal processing and ML have considerably increased the ability to decode brain activity for BCIs. Deep learning models and ensemble approaches, for example, have shown strong performance in capturing complex patterns in EEG data, whereas Explainable AI (XAI) tools have improved these models’ interpretability [4, 5]. Despite these advances, noise, class imbalances, and feature redundancy continue to be significant obstacles to the development of reliable and generalizable EEG-based systems [6]. This study proposes a comprehensive framework that incorporates state-of-the-art preprocessing approaches, advanced FS algorithms, and a wide range of ML classifiers to increase the accuracy and interpretability of EEG data classification for wheelchair navigation.

The dataset used in this investigation [7], Wheelchair EEG Signals, consists of EEG recordings gathered during mental tasks linked with wheelchair navigation, such as imagining forward, backward, left, and right motions. The dataset includes features in both the temporal and frequency domains, which were extracted using statistical metrics and the Fast Fourier Transform (FFT), respectively. These features capture critical information from several frequency bands related to motor imagery tasks, making the dataset well suited for multi-class classification evaluations [8]. However, the presence of noise, class imbalances, and redundant features necessitates robust preprocessing and FS approaches to ensure dependable model performance [9].

This study assesses a variety of ML models, including ensemble methods like Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Random Forest (RF), and Extra Trees, as well as individual models like Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Decision Trees (DT). These models’ performance is evaluated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, which provide a thorough assessment of their capacity to manage the complicated and noisy character of EEG data [10]. Furthermore, advanced FS approaches, including correlation analysis, variance thresholding, and RFE, are used to minimize dimensionality while retaining significant data, improving classification accuracy [11].

The contributions of this study are threefold:

Preprocessing and Feature Engineering: Robust preprocessing procedures, such as normalization, standardization, and handling of missing values, are applied to ensure data quality and consistency. Statistical metrics and FFT coefficients are used to derive key temporal and frequency-domain features [12].
Machine Learning Evaluation: Several classifiers are assessed thoroughly, with a focus on ensemble methods, to determine the most effective approaches for EEG signal classification. The study also uses SHAP (SHapley Additive exPlanations) analysis to interpret model predictions and highlight the importance of specific features [13].
Feature Selection and Interpretability: Advanced FS approaches are used to increase model performance and interpretability, providing insights into the underlying EEG features required for accurate classification [14].

This research improves understanding of EEG-based classification systems and adds to the development of interpretable BCI technology by tackling issues such as variability, noise, and feature complexity. The study’s findings are expected to drive future research into EEG-based assistive technology, particularly for people with severe motor limitations, paving the way for more efficient and user-friendly systems.

Related works

The classification of EEG signals has been a focal point of research in recent years, with applications spanning medical diagnostics, cognitive neuroscience, and assistive technologies. Recent studies, particularly those published after 2021, have made significant strides in addressing the challenges of noise, variability, and high dimensionality inherent in EEG data. This section reviews the latest advancements in preprocessing, FS, and ML techniques for EEG signal classification, with a focus on their application in assistive technologies such as wheelchair navigation.

Preprocessing techniques

Preprocessing is a critical step in EEG signal analysis, aimed at enhancing signal quality by mitigating noise and artifacts. Recent studies have emphasized the importance of normalization and standardization techniques to reduce inter-subject variability and improve model generalizability. For instance, [15] demonstrated the effectiveness of min–max scaling and z-score normalization in improving classification accuracy for EEG-based applications. Additionally, advanced artifact removal techniques, such as Independent Component Analysis (ICA) combined with deep learning filters, have been employed to address physiological and external noise [16]. These methods have proven effective in preserving the integrity of EEG signals, ensuring reliable downstream analysis.

Feature extraction and selection

Feature extraction and selection remain pivotal in EEG signal classification, as they directly impact model performance and interpretability. Recent research has explored hybrid techniques that combine time-domain, frequency-domain, and time–frequency-domain features to capture task-specific brain activity. For example, [17] highlighted the use of wavelet transforms integrated with statistical and spectral features to improve classification accuracy. Similarly, [18] demonstrated the effectiveness of FFT coefficients in capturing frequency-domain characteristics of EEG signals.

FS techniques have also evolved, with methods such as RFE, correlation analysis, and variance thresholding gaining prominence. [19] showed that RFE, when combined with XAI methods, can prioritize features based on their contribution to model predictions, thereby enhancing interpretability. Additionally, mutual information-based feature ranking has emerged as a promising approach to reduce dimensionality while retaining critical information [20].

Machine learning models

ML algorithms, particularly ensemble methods and deep learning models, have demonstrated exceptional performance in EEG signal classification. Ensemble methods have gained traction due to their robustness in handling high-dimensional and noisy datasets. For instance, [21] reported that LGBM outperformed traditional classifiers in EEG-based cognitive applications, while [22] highlighted the efficiency of CatBoost in handling missing data and categorical features.

Deep learning models, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have also shown promise in capturing spatial–temporal patterns in EEG data. [23] demonstrated the effectiveness of CNNs in decoding motor imagery tasks, while [24] explored the use of LSTM networks for real-time EEG classification. Hybrid models that integrate CNNs with attention mechanisms have further improved classification accuracy and interpretability [25].

XAI in EEG classification

The integration of XAI tools has become increasingly important in EEG-based BCIs, as they provide insights into model predictions and enhance transparency. SHAP and Integrated Gradients are among the most widely used XAI methods in EEG research. [26] utilized SHAP analysis to identify critical features such as FFT coefficients and statistical measures, while [27] applied Integrated Gradients to interpret the behavior of deep learning models in EEG classification tasks. These tools not only improve interpretability but also facilitate the development of trustworthy and reliable BCI systems.

Challenges and future directions

Despite significant advancements, several challenges remain in EEG signal classification, particularly in the context of assistive technologies. Class imbalance, real-time processing limitations, and scalability for large-scale datasets are persistent issues that require innovative solutions. Recent studies have explored transfer learning and domain adaptation techniques to reduce the need for extensive labeled data, enabling cross-subject and cross-task generalization [28]. Additionally, the integration of real-time processing pipelines with edge computing frameworks has shown promise in improving the practicality of EEG-based BCIs [29].

Application in wheelchair navigation

EEG-based wheelchair navigation systems have garnered considerable attention in recent years, with researchers focusing on improving the accuracy and reliability of mental task classification. [30] proposed a framework that integrates advanced preprocessing, FS, and ensemble learning models to achieve robust control of assistive devices. Similarly, [31] demonstrated the effectiveness of hybrid feature extraction techniques in distinguishing between mental tasks associated with wheelchair navigation. These studies underscore the potential of EEG-based BCIs in enhancing the quality of life for individuals with mobility impairments.

Research gaps and contributions

While existing research has made significant progress, gaps remain in optimizing FS, handling class imbalances, and improving the interpretability of EEG classification models. This study addresses these gaps by proposing a comprehensive framework that combines state-of-the-art preprocessing techniques, advanced FS methods, and ensemble learning models. By incorporating SHAP analysis, this research provides insights into the underlying EEG features critical for accurate classification, thereby advancing the development of reliable and interpretable EEG-based assistive technologies.

Methods-framework

Figure 1 depicts the overall methodology of the proposed approach, including its stages. The dataset used in the proposed framework is Wheelchair EEG Signals © 2024 by Mind Mobilizers, licensed under Attribution-NonCommercial-ShareAlike 4.0 International [5]. Table 1 compares recent related work.

[See PDF for image]

Fig. 1

EEG Classification Framework

Table 1. Comparison among recent related work utilizing EEG signal classification

Refs.	Methodology	Focus area	Key contribution
[15]	Neural Networks, Stratified Normalization	Emotion Recognition from EEG Signals	Proposed stratified normalization to reduce inter-subject variability in emotion recognition
[16]	ICA-based artifact removal, Deep Learning	Artifact Removal in EEG, BCI Applications	Investigated the limitations of ICA-based artifact removal in deep learning models for BCI tasks
[17]	Hybrid deep learning models, Parallel feature extraction	Emotion State Classification using EEG	Introduced hybrid deep models for enhanced emotion state classification with parallel feature extraction
[18]	FFT and Wavelet Analysis for feature extraction, ML	Mental Stress Level Classification using EEG	Utilized FFT and Wavelet Analysis for improved mental stress classification accuracy
[19]	Ensemble models with XAI, Predictive analysis	Chronic Kidney Disease Prediction with XAI	Explored XAI methods in ensemble models for chronic kidney disease prediction
[20]	Mutual Information FS, Intrusion Detection Systems	Intrusion Detection Systems in IoMT	Developed mutual information-based FS for improved IDS in IoMT
[21]	VMD and LGBM classifier for cognitive load detection	Cognitive Load Detection using EEG	Used VMD and LGBM to detect cognitive load with high accuracy in EEG data
[22]	Wavelet Feature Extraction, Gradient Boosting DT	Abnormal EEG Signal Detection	Developed a gradient boosting DT model for detecting abnormal EEG signals
[23]	Merged CNNs for EEG motor imagery signal classification	Motor Imagery Signal Classification using CNN	Proposed merged CNNs for motor imagery signal classification to improve performance
[24]	LSTM network for prediction, EEG spectral features	Epileptic Seizure Prediction using EEG	Implemented two-layer LSTM networks for predicting epileptic seizures from EEG signals
[25]	Hybrid CNN with Attention Mechanism	EEG Classification using CNN with Attention Mechanism	Developed hybrid CNN with attention mechanism for better EEG classification
[26]	SHAP-based ERP analysis, XAI methods	ERP Analysis, XAI in EEG	Introduced SHAP-based ERP analysis to enhance EEG sensitivity using XAI
[27]	Explainable ML, EEG-based Brain-Computer Interface	Explainable ML in BCI	Proposed an explainable ML approach for EEG-based BCIs
[28]	Transfer learning, EEG motor imagery classification	Cross-Subject Motor Imagery Classification	Used transfer learning for cross-subject EEG motor imagery classification
[29]	Domain Adaptation Algorithms, EEG Classification	EEG Classification with Domain Adaptation Algorithms	Developed domain adaptation algorithms for multi-source cross-subject EEG classification
[30]	Coresets for Real-time EEG classification, BCI	Real-Time EEG Classification for BCI Applications	Utilized coresets for real-time EEG classification in BCI applications
[31]	ML techniques for BCI	BCI System Development using ML	Reviewed ML techniques for EEG-based BCI systems

Dataset description

The dataset used in this study includes EEG signals recorded during specific mental activities related to wheelchair navigation. EEG signals were recorded from subjects visualizing motions in the backward, right, left, and forward directions. These signals were divided into fixed-length segments using the sliding window approach. Each segment was then processed to extract a variety of features, such as statistical metrics and FFT coefficients for different frequency bands, together with a label indicating the task associated with that segment. Table 2 shows the primary features derived from each window.

Table 2. Dataset Features

Feature	Description	Feature	Description
Start Timestamp	The starting timestamp of the window	Skewness	Skewness value of the signal within the window
End Timestamp	The ending timestamp of the window	Peak-to-Peak	Peak-to-peak amplitude of the signal within the window
FFT Result	Fast Fourier Transform (FFT) results from the signal within the window	Abs Diff Signal	Absolute difference of the signal within the window
Mean	Mean value of the signal within the window	Alpha Power	Power in the alpha band of the signal within the window
Max	Maximum value of the signal within the window	Beta Power	Power in the beta band of the signal within the window
Standard Deviation	Standard deviation of the signal within the window	Gamma Power	Power in the gamma band of the signal within the window
RMS	Root Mean Square (RMS) value of the signal within the window	Delta Power	Power in the delta band of the signal within the window
Kurtosis	Kurtosis value of the signal within the window	Theta Power	Power in the theta band of the signal within the window
Label	The corresponding label for the window. 0 for Backward, 1 for Left, 2 for Right, 3 for Forward

The data consists of 2869 samples and 141 columns that indicate EEG characteristics. The key aspects are as follows:

Time-based data: Columns such as Start Timestamp and End Timestamp provide time information for each EEG segment.
Statistical Features: Columns such as Mean, Max, Standard Deviation, RMS, Kurtosis, and Skewness.
Frequency Domain Features: Columns FFT_0 through FFT_124, representing FFT coefficients, likely used to capture frequency information from EEG signals.

Target Variable (Label): The target variable has four classes, which are distributed as follows: Class 0 (26.49%), Class 1 (25.72%), Class 2 (23.84%), and Class 3 (23.95%). This suggests a well-balanced distribution across classes.
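
The windowed feature extraction described in this section can be illustrated with a minimal sketch. It is not the dataset authors' code: the sampling rate, window length, step size, band edges, and the reading of "Abs Diff Signal" as the summed absolute first difference are all assumptions, and the input is a synthetic sine wave rather than real EEG.

```python
import numpy as np

# Assumed band edges in Hz; the source does not state the ones it used.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def window_features(signal, fs, win_len, step):
    """Return one feature dict per sliding window of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    rows = []
    for start in range(0, len(signal) - win_len + 1, step):
        seg = signal[start:start + win_len]
        mean, std = seg.mean(), seg.std()
        z = (seg - mean) / std if std > 0 else np.zeros_like(seg)
        magnitudes = np.abs(np.fft.rfft(seg))  # FFT coefficient magnitudes
        power = magnitudes ** 2
        row = {
            "start": start / fs,
            "end": (start + win_len) / fs,
            "mean": mean,
            "max": seg.max(),
            "std": std,
            "rms": np.sqrt(np.mean(seg ** 2)),
            "kurtosis": np.mean(z ** 4) - 3.0,  # excess kurtosis
            "skewness": np.mean(z ** 3),
            "peak_to_peak": seg.max() - seg.min(),
            "abs_diff": np.abs(np.diff(seg)).sum(),  # one possible reading
        }
        for name, (lo, hi) in BANDS.items():
            row[name + "_power"] = power[(freqs >= lo) & (freqs < hi)].sum()
        rows.append(row)
    return rows

# Synthetic 10 Hz sine, 4 s long, at an assumed 128 Hz sampling rate.
fs = 128
t = np.arange(4 * fs) / fs
rows = window_features(np.sin(2 * np.pi * 10 * t), fs, win_len=256, step=128)
```

With these settings the 10 Hz test signal falls in the assumed alpha band, so the alpha power dominates each window, which is a quick sanity check on the band masks.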

Dataset pre-processing stage

Due to noise, artifacts, and missing data in EEG signal analysis, robust dataset processing is required [15]. Missing data, commonly caused by sensor failure or artifact rejection, is addressed using techniques such as KNN imputation or deep learning-based reconstruction to preserve signal integrity [32, 33]. Normalization reduces inter-subject variability by scaling signals to a predetermined range (for example, [0, 1]), whereas standardization ensures zero mean and unit variance for comparability [18, 34]. These procedures improve feature extraction and ML performance, resulting in dependable EEG-based applications such as wheelchair navigation [30]. Handling high dimensionality and real-time processing are two challenges that require customized preprocessing pipelines [35].

Figure 2 depicts the two important preprocessing methods applied to the dataset in this research: Handling Missing Data and Normalization/Standardization.

Handling Missing Data
The dataset used in this study contained no missing values, which simplified the preprocessing pipeline and eliminated the need for imputation techniques commonly employed in EEG analysis.
Normalization and Standardization
Feature scaling is crucial for ensuring balanced contributions from all features, improving model performance and stability, and is achieved through techniques like normalization and standardization [18, 34].
Normalization approaches include min–max scaling, which scales features between 0 and 1, as shown in Eq. 1:
$X_{scaled} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$ (1)
where:
- X_scaled: the scaled value of the original data point.
- X: the original data point.
- X_min: the minimum value in the dataset.
- X_max: the maximum value in the dataset.
[See PDF for image]
Fig. 2
Pre-processing stage

This equation converts the original X values so that the dataset’s smallest value becomes zero and the maximum value becomes one, with all other values scaled proportionally in between.

Standardization ensures consistency across features in data, enhancing algorithm efficiency and reducing bias, especially in time-series and frequency-based datasets like EEG, where amplitude and frequency measurements may vary significantly.
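
As a minimal sketch of the two scaling schemes, the functions below apply Eq. 1 and z-score standardization column by column with NumPy. The function names and the tiny example matrix are illustrative only, and the guard for constant columns is an added assumption, not something the study describes.

```python
import numpy as np

def min_max_scale(X):
    """Eq. 1 applied per feature: (X - X_min) / (X_max - X_min)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    span = X.max(axis=0) - x_min
    span[span == 0] = 1.0  # assumed guard: a constant column maps to 0
    return (X - x_min) / span

def standardize(X):
    """Z-score per feature: zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # assumed guard for constant columns
    return (X - mu) / sigma

# Two features on very different scales (made-up values).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
X_minmax = min_max_scale(X)
X_zscore = standardize(X)
```

In a train/validation/test setting, the minimum, maximum, mean, and standard deviation would normally be estimated on the training split only and then reused for the other splits, so that no information leaks from held-out data.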

Features selection stage (FSS): importance and methodology

FS is a critical step in building effective ML models, as it improves performance, reduces dimensionality and complexity, and mitigates overfitting by identifying and retaining the most relevant features. Before applying FS, highly correlated features are removed to avoid multicollinearity, which can distort model interpretability and performance [15]. Figure 3 illustrates the procedures used for feature reduction and selection in this study.

[See PDF for image]

Fig. 3

FSS

Correlation Analysis: evaluates the linear relationship between variables, identifying redundant features that contribute minimal unique information to the model [33]. Highly correlated features can introduce multicollinearity, distorting regression coefficients and reducing predictive performance [36].
To address this, features with correlation values above a predefined threshold are removed. Figure 4 shows the pseudo-code and flowchart for feature reduction using CT. This process simplifies the dataset, improves computational efficiency, and reduces overfitting risks. Additionally, correlation analysis highlights features with strong relationships to the target variable, ensuring their prioritization in model development. Integrating domain knowledge with correlation-based selection enhances model accuracy and interpretability [37].
[See PDF for image]
Fig. 4
Features Reduction Flowchart
Figures 5 and 6 show an example of correlation between some dataset features used in the proposal. Figure 7 shows the feature reduction due to different CT values.
[See PDF for image]
Fig. 5
FFT_4 and FFT_121 appear highly correlated

[See PDF for image]
Fig. 6
FFT_96 and FFT_29 appear highly correlated

[See PDF for image]
Fig. 7
No of features versus CT
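
Correlation-threshold filtering can be sketched as follows. This is one common greedy variant (keep a feature only if its absolute Pearson correlation with every already kept feature is at or below the threshold); the pseudo-code in Fig. 4 is not reproduced in the text, so the exact rule used in the study may differ. The function name, feature names, and synthetic data are hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedy filter: drop a feature if |r| with a kept feature exceeds threshold.

    Assumes no constant columns (their correlation is undefined).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Synthetic example: f1 is almost a scaled copy of f0, f2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
X_reduced, kept = drop_correlated(X, ["f0", "f1", "f2"], threshold=0.95)
```

Lowering the threshold removes more features, because weaker correlations then count as redundant; a threshold of 1.0 removes nothing.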
Variance Threshold (VT): removes features with low variance, as they provide minimal discriminatory power. Features with little variation across samples are eliminated using a user-defined threshold $\theta$, retaining only those with $\mathrm{Var}(x_{j}) > \theta$. VT is computationally efficient and scales well for high-dimensional datasets but does not consider feature-target relationships, often requiring combination with other methods [33]. Mathematically, for a feature $x_{j}$ in the dataset, the variance is calculated as in Eq. 2:
$\mathrm{Var}(x_{j}) = \frac{1}{n} \sum_{i = 1}^{n} (x_{ij} - \mu_{j})^{2}$ (2)
where $n$ is the number of samples, $x_{ij}$ represents the $i$-th sample of feature $x_{j}$, and $\mu_{j}$ is the mean of $x_{j}$.
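
A short sketch of the variance threshold, using the population variance of Eq. 2 (division by n), is given below. The function name and the toy matrix are illustrative assumptions.

```python
import numpy as np

def variance_threshold(X, theta=0.0):
    """Keep the features whose variance (Eq. 2, 1/n form) exceeds theta."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)  # ddof=0 by default, i.e. division by n
    mask = variances > theta
    return X[:, mask], mask, variances

# Toy data: the middle feature is constant and carries no information.
X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 10.0],
              [3.0, 5.0, 20.0]])
X_kept, mask, variances = variance_threshold(X, theta=0.0)
```

Because variance depends on the scale of each feature, the same threshold has a different meaning before and after normalization, so the threshold is usually chosen with the scaling step in mind.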
Recursive Feature Elimination (RFE): recursively removes the least important features to build a model with the most relevant features, improving performance and interpretability [20]. As illustrated in Fig. 8, the process involves:
1. Training the model on the dataset.
2. Ranking features by importance.
3. Removing the least important features.
4. Repeating until the desired number of features k is reached.
[See PDF for image]
Fig. 8
RFE Flowchart
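
The four RFE steps listed above map directly onto scikit-learn's `RFE` class. The sketch below is an illustration on synthetic data, not the study's code: the Extra Trees base estimator, the number of trees, and the generated dataset are assumptions, while the target of 13 features follows the setting reported in Table 4.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE

# Synthetic four-class data standing in for the EEG feature table.
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_redundant=2, n_classes=4, random_state=0)

# Steps 1-4: fit, rank by importance, drop one feature, repeat until k remain.
selector = RFE(
    estimator=ExtraTreesClassifier(n_estimators=50, random_state=0),
    n_features_to_select=13,  # k
    step=1,                   # features removed per iteration
)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
X_selected = selector.transform(X)
```

`selector.ranking_` assigns rank 1 to every retained feature and higher ranks to features in the order they were eliminated, which gives a rough view of relative importance.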

In this study, the FS process followed a sequential multi-step approach. First, correlation filtering was applied to remove highly correlated features (above a defined threshold), thereby mitigating multicollinearity and improving model interpretability. Next, VT was evaluated using thresholds ranging from 0 to 0.25 to identify low-variance features but was ultimately excluded, as all features exhibited sufficient variability. Finally, RFE was applied to the reduced set of features to select the most informative subset based on model importance scores. This layered approach addressed dimensionality reduction through redundancy elimination (correlation), statistical irrelevance (variance), and model-informed selection (RFE), thereby improving generalization while maintaining interpretability.

Once the features are selected, we proceed with choosing ML classifiers to test. Steps for Classification with Reduced Data:

Split Data: The dataset was randomly divided into three subsets: 70% for training, 15% for validation, and 15% for testing. This stratified splitting ensured class distribution remained balanced across all subsets.
Choose Classification Models: In this research, a range of classification algorithms were employed to improve accuracy and generalizability across complex EEG data classifications.
Train and Evaluate: Fit each model on the training set, then evaluate with metrics like accuracy, precision, recall, and F1-score [38].
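
The three steps above can be sketched with scikit-learn as follows. The data are synthetic, the Extra Trees settings are arbitrary, and the weighted averaging of precision, recall, and F1 is an assumption, since the averaging scheme is not stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the reduced feature table (13 features, 4 classes).
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_classes=4, random_state=0)

# 70% training, then the remaining 30% split evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

# Fit on the training split and evaluate on the held-out test split.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0)
```

The validation split is left unused here; it would typically serve for tuning choices such as the correlation threshold or the number of RFE features, keeping the test split for the final report only.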

Software tools, programming libraries, and computational hardware

To conduct the analysis in this study, we utilized a combination of software tools, programming libraries, and computational hardware. The primary programming language used was Python, chosen for its extensive ecosystem of libraries suited for data analysis and ML tasks.

The following programming libraries were employed:

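Scikit-learn was used for implementing ML models (e.g., Extra Trees and RF), as well as for preprocessing tasks like feature scaling and imputation; gradient boosting models such as LGBM and XGB are provided by separate libraries with scikit-learn-compatible interfaces.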
Pandas and NumPy were utilized for data manipulation, cleaning, and numerical computations.
Matplotlib was employed for data visualization, including the generation of ROC curves, precision-recall curves, and confusion matrices.
SHAP was used for XAI analysis to interpret feature importance and enhance model transparency.

The analysis was conducted on a local machine equipped with an Intel Core i7 processor and 16 GB of RAM. For some exploratory tests and larger model training, Google Colab was also used to leverage cloud-based computational resources, providing access to GPUs for faster processing.

Results and discussion

This section presents the outcomes of experiments, assessing EEG classification models using Accuracy, Precision, Recall, F1 Score, and AUC-ROC.

Accuracy: Measures the proportion of correct predictions but can be misleading in imbalanced datasets [39].

$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$

Precision: Assesses the proportion of correctly predicted positive cases, important in applications where false positives must be minimized [41].

$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$

Recall (Sensitivity): Evaluates the model’s ability to detect actual positives, crucial for high-risk applications [41].

$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$

F1 Score: A harmonic mean of precision and recall, balancing both metrics for imbalanced datasets [40].

$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

AUC-ROC (Area Under the Receiver Operating Characteristic Curve): Measures class separability, with values closer to 1 indicating better performance [39].
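
For a four-class problem, the precision, recall, and F1 definitions above are applied per class in a one-vs-rest fashion. The pure-Python sketch below works through a small made-up example; the labels and predictions are illustrative and are not results from the study.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels: 0 = Backward, 1 = Left, 2 = Right, 3 = Forward.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]

acc = accuracy(y_true, y_pred)                      # 6 of 8 correct
p_left, r_left, f1_left = per_class_metrics(y_true, y_pred, positive=1)
```

Here class 1 has two true positives, one false positive, and no false negatives, giving a precision of 2/3, a recall of 1, and an F1 of 0.8, while the overall accuracy is 0.75.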

Classification performance on raw dataset

Figure 9 and Table 3 compare the performance of various classification algorithms on the raw dataset. LGBM and XGB achieve the highest accuracy (63 and 61%, respectively) and AUC ROC scores (0.85), demonstrating the effectiveness of gradient boosting methods. Extra Trees (ET) and RF also perform well, with accuracy values of 60 and 59% and AUC ROC scores of 0.84, showcasing the strength of tree-based ensembles.

[See PDF for image]

Fig. 9

Classification performance with raw dataset

Table 3. Classifiers’ performance on raw dataset

Model	Accuracy	Precision	Recall	F1 Score	AUC ROC
Extra Trees	0.60	0.61	0.60	0.60	0.84
LR	0.46	0.47	0.46	0.46	0.72
KNN	0.53	0.55	0.53	0.53	0.78
DT	0.52	0.52	0.52	0.52	0.68
RF	0.59	0.59	0.59	0.59	0.84
SVM	0.50	0.55	0.49	0.49	0.74
AdaBoost	0.44	0.44	0.44	0.43	0.73
Bagging	0.57	0.58	0.57	0.57	0.80
LGBM	0.63	0.63	0.63	0.63	0.85
XGB	0.61	0.61	0.61	0.61	0.85

In contrast, AdaBoost (accuracy: 44%, AUC ROC: 0.73) and LR (accuracy: 46%, AUC ROC: 0.72) perform worst, likely due to the dataset’s complexity and non-linear relationships. KNN, SVM, and Bagging show moderate performance, with accuracy values of 53, 50, and 57%, respectively.

These results emphasize that gradient boosting methods (LGBM, XGB) are best suited for this dataset, while simpler models are less effective. The analysis underscores the importance of algorithm selection for achieving optimal classification performance.

Impact of CT and RFE on classifier performance

This section investigates the relationship between CTs and the number of features selected by RFE for the top-performing classifiers: LGBM, XGB, RF, and Extra Trees. The analysis aims to evaluate how varying CT and RFE settings influence classification accuracy, providing insights into the optimal balance between FS and correlation filtering. Figures 10, 11, 12, and 13 illustrate the impact of these parameters on the performance of each classifier.

[See PDF for image]

Fig. 10

CT Vs RFE No. with LGBMClassifier

[See PDF for image]

Fig. 11

CT Vs RFE No. with XGB Classifier

[See PDF for image]

Fig. 12

CT Vs RFE No. with RF Classifier

[See PDF for image]

Fig. 13

CT Vs RFE No. with ExtraTreesClassifier

LGBM achieves its highest accuracy (53–63%) with moderate CTs (CT-0.95) and a feature count of 13–15 as determined by RFE. However, accuracy declines when the number of features is excessively reduced (e.g., RFE-10) or when stricter CTs (e.g., CT-0.99) are applied. This suggests that retaining a sufficient number of features and avoiding overly stringent correlation filtering are critical for maintaining optimal performance.

XGB demonstrates optimal accuracy (54–64%) when CTs are set between CT-0.90 and CT-0.95, combined with a higher retention of features (RFE-13 to RFE-17). Similar to LGBM, performance deteriorates when the number of features is significantly reduced (e.g., RFE-5) or when stricter CT values (e.g., CT-0.99) are used. This indicates that XGB benefits from a balanced approach to FS and correlation filtering.

RF achieves its peak accuracy (57–66%) with moderate CTs (CT-0.90 to CT-0.95) and a feature count of 13–15 as determined by RFE. Accuracy declines when fewer features are retained (e.g., RFE-10) or when stricter CT values are applied. These results highlight the importance of maintaining a moderate number of features and avoiding overly restrictive CTs for optimal performance.

Extra Trees exhibits similar trends to RF, with optimal accuracy observed at RFE-10 to RFE-13 and CT-0.95. Performance declines when fewer features are retained or when stricter CT values are applied, reinforcing the need for a balanced approach to FS and correlation filtering.

The findings underscore the importance of balancing FS and filtering correlation to optimize classifier performance. Moderate CTs (CT-0.90 to CT-0.95) and retaining a sufficient number of features (RFE-10 to RFE-17) are critical for achieving peak accuracy across all models. Excessive feature elimination or overly stringent CTs can degrade performance, highlighting the need for careful parameter tuning in FS pipelines. These insights provide valuable guidance for optimizing EEG-based classification tasks, particularly in applications such as wheelchair navigation, where model accuracy and interpretability are paramount.
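
A sweep over CT and RFE settings of the kind analyzed in this section could be organized as in the sketch below. It is a hedged illustration only: the data are synthetic, the grid values and Extra Trees settings are arbitrary, the greedy correlation filter is an assumed variant, and a single validation split replaces whatever evaluation protocol the study used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

def kept_columns(X, ct):
    """Indices surviving a greedy |Pearson r| <= ct filter (assumed variant)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= ct for k in keep):
            keep.append(j)
    return keep

X, y = make_classification(n_samples=300, n_features=16, n_informative=6,
                           n_redundant=4, n_classes=4, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

results = {}
for ct in (0.90, 0.95, 0.99):
    cols = kept_columns(X_tr, ct)  # filter fitted on training data only
    for k in (5, 10, 13):
        if k > len(cols):
            continue  # cannot select more features than remain
        rfe = RFE(ExtraTreesClassifier(n_estimators=30, random_state=0),
                  n_features_to_select=k)
        rfe.fit(X_tr[:, cols], y_tr)
        # RFE.score evaluates the final estimator on the selected features.
        results[(ct, k)] = rfe.score(X_va[:, cols], y_va)

best_ct, best_k = max(results, key=results.get)
```

Selecting the best pair on a validation split rather than the test split avoids tuning the FS pipeline on the data used for the final accuracy estimate.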

Table 4 lists the maximum accuracy achieved by different classifiers along with their optimal CT and the number of features selected by RFE. Extra Trees achieved the highest accuracy (69%) with a CT of 0.99 and 13 selected features.

Table 4. Maximum Accuracy by Classifier with Optimal CT and RFE Settings

Model	Max Accuracy (%)	Optimal CT	RFE Features
Extra Trees	69	0.99	13
RF	66	0.99	10
LGBM	63	0.95	13
XGB	64	0.90	13

Figures 14 and 15 illustrate the impact of feature selection using CT and RFE on the performance of the Extra Trees classifier. In Fig. 14, accuracy improves from 55% at CT = 0.75 to a peak of 65% at CT = 0.95, highlighting the benefit of removing highly correlated features to reduce redundancy and multicollinearity. Accuracy then declines to 58% at CT = 1.0, indicating that retaining excessive correlations degrades performance. An optimal CT lies between 0.93 and 0.95.

[See PDF for image]

Fig. 14

CT Vs Accuracy

[See PDF for image]

Fig. 15

RFE Vs Accuracy

Figure 15 shows that reducing the number of features from 55 to 25 via RFE increases accuracy from 64% to a maximum of 68%. This suggests that eliminating noisy or irrelevant features enhances generalization. However, further reduction beyond 25 features leads to a sharp drop in performance, reaching 44% with only 2 features. This decline indicates that excessive feature removal leads to significant information loss, reducing model effectiveness.
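A minimal sketch of the RFE step follows, assuming scikit-learn is the toolkit in use (the source does not name the library at this point). The synthetic data from make_classification only stands in for the EEG feature table, and the sample count, feature count, and number of trees are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the EEG features (assumption: not the study's data).
X, y = make_classification(
    n_samples=300,
    n_features=30,
    n_informative=8,
    n_classes=4,
    random_state=0,
)

# RFE refits the estimator repeatedly, dropping the least important
# feature each round (step=1) until 13 features remain.
estimator = ExtraTreesClassifier(n_estimators=50, random_state=0)
selector = RFE(estimator, n_features_to_select=13, step=1).fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print(len(selected), selected)
```

Repeating the fit for several values of n_features_to_select and recording held-out accuracy each time would produce a curve of the kind shown in Fig. 15.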

Performance comparison after feature selection

The chart in Fig. 16 compares the accuracy of various classification algorithms after applying FS to the dataset, using a CT of 0.99 and RFE selecting 13 features. These parameters were chosen because CT = 0.99 effectively removes highly correlated features, reducing redundancy without losing critical information, while RFE with 13 features was identified as the optimal number that maximizes model performance. This combination resulted in the highest classification accuracy, particularly with the Extra Trees classifier as shown in Table 4. The final set of selected features includes Start Timestamp, End Timestamp, Max, Standard Deviation, RMS, Kurtosis, Skewness, Peak-to-Peak, Abs Diff Signal, Alpha Power, Beta Power, Delta Power, and FFT_26.

[See PDF for image]

Fig. 16

Classification Performance after Feature Selection

The results demonstrate a significant improvement in performance for some models compared to their accuracy on the raw dataset. The Extra Trees classifier leads with the highest accuracy (69%), followed by RF (66%), showcasing the strength of tree-based ensembles with FS. XGB (64%) and LGBM (63%) also perform robustly, while Bagging achieves moderate accuracy (57%). LR (40%) has the lowest accuracy, reflecting its inability to model complex relationships effectively. SVC (45%), AdaBoost (46%), DT (56%), and KNN (52%) show modest improvements. These results emphasize the benefits of FS for ensemble models, while simpler models remain less effective with a refined feature set.

Table 5 presents the performance metrics of various ML models across five evaluation criteria: accuracy, precision, recall, F1 score, and AUC ROC. The performance metrics highlight the dominance of ensemble models, with Extra Trees achieving the best results (accuracy, precision, recall, F1: 0.69; AUC ROC: 0.89). RF (AUC ROC: 0.87) and LGBM (AUC ROC: 0.86) also perform strongly with balanced metrics. Mid-range models like KNN (AUC ROC: 0.77), Bagging (AUC ROC: 0.82), and Decision Trees (AUC ROC: 0.71) show moderate performance.

Table 5. Classifiers’ performance on dataset after FS

Model	Accuracy	Precision	Recall	F1 Score	AUC ROC
Extra Trees	0.69	0.69	0.69	0.69	0.89
LR	0.40	0.40	0.39	0.37	0.66
KNN	0.52	0.53	0.52	0.53	0.77
DT	0.56	0.57	0.56	0.56	0.71
RF	0.66	0.66	0.66	0.66	0.87
SVM	0.46	0.49	0.46	0.44	0.71
AdaBoost	0.45	0.45	0.45	0.45	0.75
Bagging	0.57	0.57	0.57	0.57	0.82
LGBM	0.63	0.63	0.63	0.63	0.86
XGB	0.64	0.61	0.61	0.64	0.86
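How the accuracy, precision, recall, and F1 columns of Table 5 relate to one another can be sketched from a four-class confusion matrix. The averaging scheme is not stated in this section, so macro averaging (an unweighted mean over classes) is an assumption here, and the confusion matrix is a toy example rather than a result from the study.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy confusion matrix for four movement classes (illustrative only).
toy_cm = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [1, 2, 5, 2],
    [0, 1, 0, 9],
]
metrics = macro_metrics(toy_cm)
print({name: round(value, 3) for name, value in metrics.items()})
```

With equal class sizes, as in the toy matrix, macro recall coincides with accuracy, which is one reason the four columns can look nearly identical for balanced classifiers.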

In contrast, simpler models struggle. LR has the lowest metrics (accuracy: 0.40, F1: 0.37, AUC ROC: 0.66), and SVM and AdaBoost lag with limited class separation. Ensemble methods clearly outperform, offering superior PR balance and class distinction (Table 6).

Table 6. Cross-Model Comparison

Metric	ExtraTrees	RF	LGBM	Overall Insights
Class 0 (ROC)	0.88	0.87	0.85	Extra Trees leads slightly; LGBM falls behind in precision
Class 0 (PR)	0.79	0.76	0.74	Extra Trees leads slightly; LGBM falls behind in precision
Class 1 (ROC)	0.89	0.86	0.83	Strong for Extra Trees; LGBM struggles more than RF
Class 1 (PR)	0.78	0.73	0.68	Strong for Extra Trees; LGBM struggles more than RF
Class 2 (ROC)	0.86	0.85	0.85	Balanced ROC across models; Extra Trees and RF better in PR
Class 2 (PR)	0.74	0.68	0.66	Balanced ROC across models; Extra Trees and RF better in PR
Class 3 (ROC)	0.92	0.91	0.92	Outstanding performance across all models; Extra Trees has a slight edge
Class 3 (PR)	0.82	0.79	0.81

Performance evaluation using ROC and PR curves on dataset after FS

This section evaluates the performance of the ExtraTrees, RF, and LGBM classifiers on the proposed dataset after the FS process. The evaluation focuses on two key metrics: ROC curves and PR curves. These metrics provide insights into the classifiers’ ability to distinguish between classes and their predictive accuracy under different thresholds. The ROC curve measures the trade-off between the True Positive Rate (TPR) and False Positive Rate (FPR), while the PR curve evaluates the balance between Precision (positive predictive value) and Recall (sensitivity). The AUC for both metrics quantifies the model’s performance, with higher values indicating better classification ability.

FPR = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}

TPR = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
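The two rates defined above can be traced across thresholds to obtain a per-class ROC curve and its AUC. The sketch below treats one class as positive and the rest as negative (a one-vs-rest reading, which is an assumption about how the per-class curves were built); the labels and scores are toy values.

```python
def one_vs_rest(labels, positive_class):
    """Binary targets: 1 for the chosen class, 0 for every other class."""
    return [1 if y == positive_class else 0 for y in labels]


def roc_points(y_true, scores):
    """FPR and TPR at every distinct score threshold, starting from (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / positives)  # TPR = TP / (TP + FN)
        fpr.append(fp / negatives)  # FPR = FP / (FP + TN)
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under a piecewise-linear curve given by points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


# Toy labels and predicted probabilities for Class 3 (illustrative only).
labels = [3, 0, 3, 1, 2, 3, 0, 2]
class3_scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.5]

y_true = one_vs_rest(labels, positive_class=3)
fpr, tpr = roc_points(y_true, class3_scores)
roc_auc = trapezoid_area(fpr, tpr)
print(round(roc_auc, 3))
```

The resulting area equals the fraction of positive-negative pairs in which the positive sample receives the higher score, which is why ROC AUC is read as a measure of class separability.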

As depicted in Fig. 17, the ExtraTrees classifier demonstrates strong discriminatory capabilities across all classes. It excels particularly for Class 3, achieving a ROC AUC of 0.92 and a PR AUC of 0.82, highlighting its ability to effectively distinguish this class while maintaining a balance between precision and recall. For Classes 0 and 1, the classifier achieves ROC AUC values of 0.88 and 0.89, respectively, with corresponding PR AUC values of 0.79 and 0.78, reflecting consistent performance. However, Class 2 presents a challenge, with a slightly lower ROC AUC of 0.86 and a PR AUC of 0.74, indicating difficulties in achieving PR balance, likely due to overlapping features with other classes.

[See PDF for image]

Fig. 17

ExtraTrees ROC-PR Curves

As shown in Fig. 18, the RF classifier also demonstrates robust performance. It achieves a high ROC AUC of 0.91 for Class 3 and a PR AUC of 0.79, underscoring its ability to accurately separate this class from others. For Classes 0, 1, and 2, the classifier exhibits strong ROC AUC values of 0.87, 0.86, and 0.85, respectively, confirming reliable classification across these groups. However, the PR AUC values for these classes—0.76 for Class 0, 0.73 for Class 1, and 0.68 for Class 2—reveal challenges in maintaining PR trade-offs, particularly for Class 2. This suggests that the model struggles with predictive accuracy for some classes when recall is low.

[See PDF for image]

Fig. 18

Random Forest ROC-PR Curves

Similarly, as shown in Fig. 19, the LGBM classifier performs effectively, achieving the highest ROC AUC of 0.92 for Class 3 and a PR AUC of 0.81, indicating excellent separability and balanced PR performance for this class. For Classes 0 and 2, the classifier achieves equal ROC AUC values of 0.85, while Class 1 has a slightly lower ROC AUC of 0.83, demonstrating consistent performance in separating these classes. However, the PR AUC values for Classes 0, 1, and 2 (0.74, 0.68, and 0.66, respectively) suggest difficulties in balancing precision and recall, particularly for Class 2.

[See PDF for image]

Fig. 19

LightGBM ROC-PR Curves

In comparison, all three models perform best for Class 3, consistently achieving high ROC and PR AUC values. ExtraTrees leads in overall PR performance, particularly for Classes 0 and 1, while RF maintains strong and consistent performance across most metrics. LGBM excels in ROC AUC for Class 3 but exhibits lower PR AUC values for Classes 1 and 2, indicating challenges in PR performance for these classes. Across all models, Class 2 remains the most challenging, likely due to overlapping features with other classes. Further investigation into feature importance and targeted model adjustments may help address these issues.
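The gap between ROC AUC and PR AUC noted in this comparison can be made concrete with a small sketch of the PR summary. It uses the step-wise average-precision sum; whether the reported PR AUC values were computed this way or by trapezoidal integration is not stated, so that choice is an assumption, and the labels and scores are toy values.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Sums precision at each threshold, weighted by the gain in recall.
    """
    positives = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision = tp / (tp + fp)  # at least one score is >= t, so no division by zero
        recall = tp / positives
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy one-vs-rest targets and scores (illustrative only).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.4, 0.3, 0.5]

ap = average_precision(y_true, scores)
baseline = sum(y_true) / len(y_true)  # expected precision of a random ranking
print(round(ap, 3), baseline)
```

Because the PR baseline equals the prevalence of the positive class rather than 0.5, a class that makes up roughly a quarter of a four-class dataset starts from a much lower floor, so PR AUC values below the corresponding ROC AUC values are expected.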

Dataset visualization

Principal Component Analysis (PCA) was applied to reduce the dataset to two principal components, maximizing variance while simplifying visualization [42]. As depicted in Fig. 20, three views were generated: the full dataset, inliers, and outliers. The full dataset reveals dense clusters near the origin with widely dispersed outliers, highlighting classification challenges due to overlapping regions. The inlier visualization, excluding outliers, shows tighter clusters and reduced variance, but some overlap remains, suggesting outlier removal may eliminate critical boundary information. The outlier visualization demonstrates their broad distribution and role in refining decision boundaries, improving generalization and classification accuracy.
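A two-component projection of this kind can be sketched with NumPy alone. The example assumes mean-centering without further scaling (the preprocessing applied before PCA is not restated here) and uses random data with 13 columns as a stand-in for the selected features.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto the two directions of largest variance."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:2]
    projected = X_centered @ components.T
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    return projected, components, explained_ratio[:2]


# Random stand-in data with unequal column variances (not the EEG dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13)) * np.linspace(3.0, 0.5, 13)

projected, components, ratio = pca_2d(X)
print(projected.shape, np.round(ratio, 3))
```

Plotting the two columns of projected, colored by class and by inlier or outlier status, would give scatter views comparable to those in Fig. 20.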

[See PDF for image]

Fig. 20

Dataset, Inliers, and Outliers visualization using PCA

To further examine class separability, t-SNE (t-Distributed Stochastic Neighbor Embedding) was applied post-FS [43]. In Fig. 21, the first visualization, including all data, reveals the overall class distribution and overlaps. The second, isolating inliers (via the IQR method), highlights the core structure with denser clusters, though overlaps persist. The third, focusing on outliers, shows their importance in capturing rare patterns and improving classification performance. These findings reinforce the value of preserving key outliers while optimizing FS for robust EEG classification.

[See PDF for image]

Fig. 21

Dataset, Inliers, and Outliers visualization using t-SNE

Outliers are often treated as noise and removed in data processing. However, recent studies reveal they can actually improve model performance by helping define class boundaries. This idea, known as boundary-sample learning, shows that outliers enhance decision boundaries, enabling classifiers to generalize better in complex data settings. Furthermore, adversarial robustness research highlights that challenging samples, including outliers, help build stronger models. Training on such difficult or anomalous data forces models to adapt to irregularities, improving their resilience to unseen variations. This is especially important in dynamic and noisy domains like EEG signal classification. Studies like [44] and [45] demonstrate how adversarial examples—similar to outliers—boost model robustness and generalization by pushing models to handle data irregularities effectively.

Impact of inliers versus outliers on classification performance

Figure 22 and Table 7 show classification performance when models are evaluated separately on inliers and outliers. Here, inliers refer to the subset of normal (non-anomalous) data points after removing all outliers, while outliers denote the anomalous data points after excluding inliers. This separation enables an isolated analysis of model behavior on typical versus anomalous data distributions, which is crucial for assessing model robustness and effectiveness.
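The inlier/outlier separation can be sketched with the IQR rule mentioned for the t-SNE views. Two details are assumptions rather than facts from the source: the conventional 1.5 multiplier on the IQR, and flagging a sample as an outlier when any single feature falls outside its fences. The rows are toy values.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_inliers_outliers(rows, k=1.5):
    """Return (inlier_indices, outlier_indices) for a list of feature rows."""
    n_features = len(rows[0])
    bounds = [iqr_bounds([row[j] for row in rows], k) for j in range(n_features)]
    inliers, outliers = [], []
    for idx, row in enumerate(rows):
        within = all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
        (inliers if within else outliers).append(idx)
    return inliers, outliers


# Toy two-feature samples (illustrative only); the last two rows are extreme.
rows = [
    [1.00, 10.0],
    [1.20, 11.0],
    [0.90, 9.5],
    [1.10, 10.5],
    [1.00, 10.2],
    [5.00, 10.1],
    [1.05, 30.0],
]
inlier_idx, outlier_idx = split_inliers_outliers(rows)
print(inlier_idx, outlier_idx)
```

Training and testing a classifier separately on the two index sets would reproduce the style of comparison reported for inliers and outliers.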

[See PDF for image]

Fig. 22

Classification performance with Inliers Vs Outliers

Table 7. Classifiers’ performance on dataset inliers Vs Outliers

Model	Accuracy		Precision		Recall		F1 Score		AUC ROC
Model	Inliers	Outliers	Inliers	Outliers	Inliers	Outliers	Inliers	Outliers	Inliers	Outliers
Extra Trees	0.53	0.82	0.52	0.82	0.52	0.82	0.52	0.82	0.79	0.94
LR	0.41	0.45	0.41	0.45	0.39	0.45	0.39	0.45	0.68	0.72
KNN	0.39	0.63	0.40	0.64	0.38	0.63	0.38	0.63	0.67	0.85
DT	0.41	0.64	0.41	0.64	0.41	0.64	0.40	0.64	0.60	0.76
RF	0.53	0.72	0.52	0.72	0.52	0.72	0.52	0.72	0.77	0.92
SVM	0.43	0.54	0.45	0.56	0.40	0.54	0.36	0.53	0.70	0.79
AdaBoost	0.39	0.55	0.38	0.55	0.38	0.55	0.38	0.54	0.68	0.74
Bagging	0.46	0.69	0.47	0.69	0.45	0.69	0.46	0.69	0.72	0.89
LGBM	0.52	0.75	0.52	0.75	0.51	0.75	0.51	0.75	0.76	0.92
XGB	0.50	0.72	0.49	0.73	0.49	0.72	0.49	0.72	0.76	0.92

The results, summarized in Table 7 and shown in Fig. 22, reveal a consistent trend across all classifiers: performance metrics are significantly higher when tested on outliers compared to inliers. For example, the Extra Trees classifier achieved 82% accuracy on outliers, surpassing its 53% accuracy on inliers. Similarly, RF and LGBM models improved from around 52–53% accuracy on inliers to 72–75% on outliers. This pattern is reflected in precision, recall, F1 score, and AUC-ROC metrics, with AUC-ROC often exceeding 0.90 on outliers, indicating excellent discriminatory power for anomalous samples.

The lower accuracy on inliers (Fig. 22) suggests that outliers contain valuable information critical for class separation. Comparing Table 5 with the inlier columns of Table 7 confirms this, showing declines in precision, recall, and F1 scores for most models after removing outliers. Models relying on strict decision boundaries, such as LR and KNN, were most affected, though ensemble methods also experienced measurable performance drops.

Interestingly, some models performed better when trained and tested exclusively on outliers after removing inliers. This happens because outliers, though rare, often carry essential information for classification, especially in domains like fraud and anomaly detection. Removing inliers reduces feature-space overlap, forcing the model to focus on the most informative and challenging samples. This strategy is effective in imbalanced datasets, where retaining outliers counters majority-class bias and enhances minority-class detection. Ensemble methods excel at leveraging this outlier-specific information due to their ability to model complex, nonlinear data distributions (Tables 8 and 9).

Table 8. Cross-Model Comparison over Inliers

Metric	ExtraTrees	RF	LGBM	Overall Insights
Class 0 (ROC)	0.74	0.74	0.73	Strong performance for Class 3; struggles with precision for Class 1 and 2
Class 0 (PR)	0.59	0.56	0.53
Class 1 (ROC)	0.73	0.73	0.74	Good ROC performance overall; slightly weaker PR for Class 1 and 0
Class 1 (PR)	0.50	0.48	0.47
Class 2 (ROC)	0.77	0.77	0.74	Balanced performance for Class 3; moderate for Class 1 and 0; weakest for Class 2
Class 2 (PR)	0.51	0.51	0.43
Class 3 (ROC)	0.84	0.84	0.84	Overall strong for Class 3; balanced across classifiers
Class 3 (PR)	0.68	0.66	0.67	Overall strong for Class 3; balanced across classifiers

Table 9. Cross-Model Comparison over Outliers

Metric	ExtraTrees	RF	LGBM	Overall Insights
Class 0 (ROC)	0.92	0.85	0.91	Class 0: ExtraTrees and RF lead in ROC; LGBM slightly lags in PR
Class 0 (PR)	0.85	0.83	0.84
Class 1 (ROC)	0.94	0.86	0.93	Class 1: ExtraTrees excels in both metrics; RF shows moderate performance
Class 1 (PR)	0.86	0.83	0.84
Class 2 (ROC)	0.91	0.83	0.90	Class 2: All models struggle with PR; RF has the lowest scores
Class 2 (PR)	0.83	0.78	0.78
Class 3 (ROC)	0.95	0.92	0.94	Class 3: Strong performance across models; ExtraTrees slightly edges out others
Class 3 (PR)	0.89	0.85	0.88

This analysis highlights the trade-off between removing and retaining outliers. While exclusion simplifies the dataset and may reduce noise, it risks discarding critical information defining class boundaries, thereby impairing performance.

These findings suggest classifiers are better at identifying anomalous data than modeling normal data’s underlying structure, which may be more complex or noisy. Therefore, it is important to evaluate model performance separately on inliers and outliers, especially for anomaly detection tasks. The superior performance on outliers implies these classifiers are effective in anomaly detection but should be interpreted with caution when relying solely on inlier data due to lower predictive accuracy.

In summary, this comprehensive analysis underscores the need to carefully assess the impact of outlier removal. Ensemble methods, particularly Extra Trees and RF, show superior robustness in handling both inliers and outliers, making them preferred choices for classification tasks with complex and imbalanced EEG datasets.

Performance evaluation using ROC and PR curves on dataset inliers

As depicted in Fig. 23, the Extra Trees classifier achieves the highest ROC AUC for Class 3 (0.85), alongside a PR AUC of 0.68, showcasing its superior performance for this class. Moderate results are observed for Classes 0 and 2 (ROC AUC: 0.76, PR AUC: 0.59 and 0.51, respectively), while Class 1 demonstrates the weakest results (ROC AUC: 0.74, PR AUC: 0.50), suggesting difficulties in feature representation and class separation.

[See PDF for image]

Fig. 23

ExtraTrees ROC-PR Curves over Inliers

Similarly, Fig. 24 shows that the RF classifier performs best for Class 3 (ROC AUC: 0.84, PR AUC: 0.66) but struggles with Class 1 (ROC AUC: 0.73, PR AUC: 0.48). This pattern suggests that while the model excels in precision and recall for well-represented classes, it is less effective for overlapping or poorly represented ones.

[See PDF for image]

Fig. 24

Random Forest ROC-PR Curves over Inliers

Figure 25 illustrates that for the LGBM classifier, Class 3 also stands out with a PR AUC of 0.67, though Classes 0, 1, and 2 show weaker performance (PR AUCs: 0.53, 0.47, and 0.43, respectively). The ROC AUC values for Classes 1 and 2 (0.74 each) are slightly higher than Class 0 (0.73), indicating moderate performance for these classes.

[See PDF for image]

Fig. 25

LightGBM ROC-PR Curves over Inliers

The results consistently highlight Class 3 as the best-performing class across all models, particularly with the Extra Trees classifier. However, Class 1 emerges as the most challenging to classify, with the lowest AUC values across both ROC and PR metrics.

The observed imbalance in classification performance across different classes can be attributed to several key factors related to data distribution and intrinsic class characteristics. Within the inlier subset, Class 1 samples may exhibit substantial overlap with other classes, resulting in poor discriminative features. This overlap leads to increased ambiguity, thereby challenging the classifier’s ability to accurately distinguish Class 1 from similar or adjacent classes. Additionally, inlier data for Class 1 may contain noisy and less distinct patterns, further impairing model learning and generalization.

Performance evaluation using ROC and PR curves on dataset outliers

Figure 26 shows that the ExtraTrees classifier achieves the highest ROC AUC for Class 3 (0.95), followed by Class 1 (0.94), Class 0 (0.92), and Class 2 (0.91). Similarly, PR AUC values show a descending trend with Class 3 scoring 0.89, Class 1 at 0.86, Class 0 at 0.85, and Class 2 at 0.83. These results underscore the classifier’s strength in distinguishing outliers, particularly in Class 3, while revealing slight difficulties with PR balance for Class 2.

[See PDF for image]

Fig. 26

ExtraTrees ROC-PR Curves over Outliers

Figure 27 shows that the RF classifier performs similarly, achieving ROC AUC values of 0.92 for Classes 0, 1, and 3, and a slightly lower value of 0.89 for Class 2. PR AUC values mirror this pattern, with Class 3 at 0.85, Classes 0 and 1 at 0.83, and Class 2 at 0.78. This indicates the model’s effectiveness across most classes, though Class 2 remains a challenge.

[See PDF for image]

Fig. 27

Random Forest ROC-PR Curves over Outliers

Figure 28 illustrates that the LGBM classifier demonstrates strong ROC AUC results, with Class 3 scoring 0.94, Class 1 at 0.93, Class 0 at 0.91, and Class 2 at 0.90. PR AUC values are highest for Class 3 (0.88), followed by Classes 0 and 1 (both 0.84), with Class 2 again showing the lowest value (0.78). This reflects LGBM’s capability in handling outliers while indicating room for improvement in managing Class 2.

[See PDF for image]

Fig. 28

LightGBM ROC-PR Curves over Outliers

All three classifiers exhibit robust performance in distinguishing outliers, with Class 3 emerging as the strongest-performing class across all metrics. However, the consistently lower PR AUC for Class 2 suggests a need for further optimization to enhance precision and recall. These findings emphasize the classifiers’ strengths in handling outliers and provide direction for improving performance in challenging class distributions.

XAI: SHAP analysis

Figures 29, 30, 31, and 32 use SHAP to analyze the ExtraTrees classifier’s predictions across the dataset’s four classes (0, 1, 2, and 3). Each bar chart shows the average SHAP value of each feature, indicating its importance for the class. Each beeswarm plot shows the distribution of SHAP values for each feature and their respective contributions to model outputs. Each waterfall plot shows how individual features influence the model’s decision for the specific class.
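The quantities behind the bar and waterfall plots can be sketched without the SHAP library itself. The sketch assumes the bar chart ranks features by mean absolute SHAP value and that a waterfall accumulates per-feature contributions on top of a base value. The SHAP values, base value, and feature names below are made-up placeholders, not outputs of the study's model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| over samples (the bar-chart quantity)."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def waterfall(base_value, shap_row, feature_names):
    """Accumulate one sample's contributions, largest magnitude first."""
    order = sorted(range(len(shap_row)), key=lambda j: abs(shap_row[j]), reverse=True)
    steps, running = [], base_value
    for j in order:
        running += shap_row[j]
        steps.append((feature_names[j], shap_row[j], running))
    return steps


# Placeholder names and values (hypothetical, for illustration only).
names = ["feature_a", "feature_b", "feature_c"]
toy_shap = [
    [0.20, -0.05, 0.01],
    [-0.10, 0.08, 0.00],
    [0.30, -0.02, -0.03],
]

ranking = mean_abs_shap(toy_shap, names)
steps = waterfall(base_value=0.25, shap_row=toy_shap[0], feature_names=names)
print(ranking)
print(steps)
```

The final running total in the waterfall equals the base value plus the sum of that sample's SHAP values, which is the additive property that lets each class score be decomposed feature by feature.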

[See PDF for image]

Fig. 29

SHAP Analysis for Class 0

[See PDF for image]

Fig. 30

SHAP Analysis for Class 1

[See PDF for image]

Fig. 31

SHAP Analysis for Class 2

[See PDF for image]

Fig. 32

SHAP Analysis for Class 3

For Class 0, FFT_26 (Fast Fourier Transform coefficient at index 26) is the most dominant feature, with Delta Power and Standard Deviation also contributing significantly. Features like Skewness and Kurtosis have minimal impact.

Similarly, for Class 1, FFT_26 remains the top feature, but Delta Power shows increased relevance, narrowing its gap with FFT_26. Features such as Abs Diff Signal and Alpha Power gain importance compared to their influence in Class 0.

In Class 2, FFT_26 retains its lead but shows slightly reduced dominance compared to other classes. Features like Delta Power, Beta Power, and Abs Diff Signal play more prominent roles, while lower-ranked features like Max and Skewness remain minimally influential.

For Class 3, FFT_26 continues to lead, but its margin over Delta Power and Abs Diff Signal narrows further. Features such as Kurtosis and Peak-to-Peak show slight increases in importance, though they remain among the less significant features overall.

Analysis shows that FFT_26 is consistently the most critical feature across all classes, while the importance of other features like Delta Power, Standard Deviation, and Abs Diff Signal varies by class, reflecting distinct EEG patterns associated with each.

SHAP analysis identified FFT_26 as the most influential feature in classifying EEG-based wheelchair navigation commands, highlighting its critical role in BCIs. As a frequency-domain feature extracted using FFT, FFT_26 likely corresponds to motor-related EEG activity, particularly in the beta (13–30 Hz) or gamma (> 30 Hz) bands, which are strongly linked to motor imagery and movement planning. Its dominance in classification suggests that it effectively distinguishes between imagined forward, backward, left, and right movements, making it a key component in EEG-driven mobility control.
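Which band FFT_26 falls in depends on the sampling rate and FFT length, since bin k of an N-point FFT sits at k·fs/N Hz. Neither value is given in this passage, so the settings below are hypothetical and serve only to show why the attribution is stated as "beta or gamma". The beta and gamma limits follow the text; the delta, theta, and alpha limits are conventional approximations.

```python
def bin_frequency_hz(k, sampling_rate_hz, n_fft):
    """Center frequency of FFT bin k (valid for 0 <= k <= n_fft / 2)."""
    return k * sampling_rate_hz / n_fft


def eeg_band(freq_hz):
    """Map a frequency to a conventional EEG band name."""
    if freq_hz < 4:
        return "delta"
    if freq_hz < 8:
        return "theta"
    if freq_hz < 13:
        return "alpha"
    if freq_hz <= 30:
        return "beta"
    return "gamma"


# Hypothetical acquisition settings (not taken from the dataset description).
settings = [(128, 128), (256, 128), (128, 256)]
results = []
for sampling_rate_hz, n_fft in settings:
    freq = bin_frequency_hz(26, sampling_rate_hz, n_fft)
    results.append((sampling_rate_hz, n_fft, freq, eeg_band(freq)))
    print(sampling_rate_hz, n_fft, freq, eeg_band(freq))
```

Confirming the band for the actual recordings would require the dataset's sampling rate and the window length used during feature extraction.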

The identification of FFT_26 enhances FS efficiency, reducing model complexity while preserving accuracy. Additionally, refining signal preprocessing to amplify relevant frequency bands can improve classification performance. Understanding FFT_26’s role also supports personalized model adaptation, allowing for user-specific frequency optimization in BCI applications. These findings confirm the biological relevance and interpretability of EEG-based ML models, paving the way for more robust, efficient, and user-adaptive assistive mobility technologies.

Overall, this study systematically optimized the EEG-based wheelchair control system through several key strategies. Classifier performance was enhanced by benchmarking multiple ML models, where ensemble methods, particularly Extra Trees and LGBM, outperformed others in accuracy and AUC metrics. Feature subset selection was refined using a combination of CTs and RFE, identifying the most relevant features while reducing dimensionality and overfitting. Outlier utilization was explored innovatively; rather than being discarded, outliers were found to contribute critical information to class boundaries, thereby improving model robustness. Finally, model interpretability was achieved through XAI (SHAP), which highlighted FFT_26 as a consistently dominant feature across all classes. These optimizations collectively improved classification accuracy, interpretability, and the practical utility of EEG signals in real-time assistive applications.

Conclusion

This study systematically evaluated ML models for EEG-based wheelchair navigation, addressing key challenges such as high dimensionality, noise, class imbalance, and outlier influence. The results demonstrate that ensemble learning models outperform traditional classifiers, with Extra Trees achieving the highest accuracy (0.69) after FS.

A key insight emerged regarding the role of outliers in EEG classification. Unlike conventional approaches that remove outliers, the findings reveal that training on outlier-only datasets significantly improved accuracy (Extra Trees: 0.82, LGBM: 0.75, XGB: 0.72), whereas removing outliers reduced performance (Extra Trees: 0.53). This suggests that outliers contribute valuable class boundary information rather than merely adding noise.

Further validation using ROC and PR curve analysis confirmed that ensemble classifiers, particularly Extra Trees and LGBM, achieved high class separability (ROC AUC > 0.85). Among the movement commands, Class 3 (forward movement) was the easiest to classify (ROC AUC: 0.92, PR AUC: 0.82), while Class 2 (right movement) posed the greatest challenge due to significant feature overlap.

To enhance interpretability, XAI analysis via SHAP identified FFT_26 as the most influential feature, emphasizing the importance of frequency-domain analysis in EEG-based BCI. Other key contributors, such as Delta Power and Standard Deviation, further highlight the effectiveness of statistical and spectral FS in improving classification accuracy and model transparency.

Implications and future directions

These findings provide a robust framework for improving EEG-controlled assistive technologies, particularly in real-time wheelchair navigation systems. Future research should explore hybrid models that integrate deep learning with FS to enhance both predictive performance and interpretability. Additionally, given the demonstrated importance of outlier data, alternative outlier-handling techniques—such as adversarial learning and anomaly-based feature enhancements—should be further investigated to optimize EEG classification in BCI applications.

Author contributions

Amr M. Hamed: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing—Original, Writing—Review and Editing, and Visualization. Abdel-Fattah Attia: Conceptualization, Supervision, and Review and Editing. Heba El-Behery: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing—Original, Writing—Review and Editing, and Visualization.

Funding

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Data availability

No datasets were generated or analysed during the current study.

Declarations

Competing interests

The authors declare no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Rashid M, Sulaiman N, PP Abdul Majeed A, Musa RM, Ab. Nasir AF, Bari BS, Khatun S. Current Status, Challenges, and Possible Solutions of EEG-Based Brain-Computer Interface: a comprehensive review. Front Neurorobot. 2020;14:25. https://doi.org/10.3389/fnbot.2020.00025

2. Chaddad, A; Wu, Y; Kateb, R; Bouridane, A. Electroencephalography signal processing: a comprehensive review and analysis of methods and techniques. Sensors; 2023; 23, 14 6434. [DOI: https://dx.doi.org/10.3390/s23146434]

3. Fathima, S; Kore, SK. Formulation of the challenges in brain-computer interfaces as optimization problems—a review. Front Neurosci; 2020; 14, 546656. [DOI: https://dx.doi.org/10.3389/fnins.2020.546656]

4. Roy, Y; Banville, H; Albuquerque, I; Gramfort, A; Falk, TH; Faubert, J. Deep learning-based electroencephalography analysis: a systematic review. J Neural Eng; 2019; 16, 5 051001. [DOI: https://dx.doi.org/10.1088/1741-2552/ab260c]

5. Rajpura, P; Cecotti, H; Kumar Meena, Y. Explainable artificial intelligence approaches for brain–computer interfaces: a review and design space. J Neural Eng; 2024; 21, 4 041003. [DOI: https://dx.doi.org/10.1088/1741-2552/ad6593]

6. Lotte, F; Bougrain, L; Cichocki, A; Clerc, M; Congedo, M; Rakotomamonjy, A; Yger, F. A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update. J Neural Eng; 2018; 15, 3 031005. [DOI: https://dx.doi.org/10.1088/1741-2552/aab2f2]

7. Ahmed M. Wheelchair EEG Signals. Kaggle. 2024. Retrieved from https://www.kaggle.com/datasets/mneebahmd/wheel-chair-eeg-signals/data. Licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

8. Hou, Y; Zhou, L; Jia, S; Lun, X. A novel approach of decoding EEG four-class motor imagery tasks via scout ESI and CNN. J Neural Eng; 2020; 17, 1 016048. [DOI: https://dx.doi.org/10.1088/1741-2552/ab4af6]

9. Chaddad, A; Wu, Y; Kateb, R; Bouridane, A. Electroencephalography signal processing: a comprehensive review and analysis of methods and techniques. Sensors (Basel); 2023; 23, 14 6434. [DOI: https://dx.doi.org/10.3390/s23146434]

10. Shoorangiz R, Weddell SJ, Jones RD. EEG-based machine learning: theory and applications. In Thakor NV (Ed.), Handbook of Neuroengineering. 2023. p. 1–25. Springer. https://doi.org/10.1007/978-981-16-5540-1_70

11. Yan, K; Zhang, D. Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Sens Actuators, B Chem; 2015; 212, pp. 353-363. [DOI: https://dx.doi.org/10.1016/j.snb.2015.02.025]

12. Singh, AK; Krishnan, S. Trends in EEG signal feature extraction applications. Front Artif Intell; 2022; 5, 1072801. [DOI: https://dx.doi.org/10.3389/frai.2022.1072801]

13. Islam, R; Andreev, AV; Shusharina, NN; Hramov, AE. Explainable machine learning methods for classification of brain states during visual perception. Mathematics; 2022; 10, 15 2819. [DOI: https://dx.doi.org/10.3390/math10152819]

14. Sadegh-Zadeh, SA; Sadeghzadeh, N; Soleimani, O; Shiry Ghidary, S; Movahedi, S; Mousavi, SY. Comparative analysis of dimensionality reduction techniques for EEG-based emotional state classification. Am J Neurodegenerat Dis; 2024; 13, 4 pp. 23-33. [DOI: https://dx.doi.org/10.62347/ZWRY8401]

15. Fdez, J; Guttenberg, N; Witkowski, O; Pasquali, A. Cross-Subject EEG-based emotion recognition through neural networks with stratified normalization. Front Neurosci; 2021; 15, 626277. [DOI: https://dx.doi.org/10.3389/fnins.2021.626277]

16. Kang, T; Chen, Y; Wallraven, C. I see artifacts: ICA-based EEG artifact removal does not improve deep network decoding across three BCI tasks. J Neural Eng; 2024; [DOI: https://dx.doi.org/10.1088/1741-2552/ad788e]

17. Pichandi, S; Balasubramanian, G; Chakrapani, V. Hybrid deep models for parallel feature extraction and enhanced emotion state classification. Sci Rep; 2024; 14, 24957. [DOI: https://dx.doi.org/10.1038/s41598-024-75850-y]

18. Kit, NK; Amin, HU; Ng, KH; Price, J; Subhani, AR. EEG feature extraction based on fast fourier transform and wavelet analysis for classification of mental stress levels using machine learning. Adv Sci Technol Eng Syst J; 2023; 8, 6 pp. 46-56. [DOI: https://dx.doi.org/10.25046/aj080606]

19. Tawsik Jawad, KM; Verma, A; Amsaad, F; Ashraf, L. A study on the application of explainable ai on ensemble models for predictive analysis of chronic kidney disease. IEEE Access; 2025; 13, pp. 23312-23330. [DOI: https://dx.doi.org/10.1109/ACCESS.2025.3535692]

20. Alalhareth, M; Hong, S-C. An improved mutual information feature selection technique for intrusion detection systems in the internet of medical things. Sensors; 2023; 23, 10 4971. [DOI: https://dx.doi.org/10.3390/s23104971]

21. Jain, P; Yedukondalu, J; Chhabra, H et al. EEG-based detection of cognitive load using VMD and LightGBM classifier. Int J Mach Learn Cybern; 2024; 15, pp. 4193-4210. [DOI: https://dx.doi.org/10.1007/s13042-024-02142-2]

22. Albaqami, H; Hassan, GM; Subasi, A; Datta, A. Automatic detection of abnormal EEG signals using wavelet feature extraction and gradient boosting decision tree. Biomed Signal Process Control; 2021; 70, 102957. [DOI: https://dx.doi.org/10.1016/j.bspc.2021.102957]

23. Echtioui, A; Zouch, W; Ghorbel, M. Merged CNNs for the classification of EEG motor imagery signals. Multimed Tools Appl; 2025; 84, pp. 373-395. [DOI: https://dx.doi.org/10.1007/s11042-024-18892-8]

24. Singh, K; Malhotra, J. Two-layer LSTM network-based prediction of epileptic seizures using EEG spectral features. Complex Intell Syst; 2022; 8, 3 pp. 2405-2418. [DOI: https://dx.doi.org/10.1007/s40747-021-00627-z]

25. Ciurea A, Manoila CP, Ionescu B. EEG classification using hybrid convolutional neural network with attention mechanism. In Costin HN, Magjarević R, Petroiu GG (Eds.) Advances in Digital Health and Medical Bioengineering: EHB 2023. Vol. 109. 2024, p. 1054–1060. Springer. https://doi.org/10.1007/978-3-031-62502-2_88.

26. Sylvester, S; Sagehorn, M; Gruber, T; Atzmueller, M; Schöne, B. SHAP value-based ERP analysis (SHERPA): Increasing the sensitivity of EEG signals with explainable AI methods. Behav Res Methods; 2024; 56, 2 pp. 6067-6081. [DOI: https://dx.doi.org/10.3758/s13428-023-02335-7]

27. Ieracitano, C; Mammone, N; Hussain, A et al. A novel explainable machine learning approach for EEG-based brain-computer interface systems. Neural Comput Appl; 2022; 34, 12 pp. 11347-11360. [DOI: https://dx.doi.org/10.1007/s00521-020-05624-w]

28. An S, Kang M, Kim S, Chikontwe P, Shen L, Park SH. Subject-adaptive transfer learning using resting state EEG signals for cross-subject EEG motor imagery classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2023. 2023. p. 643–652. Springer. https://doi.org/10.1007/978-3-031-72120-5_63.

29. Maswanganyi, RC; Tu, C; Owolawi, PA; Du, S. Single-source and multi-source cross-subject transfer based on domain adaptation algorithms for EEG Classification. Mathematics; 2025; 13, 5 802. [DOI: https://dx.doi.org/10.3390/math13050802]

30. Netzer, E; Frid, A; Feldman, D. Real-time EEG classification via coresets for BCI applications. Eng Appl Artif Intell; 2020; 89, 103455. [DOI: https://dx.doi.org/10.1016/j.engappai.2019.103455]

31. Pawan, R; Dhiman, R. Machine learning techniques for electroencephalogram-based brain-computer interface: a systematic literature review. Meas Sens; 2023; 28, 100823. [DOI: https://dx.doi.org/10.1016/j.measen.2023.100823]

32. Kabir, MH; Akhtar, NI; Tasnim, N; Miah, ASM; Lee, H-S; Jang, S-W; Shin, J. Exploring feature selection and classification techniques to improve the performance of an electroencephalography-based motor imagery brain–computer interface system. Sensors; 2024; 24, 15 4989. [DOI: https://dx.doi.org/10.3390/s24154989]

33. Kazdaghli, S; Kerenidis, I; Kieckbusch, J; Teare, P. Improved clinical data imputation via classical and quantum determinantal point processes. Elife; 2024; 12, RP89947. [DOI: https://dx.doi.org/10.7554/eLife.89947.3]

34. Casella, M; Milano, N; Dolce, P; Marocco, D. Transformers deep learning models for missing data imputation: an application of the ReMasker model on a psychometric scale. Front Psychol; 2024; 15, 1449272. [DOI: https://dx.doi.org/10.3389/fpsyg.2024.1449272]

35. Wang, X; Ren, Y; Luo, Z; He, W; Hong, J; Huang, Y. Deep learning-based EEG emotion recognition: current trends and future perspectives. Front Psychol; 2023; 14, 1126994. [DOI: https://dx.doi.org/10.3389/fpsyg.2023.1126994]

36. Al-Hamadani, AA; Mohammed, MJ; Tariq, SM. Normalized deep learning algorithms based information aggregation functions to classify motor imagery EEG signal. Neural Comput Appl; 2023; 35, pp. 22725-22736. [DOI: https://dx.doi.org/10.1007/s00521-023-08944-9]

37. Shrestha, N. Detecting multicollinearity in regression analysis. Am J Appl Mathe Statist; 2020; 8, 2 pp. 39-42. [DOI: https://dx.doi.org/10.12691/ajams-8-2-1]

38. Liu Y, Yu T, Zhou Y, Wang L, Wang Z, Chen X. Interpretable and robust AI in EEG systems: a survey. 2023. arXiv preprint arXiv:2304.10755. https://arxiv.org/pdf/2304.10755

39. Alsuradi H, Park W, Eid M. Explainable classification of EEG data for an active touch task using Shapley values. In: Stephanidis C, Kurosu M, Degen H, Reinerman-Jones L (Eds.) HCI International 2020-late breaking papers: multimodality and intelligence. Vol. 12424. 2020. p. 384–396. Springer. https://doi.org/10.1007/978-3-030-60117-1_30

40. Rainio O, Teuho J, Klén R. Evaluation metrics and statistical tests for machine learning. Sci Rep. 2024;14: Article 6086. https://doi.org/10.1038/s41598-024-56706-x.

41. Röttger, P; Pavlopoulos, J; Sorensen, J; Dixon, L; Thain, N; Vlachos, A. A closer look at classification evaluation metrics and a critical reflection of common evaluation practice. Trans Assoc Comput Linguist; 2023; 11, pp. 221-238. [DOI: https://dx.doi.org/10.1162/tacl_a_00675]

42. Jolliffe, IT; Cadima, J. Principal component analysis: a review and recent developments. Philosoph Trans R Soc A Mathe Phys Eng Sci; 2016; 374, 2065 20150202. [DOI: https://dx.doi.org/10.1098/rsta.2015.0202]

43. Svantesson, M; Olausson, H; Eklund, A; Thordstein, M. Get a new perspective on EEG: convolutional neural network encoders for parametric t-SNE. Brain Sci; 2023; 13, 3 453. [DOI: https://dx.doi.org/10.3390/brainsci13030453]

44. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. 2014. arXiv preprint arXiv:1412.6572. https://doi.org/10.48550/arXiv.1412.6572.

45. Hendrycks D, Dietterich TG. Benchmarking neural network robustness to common corruptions and perturbations. 2019. arXiv preprint arXiv:1903.12261. https://doi.org/10.48550/arXiv.1903.12261.

© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.