As the global population ages, Alzheimer’s disease (AD) poses a significant worldwide challenge as a leading cause of dementia, with a slow early progression that eventually leads to nerve cell death and currently lacks effective treatment. However, early diagnosis can slow its progression through pharmaceutical intervention, making accurate early diagnosis using computer-aided diagnosis (CAD) systems crucial. This study aims to enhance the accuracy of early AD diagnosis by developing an improved optimization approach for deep learning-based CAD systems. To this end, this paper proposes an improved Harris Hawks optimization (HHO) algorithm, named CAHHO, which incorporates crisscross search and adaptive β-Hill climbing mechanisms, thereby enhancing population diversity and search space coverage during the exploration phase, while adaptively adjusting the step size during the exploitation phase to improve local search precision. Comparative experiments with classical algorithms, HHO variants, and advanced optimization methods validate the superiority of the proposed CAHHO. This study then employs the deep learning model residual network with 18 layers (ResNet18) as the base model for AD diagnosis and uses CAHHO to optimize key hyperparameters, including the number of channels and the learning rate. Experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the ResNet18-CAHHO model outperforms existing methods in classifying AD, mild cognitive impairment (MCI), and normal control (NC) subjects. In particular, it achieves accuracies of 0.93077, 0.80102, and 0.80513 in the diagnosis of AD versus NC, MCI versus NC, and AD versus MCI, respectively. Furthermore, Gradient-Weighted Class Activation Mapping (Grad-CAM) visualizations reveal critical brain regions associated with AD, providing valuable diagnostic support for clinicians and holding significant promise for early intervention.
Introduction
Alzheimer’s disease (AD) is the leading cause of dementia, affecting a significant portion of the global population and accounting for approximately 60% to 80% of all dementia cases (Sheng et al. 2024). According to Alzheimer’s Disease International (ADI), over 47 million people are currently living with AD worldwide (Ricci 2019), a number expected to rise dramatically to 152 million by 2050, equating to a new case of dementia every 33 s (Nanni et al. 2019). As a chronic neurodegenerative disease, AD progresses slowly in its early stages but gradually worsens over time, ultimately leading to the death of nerve cells in the brain (McKhann et al. 1984). AD progresses through three stages: pre-symptomatic AD, where pathological changes occur without noticeable symptoms; mild cognitive impairment (MCI), characterized by noticeable cognitive changes that do not significantly interfere with daily life; and full-blown AD, marked by severe cognitive and functional impairments that require extensive care as the disease progresses. Currently, there is no effective treatment for AD, and the cost of care increases dramatically as symptoms worsen. Nevertheless, if AD can be diagnosed at an early stage, pharmacological interventions can slow its progression. Therefore, it is crucial to employ computer-aided diagnostic (CAD) systems for accurate early-stage diagnosis, as these can facilitate early detection and play a vital role in managing the disease and potentially improving patient outcomes.
In the early stages, the diagnosis of AD primarily relies on clinical observation, which is highly subjective and heavily dependent on the clinician’s experience. Furthermore, AD patients are often diagnosed when the condition has already progressed, missing the optimal window for initiating treatment to delay its progression. With the advancement of CAD technology, magnetic resonance imaging (MRI), which offers non-invasive, non-destructive imaging with rich information content, holds promise in elucidating the brain pathology underlying AD (Hsu et al. 2023). In past studies, traditional machine learning (ML) methods, such as random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGBoost) (García-Gutiérrez et al. 2024), have played a significant role in the analysis and application of MRI images. Additionally, various manually extracted regions of interest (ROI) from MRI are often utilized for the early diagnosis of AD (Zhang et al. 2023). Vaithinathan and Parthiban (2019) utilized the rough ROI (RROI) technique to extract features from specified ROIs, employed high-dimensional feature selection techniques and applied various ML techniques, which allowed them to distinguish between AD, MCI, and normal control (NC) groups. Syaifullah et al. (2021) created two types of SVMs for AD diagnosis, one based on brain structure (SVMst) and the other based on both brain structure and mini-mental state examination scores (SVMcog). Although MRI imaging is predominantly used for AD diagnosis, involving classification and feature extraction through ML methods that yield promising results, a drawback of traditional ML techniques is their reliance on manual feature design and extraction (Wang 2020). This becomes particularly challenging when dealing with large datasets and complex multi-feature models. To address these challenges, recent studies have explored the integration of machine learning with optimization techniques. Research (Yassen et al. 2024) emphasizes the importance of combining machine learning models with optimization strategies to enhance predictive accuracy. Furthermore, research (El-Kenawy et al. 2024) highlights the potential applications of machine learning in public health, demonstrating its versatility beyond medical diagnostics.
Deep learning (DL), which is one of the pivotal methods in the field of ML, reduces the incompleteness caused by manual feature engineering and demonstrates strong transfer learning capabilities compared to traditional ML approaches. The residual learning concept proposed by He et al. (2016) addresses the vanishing gradient problem associated with increasing network depth, achieving tremendous success in computer vision and being widely applied in tasks such as image classification, object detection, and image segmentation. Following the remarkable success of DL in computer vision (Hassan et al. 2022), this technology has naturally extended into medical image analysis (Hassan et al. 2024). In this domain, DL techniques such as convolutional neural networks (CNNs), leveraging neuroimaging data such as MRI, have achieved significant advancements in assisting the early diagnosis of AD, leading to numerous emerging studies in this area (Wen et al. 2020). Xu et al. (2023) and Logan et al. (2021) both highlight the effectiveness of 3D CNNs in this context, with Logan et al. (2021) specifically discussing the use of ensemble learning and generative adversarial networks to enhance the robustness of these models. Basheera and Sai Ram (2019) proposed a method for extracting gray matter from brain voxels and classifying it using a CNN, achieving a clinical evaluation accuracy of 90.47%. Choi et al. (2020) sliced and registered patient MRI images, performed ROI segmentation, and then employed a 2D-CNN on the processed data to classify three groups: AD/NC, AD/MCI, and MCI/NC, achieving prediction accuracies of 92.3%, 85.6%, and 78.1%, respectively. Recent studies (Bayram et al. 2025; Karaman et al. 2023a, 2023b; Pacal 2024; Pacal et al. 2024, 2025) have further explored the optimization of DL models for medical image analysis through hyperparameter tuning, attention mechanisms, and architecture improvements. These works collectively demonstrate that effective hyperparameter optimization and architectural enhancements can significantly boost the accuracy and robustness of DL models in medical imaging applications.
CNNs require the optimization of numerous hyper-parameters during network training, a task that is both complex and challenging when performed manually (Darwish et al. 2020; Fetanat et al. 2021; Wang et al. 2023). Moreover, these hyper-parameters are often data-dependent, rendering them potentially unsuitable for other datasets. The difficulty in obtaining appropriate values for CNN hyper-parameters stems from the lack of a robust mathematical approach, necessitating numerous iterations to achieve optimal performance. While random and grid searches can automate the determination of hyper-parameters, these methods are notably time-consuming (Darwish et al. 2020). Additionally, Bayesian optimization methods, although capable of identifying hyper-parameters, require the estimation of several error function statistics, which can lead to inefficient outcomes (Nazir et al. 2020). Therefore, an automatic method based on metaheuristic algorithms for optimizing hyper-parameter values could significantly reduce computational costs and enhance performance (Bochinski et al. 2017). Furthermore, these metaheuristic algorithms are adept at handling non-continuous and non-differentiable problems (Heidari et al. 2019a). In this context, it is worth noting that previous research (Alnowaiser et al. 2024; Jovanovic et al. 2024) has explored the potential of using CNNs combined with optimization algorithms for disease diagnosis.
Numerous metaheuristic algorithms exist, including particle swarm optimization (PSO) (Kennedy and Eberhart 1995), differential evolution (DE) (Storn and Price 1997), Harris Hawks optimization (HHO) (Heidari et al. 2019b), Slime Mould Algorithm (SMA) (Hu et al. 2022), Salp Swarm Algorithm (SSA) (Mirjalili et al. 2017), Grey Wolf Optimizer (GWO) (Mirjalili et al. 2014a) and Greylag Goose optimization (GGO) (El-kenawy et al. 2024). According to the “no free lunch (NFL)” theorem (Wolpert and Macready 1997), which asserts that no single metaheuristic algorithm is optimal for solving every type of optimization problem, it is often necessary to tailor or enhance metaheuristic algorithms for specific problems. The HHO algorithm, recently introduced by Heidari et al. (2019b) and inspired by the chasing and escaping behavior between Harris hawks and rabbits, has demonstrated superior performance compared to several other metaheuristic algorithms. Although HHO demonstrates excellent performance in certain optimization problems, it also has several drawbacks, including premature convergence, insufficient search balance, slow convergence speed, and limited applicability. Consequently, these issues cause HHO to perform poorly in some complex optimization tasks, which has prompted us to attempt improvements to the well-known HHO to enhance its performance and applicability. Therefore, this paper innovatively proposes an improved HHO algorithm, named CAHHO, which combines crisscross search and adaptive β-Hill climbing (AβHC) mechanisms. During the exploration phase, it increases population diversity and search space coverage, while in the exploitation phase, it adaptively adjusts the step size to enhance local search precision and quality, thus reducing the risk of local optima entrapment. Compared to the original HHO, CAHHO demonstrates significant improvements in global and local search efficiency, convergence speed, solution quality, and algorithm stability. Additionally, comparative experiments with classical algorithms, HHO variants, and advanced optimization algorithms also validate the superiority of the proposed algorithm.
In addition, the Residual Network with 18 layers (ResNet18) is chosen as the foundational model for this study. As a shallower variant within the ResNet series, it offers significant advantages in computational complexity and resource consumption compared to deeper models such as ResNet50 or ResNet101, maintaining high classification performance while reducing computational overhead and training time. Furthermore, ResNet18 introduces shortcut connections, which effectively address common issues in training deep neural networks, such as vanishing and exploding gradients. These shortcut connections facilitate more efficient information flow between network layers, thereby enhancing the model’s convergence and stability. Given its relatively simple network architecture and reliable performance, ResNet18 is well-suited for further hyper-parameter optimization and adjustment. In this study, we introduce the novel ResNet18-CAHHO model, which employs the CAHHO algorithm to optimize the number of channels and the learning rate, enabling ResNet18 to fully realize its potential and improve its performance in specific auxiliary diagnostic tasks for AD.
The primary contributions of this paper are summarized as follows:
An improved CAHHO algorithm is proposed, which significantly enhances the global search capability and convergence speed when solving complex optimization problems by incorporating the crisscross search and AβHC mechanisms.
CAHHO is applied to the hyperparameter optimization of the ResNet18, specifically for AD diagnostic tasks, achieving automatic adjustment of model channel numbers and learning rates, thereby improving the diagnostic accuracy and efficiency of the model.
Utilizing the optimized model combined with MRI image analysis, a high-accuracy diagnosis of early-stage AD is achieved. This is of great clinical significance for early intervention and treatment.
Comparative experiments validate the superior performance of the proposed ResNet18-CAHHO model in various AD diagnostic tasks.
The structure of the paper is outlined as follows. Section 2 discusses the ResNet18 network model and the original HHO. The proposed algorithm is introduced in Sect. 3. Section 4 details the experimental results of the proposed CAHHO algorithm. Section 5 provides descriptions of the data utilized in the study and outlines the ResNet18-CAHHO model for AD diagnosis. Section 6 elaborates on the AD diagnostic experiments and the corresponding results. Section 7 presents an in-depth discussion of the findings from Sect. 6. Lastly, Sect. 8 offers conclusions and suggests directions for future research.
Background
ResNet18 network model
He et al. (2016) proposed the ResNet residual network model based on the idea of shortcut connections, which addresses the gradient explosion and network degradation problems of traditional deep neural networks. By applying an identity mapping across layers, the network can degenerate into a shallower one when deeper layers add no value, thereby alleviating the vanishing gradient problem. The identity mapping formula is as follows:
$$H(x) = F(x) + x \tag{1}$$
where x is the input, F(x) is the output obtained after x has gone through two convolutional layers, and H(x) denotes the output of the residual module in the residual network. Differentiating Eq. (1) leads to the following formula:

$$H'(x) = F'(x) + 1 \tag{2}$$
From the above formula, it can be seen that regardless of how small F'(x) is, the total gradient value H'(x) will always be at least 1. This effectively solves the problem of gradient vanishing caused by increasing the depth of the network. The structure of the residual module is shown in Fig. 1.
Fig. 1 The structure of the residual module (figure omitted)
By adding the input x to F(x), the final output of the residual module, H(x), is obtained. Additionally, an identity function, identity(x), is included; when the number of channels changes, it performs a 1×1 convolution to adjust the number of input channels. Due to its relatively small number of parameters and powerful feature extraction capabilities, the ResNet18 network has been widely applied across various fields. Therefore, this study utilizes the ResNet18 network for AD diagnosis. The structure of the ResNet18 network, illustrated in Fig. 2, features solid lines indicating that the number of channels remains unchanged within the residual blocks, while dashed lines denote a doubling of the channel count. As shown in the figure, the network comprises five convolutional stages, batch normalization (BN), rectified linear unit (ReLU) activations, a max-pooling layer, an average pooling layer, a fully connected layer, and a softmax classification function.
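For concreteness, the residual block just described can be sketched in PyTorch as follows. This is an illustrative 2D sketch (the diagnosis model later operates on 3D MRI volumes, where Conv3d/BatchNorm3d counterparts would apply); class and variable names are our own, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: H(x) = F(x) + x, with a 1x1 projection when channels change."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # F(x): two 3x3 convolutional layers, each followed by batch normalization
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # identity(x): 1x1 convolution shortcut to match the channel count
        # (the dashed-line case in Fig. 2)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(f + self.shortcut(x))  # H(x) = F(x) + identity(x)
```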
Fig. 2 The structure of the ResNet18 network (figure omitted)
The loss function gauges the disparity between the network’s outputs and the ground truth during training, thereby guiding subsequent training steps in the correct direction. The larger the difference between the output values and the true labels, the larger the loss value; conversely, a smaller loss value indicates better performance of the neural network. Training therefore amounts to continually reducing the loss function value. In this case, the cross-entropy loss function is applied to the ResNet18 neural network, with the computational formula outlined as follows:
$$L(y, f(x)) = -\sum_{i} y_i \log f(x_i) \tag{3}$$
where y represents the true values of the training samples, f(x) denotes the predicted values of the samples during the training process, and L(y, f(x)) signifies the loss function. The central concept of this function is to quantify the divergence between the true probability distribution and the predicted probability distribution.
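As a brief illustration of Eq. (3) in practice, PyTorch’s built-in criterion combines the softmax and the negative log-likelihood in one call. The batch size of 12 mirrors the training setup reported later; the tensors here are random placeholders.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                # Eq. (3), with softmax folded in
logits = torch.randn(12, 2, requires_grad=True)  # f(x): raw scores, batch of 12, 2 classes
labels = torch.randint(0, 2, (12,))              # y: ground-truth class indices
loss = criterion(logits, labels)                 # mean cross-entropy over the batch
loss.backward()                                  # gradients then drive the weight update
```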
Harris Hawks optimization algorithm (HHO)
The HHO (Heidari et al. 2019b) simulates the cooperative hunting behavior of the Harris hawk population, which uses different strategies for searching for and feeding on prey. It consists of an exploration phase and an exploitation phase, with transitions between the two phases based on changes in the prey’s escape energy.
Exploration phase
The positions of Harris hawks are dispersed at random, and they use two methods to find prey:
$$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right|, & q \ge 0.5 \\ \left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( LB + r_4 (UB - LB) \right), & q < 0.5 \end{cases} \tag{4}$$
where Xrabbit(t) is the ideal individual position (the location of the prey at the t-th iteration), and X(t + 1) is the hawk’s position at the (t + 1)-th iteration. The position of the hawk in the current iteration is denoted by X(t), and r1, r2, r3, r4, and q are random numbers in the range [0, 1]. LB and UB stand for the lower and upper bounds of the search space, respectively. Xrand(t) denotes the position of a randomly selected individual in the current iteration, and Xm(t) represents the mean position of all individuals at iteration t, which is defined as:

$$X_m(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t) \tag{5}$$
where N represents the total number of individuals in the population, and Xi denotes the position of the i-th individual at the current iteration.

Transition stage from exploration to exploitation
The transition of Harris hawks from exploration to exploitation is determined by the escape energy of their prey. The prey’s escape energy E is defined as:
$$E = 2 E_0 \left( 1 - \frac{t}{T} \right) \tag{6}$$
where T is the maximum number of iterations and E0 denotes the initial energy. The dynamic escape energy E decreases during the iteration process. When |E| ≥ 1, the algorithm enters the exploration phase; when |E| < 1, it enters the exploitation phase.

Exploitation stage
Let r denote the prey’s escape probability. Depending on r and the prey’s escape energy, the Harris hawks adopt one of four corresponding raid strategies.
Strategy 1: Soft siege. When r ≥ 0.5 and |E|≥ 0.5, the prey has enough energy. Harris hawks adopt a soft encirclement strategy to deplete the prey’s energy and conduct a surprise attack. The formula is as follows:
$$X(t+1) = \Delta X(t) - E \left| J X_{rabbit}(t) - X(t) \right|, \quad \Delta X(t) = X_{rabbit}(t) - X(t) \tag{7}$$
where the jumping distance of the rabbit during its escape is denoted by J:

$$J = 2 (1 - r_5) \tag{8}$$

where r5 is a random number in the range (0, 1).
Strategy 2: Hard siege. When r ≥ 0.5 and |E|< 0.5, it indicates that the prey is low on energy and has little chance of escape, so the Harris hawks make a direct raid. The formula is as follows:
$$X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right| \tag{9}$$
Strategy 3: Soft siege with progressive fast swooping. This strategy is employed when r < 0.5 and |E| ≥ 0.5, indicating that the prey is energetic and likely to attempt an escape. The Harris hawks adopt a soft encirclement strategy, specifically implemented through the following methods.
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X(t) \right| \tag{10}$$
If there is no improvement in the fitness value, another strategy is implemented:
$$Z = Y + S \times LF(D) \tag{11}$$
where D denotes the spatial dimension, S is a 1 × D random vector, and LF represents the Lévy flight function:

$$LF(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \quad \sigma = \left( \frac{\Gamma(1+\beta) \sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right) \beta \, 2^{(\beta-1)/2}} \right)^{1/\beta} \tag{12}$$
where β is taken as 1.5, and u and v are random variables in the range (0, 1). In summary, the update strategy for this phase is as follows:
$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)) \\ Z, & F(Z) < F(X(t)) \end{cases} \tag{13}$$
where F(⋅) represents the fitness function.

Strategy 4: Hard siege with progressive fast swooping. When r < 0.5 and |E| < 0.5, although the prey is low in energy, it has a higher likelihood of escaping. In this scenario, the Harris hawks initially form an encirclement and then execute a surprise attack. If the surprise attack is unsuccessful, an alternative strategy is subsequently employed. The formula is the same as Eq. (13), where Y takes the following value:
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X_m(t) \right| \tag{14}$$
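To make the four besiege strategies concrete, the following NumPy sketch implements the exploitation update under the usual HHO conventions (Eqs. (7)–(14)). Function names are illustrative; `fit` is the fitness function, and, as in common HHO implementations, u and v in the Lévy step are drawn from a normal distribution.

```python
import numpy as np
from math import gamma, pi, sin

def levy(dim, beta=1.5):
    """Levy flight step of Eq. (12); u, v drawn from a normal distribution,
    as in common HHO implementations."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = np.random.randn(dim) * sigma, np.random.randn(dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def hho_exploit(X, X_rabbit, X_m, E, fit):
    """One exploitation update for hawk X (Eqs. (7)-(14)); fit() is the fitness function."""
    r = np.random.rand()                     # prey escape probability
    J = 2 * (1 - np.random.rand())           # Eq. (8): random jump strength
    if r >= 0.5 and abs(E) >= 0.5:           # Strategy 1: soft siege, Eq. (7)
        return (X_rabbit - X) - E * np.abs(J * X_rabbit - X)
    if r >= 0.5:                             # Strategy 2: hard siege, Eq. (9)
        return X_rabbit - E * np.abs(X_rabbit - X)
    target = X if abs(E) >= 0.5 else X_m     # Strategy 3 dives around X, Strategy 4 around X_m
    Y = X_rabbit - E * np.abs(J * X_rabbit - target)   # Eqs. (10) / (14)
    Z = Y + np.random.rand(X.size) * levy(X.size)      # Eq. (11): progressive rapid dive
    if fit(Y) < fit(X):                      # greedy selection, Eq. (13)
        return Y
    return Z if fit(Z) < fit(X) else X
```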
Materials and methods
Crisscross search mechanism
The crisscross search method comprises two operators: horizontal crossover search (HCS) and vertical crossover search (VCS) (Meng et al. 2014). Employing both operations simultaneously generates a variety of search patterns whose primary function is to avoid local optima, which accelerates convergence and substantially reduces error values. During each iteration, these two operators execute distinct crossover operations to produce new candidate solutions. To ensure the retention of the highest-performing individuals, a greedy selection mechanism is applied, consistently maintaining the best solutions within the population.
The HCS involves the positional updating of two distinct search agents, facilitating the exchange of information and mutual learning. The search agents’ exploration abilities are greatly improved by this interaction, which quickens the algorithm’s convergence. The parent search agents’ jth position vectors Xi1 and Xi2 are assumed to undergo HCS, as mathematically described by Eqs. (15) and (16).
$$MS_{i1}^{j} = r_1 \times X_{i1}^{j} + (1 - r_1) \times X_{i2}^{j} + c_1 \times \left( X_{i1}^{j} - X_{i2}^{j} \right) \tag{15}$$

$$MS_{i2}^{j} = r_2 \times X_{i2}^{j} + (1 - r_2) \times X_{i1}^{j} + c_2 \times \left( X_{i2}^{j} - X_{i1}^{j} \right) \tag{16}$$
where r1 and r2 are randomly generated values within the range [0, 1]; c1 and c2 are random values within the range [−1, 1]; and $MS_{i1}^{j}$ and $MS_{i2}^{j}$ represent the updated positions of Xi1 and Xi2, respectively. Here, the HCS mechanism facilitates the exchange of information between two solutions by utilizing the randomly generated values r1 and r2, thereby enhancing the exploration of the search space and effectively preventing the algorithm from prematurely converging to a local optimum.

In later iterations, a search agent might fail to identify an improved position within a given dimension, potentially causing the algorithm to become trapped in a local optimum. To address this issue, the VCS mechanism updates solutions across different dimensions, effectively preventing premature convergence to local optima. This approach is particularly beneficial in later iterations, ensuring broad search space coverage. The jth position of Xi undergoing VCS can be represented by Eq. (17).
$$MS_{i}^{j_1} = r \times X_{i}^{j_1} + (1 - r) \times X_{i}^{j_2} \tag{17}$$
where r denotes a randomly generated value from the interval [0, 1], $j_1$ and $j_2$ are two distinct, randomly selected dimensions, and $MS_{i}^{j_1}$ corresponds to the updated $j_1$-th dimension of Xi.
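A minimal NumPy sketch of the two operators follows; after either crossover, the greedy selection described above would keep an offspring only if it improves fitness. Function names are illustrative.

```python
import numpy as np

def horizontal_crossover(x1, x2):
    """HCS, Eqs. (15)-(16): two parents exchange information in every dimension."""
    r1, r2 = np.random.rand(x1.size), np.random.rand(x1.size)
    c1, c2 = np.random.uniform(-1, 1, x1.size), np.random.uniform(-1, 1, x1.size)
    ms1 = r1 * x1 + (1 - r1) * x2 + c1 * (x1 - x2)
    ms2 = r2 * x2 + (1 - r2) * x1 + c2 * (x2 - x1)
    return ms1, ms2

def vertical_crossover(x):
    """VCS, Eq. (17): one individual crosses two of its own dimensions."""
    j1, j2 = np.random.choice(x.size, size=2, replace=False)
    out = x.copy()
    r = np.random.rand()
    out[j1] = r * x[j1] + (1 - r) * x[j2]   # update dimension j1 using dimensions j1 and j2
    return out
```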
Adaptive β-hill climbing (AβHC) mechanism
The adaptive β-hill climbing algorithm (Al-Betar et al. 2019), an enhancement of the β-hill climbing algorithm (Al-Betar 2017), was introduced by Al-Betar et al. in 2019. The algorithm is defined below, with the optimization problem formulated mathematically as follows:
$$\min_{x \in X} h(x), \quad x = (x_1, x_2, \ldots, x_N), \quad x_i \in [LB_i, UB_i] \tag{18}$$
where X represents the set of feasible solutions within the domain. Each xi lies within the interval [LBi, UBi], where LBi and UBi are the respective lower and upper bounds of the solutions. The function h(x) serves as the objective function, and N denotes the number of decision variables.

The initial provisional solution x is established at the outset. Throughout each iteration, this solution undergoes refinement via the application of two distinct operators: the η-operator and the β-operator. The η-operator is pivotal in the exploitation phase, facilitating neighborhood search through random walks. Conversely, the β-operator, analogous to a uniform mutation operator, is crucial in the exploration phase, enabling broader search capabilities.
The η-operator utilizes the concept of “random walk” to enhance the solution. This process is mathematically represented by Eq. (19):
$$x_i' = x_i \pm U(0, 1) \times \eta \tag{19}$$
In this framework, the current solution x generates a neighboring solution x′ through Eq. (19). Here, U(0, 1) represents a random variable uniformly distributed between 0 and 1, and the parameter η dictates the distance between x and x′. Within the AβHC algorithm, η functions as an adaptive coefficient. The magnitude of η determines the range of the random search, thereby influencing the ability to escape from local optima. A higher η value broadens the search range, facilitating exploration. To balance the exploration and exploitation phases effectively, η is initially set to a larger value to allow for extensive search in the early iterations and is then progressively decreased to 0, as described by Eq. (20), promoting convergence toward the optimal solution.
$$\eta(t) = 1 - \frac{t^{1/p}}{T^{1/p}} \tag{20}$$
where t and T denote the current iteration count and the maximum number of iterations, respectively. The constant p adjusts the rate at which η approaches 0 as iterations progress. In this study, p is set to 2, ensuring that η decreases gradually, thus facilitating a controlled transition from exploration to exploitation.

The solution x′, generated through the η-operator update, serves as the current solution. Subsequently, the β-operator is applied to x′ to produce an updated position x″, as described below:
$$x_i'' = \begin{cases} x_k, & r \le \beta \\ x_i', & \text{otherwise} \end{cases} \tag{21}$$
where r represents a random variable uniformly distributed between 0 and 1, and k is a random index selected from the range of i. According to Eq. (21), if r is less than or equal to β, a randomly chosen individual from the current population replaces the current individual. Otherwise, the individual remains unchanged. The parameter β is calculated using a linearly increasing formula that depends on βmax and βmin. In the original study, the values of βmin and βmax were set to 0.01 and 0.1, respectively. The formula is given by:

$$\beta(t) = \beta_{min} + (\beta_{max} - \beta_{min}) \times \frac{t}{T} \tag{22}$$
This adaptive strategy allows the algorithm to start with larger steps for broad exploration and gradually refine them for fine-grained local search, effectively preventing premature convergence.
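The sketch below puts the two operators and the schedules of Eqs. (20) and (22) together. One point is deliberately simplified: the β-operator here resamples a dimension uniformly within its bounds, which is one common reading of Eq. (21); in the population-based setting of CAHHO the replacement value can instead be drawn from a random population member, as described above. Names are illustrative.

```python
import numpy as np

def abhc(x, h, lb, ub, T, p=2, beta_min=0.01, beta_max=0.1):
    """Adaptive beta-hill climbing refinement of a solution x; h() is the objective."""
    best = x.copy()
    for t in range(1, T + 1):
        eta = 1 - t ** (1 / p) / T ** (1 / p)             # Eq. (20): shrinking step size
        beta = beta_min + (beta_max - beta_min) * t / T   # Eq. (22): growing mutation rate
        sign = np.random.choice([-1.0, 1.0], best.size)
        x_new = best + sign * np.random.rand(best.size) * eta   # Eq. (19): eta-operator
        mask = np.random.rand(best.size) <= beta          # Eq. (21): beta-operator
        x_new[mask] = np.random.uniform(lb, ub, best.size)[mask]
        x_new = np.clip(x_new, lb, ub)
        if h(x_new) < h(best):                            # greedy acceptance
            best = x_new
    return best
```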
The proposed CAHHO
This paper introduces an enhanced version of the HHO algorithm, named CAHHO, by integrating crisscross search and AβHC techniques, aiming to address the challenges of local optima entrapment and convergence efficiency. During the exploration phase, the crisscross search mechanism employs HCS and VCS strategies to increase population diversity and extend search space coverage, thereby bolstering global search capabilities while maintaining algorithmic stability across different runs. In the exploitation phase, AβHC refines local search by adaptively adjusting the step size, thus enhancing solution precision and quality. This technique mitigates the risk of local optima entrapment and accelerates convergence in later iterations. Consequently, the enhanced HHO algorithm, leveraging the strengths of crisscross search and AβHC, demonstrates substantial improvements in global and local search efficiency, convergence speed, solution quality, and algorithm stability. These enhancements make the proposed HHO variant a robust and versatile tool for tackling complex optimization problems. The detailed framework of the proposed CAHHO is illustrated in Fig. 3.
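Putting the pieces together, the overall flow of Fig. 3 can be summarized by the loop below, which reuses the sketches given in this section (hho_exploit, horizontal_crossover, vertical_crossover, abhc). Evaluation bookkeeping, the per-hawk energy draw, and boundary handling are simplified; this is a structural outline under our assumptions, not the authors’ reference implementation.

```python
import numpy as np

def cahho(fit, lb, ub, dim, n=30, max_fes=300_000):
    """Structural outline of CAHHO: HHO moves augmented by crisscross search
    in exploration and adaptive beta-hill climbing in exploitation."""
    X = lb + np.random.rand(n, dim) * (ub - lb)         # initialize hawk population
    fitness = np.array([fit(x) for x in X])
    T = max_fes // n                                    # iteration budget
    for t in range(1, T + 1):
        best = X[np.argmin(fitness)]                    # X_rabbit: current prey position
        E = 2 * np.random.uniform(-1, 1) * (1 - t / T)  # Eq. (6): escape energy
        for i in range(n):
            if abs(E) >= 1:                             # exploration + crisscross search
                partner = X[np.random.randint(n)]
                cand, _ = horizontal_crossover(X[i], partner)   # HCS, Eqs. (15)-(16)
                cand = vertical_crossover(cand)                 # VCS, Eq. (17)
            else:                                       # exploitation + AbHC refinement
                cand = hho_exploit(X[i], best, X.mean(axis=0), E, fit)
                cand = abhc(cand, fit, lb, ub, T=5)     # short adaptive local refinement
            cand = np.clip(cand, lb, ub)
            f = fit(cand)
            if f < fitness[i]:                          # greedy selection keeps the better
                X[i], fitness[i] = cand, f
    return X[np.argmin(fitness)], fitness.min()
```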
Fig. 3 The detailed framework of the proposed CAHHO (figure omitted)
Results
Experimental design description
In this study, the optimization capability of CAHHO was first validated on the IEEE CEC 2017 benchmark suite, which is frequently used in the literature to assess the performance of optimization algorithms. It consists of thirty test functions divided into four categories: unimodal functions (F1–F3), multimodal functions (F4–F10), hybrid functions (F11–F20), and composite functions (F21–F30). Detailed information regarding these test functions can be found in LaTorre and Pena (2017). Unimodal functions have a single global optimum and are therefore appropriate for evaluating the exploitation capacity of the proposed method. Multimodal functions contain multiple local optima, making them ideal for assessing its exploratory power. Hybrid and composite functions present larger problem scales, serving to test the balance between the exploitation and exploration aspects of the proposed algorithm. Employing the IEEE CEC 2017 function set therefore allows a comprehensive validation of the performance of the proposed algorithm. To minimize randomness, all involved algorithms are compared under identical conditions: the population size N is set to 30, the problem dimension D to 30, and the maximum number of evaluations (MaxFEs) to 300,000. To further reduce random error, every algorithm undergoes 30 independent runs. A number of studies were carried out to thoroughly assess CAHHO’s optimization performance. Firstly, the impacts of the crisscross search and AβHC strategies on HHO were separately tested on the 30 IEEE CEC 2017 benchmark functions. Subsequently, CAHHO was compared with 7 classical algorithms, 7 other variants of HHO, and 9 advanced algorithms to validate the effectiveness of the CAHHO optimizer. The average (Avg) and standard deviation (Std) of the results are reported to accurately quantify how well the tested optimizers performed, with the best results shown in bold. Next, the statistical significance of the improvements is evaluated using the Wilcoxon signed-rank test (WSRT) (Derrac et al. 2011) at a significance level of 0.05: a p-value below 0.05 indicates that CAHHO has a significant advantage over a comparative method; otherwise, CAHHO’s optimization performance is statistically equal to or lower than that of the comparative algorithm. Furthermore, the symbols “R+/R−/R=” indicate the number of functions on which CAHHO is superior, inferior, or equal to the other optimizers. Lastly, statistical analysis was conducted using the Friedman test (FT) (Derrac et al. 2011), utilizing the average rank value (ARV) to examine the average performance of each tested algorithm in more detail, thereby reaffirming the optimization capability of the proposed CAHHO.
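For reference, both tests are available off the shelf. The sketch below shows how such pairwise WSRT and multi-algorithm FT comparisons can be computed with SciPy; the run results here are synthetic, made-up numbers purely to illustrate the calls (30 independent runs per algorithm, matching the protocol above).

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
# Made-up best-fitness values over 30 independent runs of one benchmark function
cahho = rng.normal(100.0, 1.0, 30)
hho = rng.normal(105.0, 2.0, 30)
pso = rng.normal(103.0, 2.0, 30)

stat, p = wilcoxon(cahho, hho)                   # pairwise WSRT at alpha = 0.05
print(f"WSRT CAHHO vs. HHO: p = {p:.4g}")        # p < 0.05 -> significant advantage

chi2, p_ft = friedmanchisquare(cahho, hho, pso)  # FT ranks the algorithms jointly
print(f"Friedman test: p = {p_ft:.4g}")          # these ranks underlie the ARV comparison
```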
Benchmark function validation
In this section, the superiority of the proposed CAHHO method is validated through ablation experiments, as well as comparison experiments with classical algorithms, HHO variants, and advanced algorithms.
Impact of two mechanisms on HHO
To evaluate the individual and combined effects of the crisscross search and AβHC mechanisms on HHO, we designed three different variants of HHO, namely CHHO, AHHO, and CAHHO. This subsection presents comparative experiments between HHO and the three variants, as outlined in Table 1. In the table, “C” denotes the crisscross search mechanism, and “A” denotes the AβHC mechanism, with “1” indicating that the variant incorporates the corresponding strategy and “0” indicating its absence. For example, CHHO signifies the combination of only the C strategy with HHO, without introducing the A strategy.
Table 1. Various HHO variants with two strategies
| Algorithm | C | A |
|---|---|---|
| HHO | 0 | 0 |
| CHHO | 1 | 0 |
| AHHO | 0 | 1 |
| CAHHO | 1 | 1 |
To more accurately assess the effectiveness of each strategy, the WSRT was employed to assess the advantages and disadvantages of CAHHO relative to the other algorithms across the 30 functions. Additionally, the FT was used to rank the four algorithms involved. Table 2 displays the results of the WSRT and the FT. From the column labeled “R+/R−/R=”, it can be observed that CAHHO outperforms AHHO and HHO across all functions and surpasses CHHO on 5 functions. Moreover, CAHHO exhibits an ARV of 1.433333, lower than the other algorithms, notably demonstrating significant advantages over HHO. This pronounced improvement highlights that integrating the crisscross search and AβHC strategies can substantially ameliorate the shortcomings of the original HHO.
Table 2. Comparison of results for HHO and three HHO variants
| Algorithm | Ranking | ARV | R+/R−/R= |
|---|---|---|---|
| CAHHO | 1 | 1.433333 | ~ |
| CHHO | 2 | 1.566667 | 5/0/25 |
| AHHO | 3 | 3.433333 | 30/0/0 |
| HHO | 4 | 3.566667 | 30/0/0 |
Figure 4 illustrates the curves of convergence and box plots of CAHHO, CHHO, AHHO, and HHO on selected functions. Unimodal functions contain only one global optimum, effectively testing the exploitation capability of algorithms. From Fig. 4, it can be observed that CAHHO achieves the best results on F1 and F3, indicating a significant enhancement in the exploitation capability of HHO through the integration of the crisscross search and AβHC strategies. In the multimodal function F9, CAHHO also attains the optimal value, suggesting that the integration of the two strategies aids HHO in escaping local optima and obtaining higher-quality solutions. Furthermore, in hybrid and composite functions, CAHHO consistently demonstrates superior convergence, such as in F12, F14, F18, F19, F20, F21, and F26. This further underscores the efficacy of the two strategies in enhancing the overall performance of HHO. In conclusion, based on the experimental results, CAHHO exhibits outstanding performance across various scenarios, emerging as the best choice among HHO variants.
Fig. 4 Convergence curves and box plots of CAHHO and other algorithms (figure omitted)
To showcase the performance and stability of each algorithm, a radar chart was plotted in Fig. 5. This radar chart further illustrates the performance and stability of each algorithm across 30 test functions. A function curve that converges closer to the origin indicates better performance, while a smoother function curve indicates better stability. In the radar chart, the red curve represents CAHHO, and the orange curve represents HHO. The red curve converges the closest to the origin among all curves, whereas the orange curve is the furthest from the origin, indicating that CAHHO has the best performance while HHO has the worst. In terms of stability, there are varying degrees of fluctuations across different functions. Although CAHHO exhibits larger fluctuations in some functions, its overall range of fluctuations is small, demonstrating strong stability. CHHO shows performance similar to that of CAHHO, oscillating near the origin, which indicates that the C strategy significantly enhances HHO. Furthermore, CAHHO’s leading advantage over CHHO emphasizes that the integration of crisscross search and AβHC strategies greatly improves the performance of HHO.
Fig. 5 Radar chart of CAHHO and other algorithms (figure omitted)
Comparison experiments with classical algorithms
In this section, CAHHO is compared with 7 classical algorithms: HHO (Heidari et al. 2019b), PSO (Kennedy and Eberhart 1995), Bat Algorithm (BA) (Mirjalili et al. 2014b), Whale Optimization Algorithm (WOA) (Mirjalili and Lewis 2016), Sine Cosine Algorithm (SCA) (Mirjalili 2016), Moth-Flame Optimization (MFO) (Mirjalili 2015), and Firefly Algorithm (FA) (Yang 2017), to validate its superior performance.
Table 6 displays the results of the FT and WSRT conducted on these eight algorithms. The row labeled “R+/R−/R=” presents the results of the WSRT. Compared to HHO, CAHHO outperforms it on 22 functions and achieves similar results on 7 functions, indicating a significant improvement over HHO. The results also demonstrate intuitively that CAHHO maintains good performance against the other excellent classical algorithms. Moreover, CAHHO exhibits an ARV of 1.6694, significantly lower than the other algorithms, indicating that it achieves first place on nearly all functions.
In order to display the results more vividly, Fig. 6 shows the FT results of CAHHO and seven other classical algorithms. From the figure, it is evident that the ARV value of CAHHO is significantly lower than that of HHO and other classical algorithms, indicating that CAHHO outperforms these classical algorithms. Additionally, an ARV value of 1.67 means that CAHHO consistently ranks among the top performers, further underscoring its superior performance.
Fig. 6 Result of FT for CAHHO and other classical algorithms (figure omitted)
Table 7 provides a detailed comparison of CAHHO with the other seven classical algorithms on the IEEE CEC 2017 benchmark functions. The table clearly displays the Avg and Std values of the 8 algorithms on each function, where smaller Avg values imply that the algorithm can explore better solutions on such functions, and lower Std values indicate better stability. According to the Avg values in the table, CAHHO achieves the lowest Avg value in 22 functions, securing the first position, second in 4 functions, and third in the remaining 3 functions. Additionally, CAHHO consistently maintains lower Std values, indicating that compared to other classical algorithms, it possesses superior optimization performance and stability.
Figure 7 displays the convergence curves and box plots of CAHHO compared to the other seven classical algorithms on selected functions. From the figure, it is evident that CAHHO achieves higher-quality solutions with a noticeable advantage across various types of functions. Whether in unimodal functions like F1, more complex multimodal functions like F5, F6, F9, and F10, hybrid functions like F14, F15, and F19, or composite functions like F20 and F22, CAHHO consistently outperforms other algorithms. CAHHO excels not only in exploring unimodal functions with a unique optimal solution but also in scenarios with multiple local optima, where it can break out of local optima with its superior exploratory ability when other algorithms get stuck. This highlights the excellent performance of CAHHO. Moreover, the stability of CAHHO is evident from the box plots, where the box representing CAHHO consistently exhibits the smallest and lowest values, indicating its robust stability.
Fig. 7 Convergence curves and box plots of CAHHO and other classical algorithms (figure omitted)
Comparison experiments with HHO variants
To further validate the optimization performance of the proposed CAHHO, comparisons were made with 7 other variants of HHO, including SSFSHHO (Zhang et al. 2023), CMHHO (Shan et al. 2023), NCHHO (Dehkordi et al. 2021), CTHHO (Wang et al. 2022), GCHHO (Song et al. 2021), GSHHO (Zhang et al. 2022), and DHHOM (Jia et al. 2019). Experiments were conducted on the IEEE CEC 2017 benchmark functions across four different problem dimensions (10, 30, 50 and 100).
Table 8 presents the results obtained after conducting the FT on the experimental results. From the table, it is evident that CAHHO consistently maintains a significant lead, with its performance improving as the dimensions increase. At dimension 10, CAHHO achieves first place with an ARV of 2.4344. At dimension 30, CAHHO’s performance improves, securing first place with an ARV of 1.9578. At dimension 50, CAHHO’s performance further improves, maintaining first place with an ARV of 1.8400. Finally, at dimension 100, CAHHO maintains its lead with a remarkable ARV of 1.6656. These results indicate that our proposed CAHHO exhibits superior performance in higher dimensions. Based on the experiments, CAHHO performs better as the dimensionality increases, demonstrating its strong stability and robustness.
Table 3. Description of the participant information (mean ± standard deviation)
| Diagnosis | Number | Age | Gender (M/F) |
|---|---|---|---|
| AD | 130 | 75.6 ± 7.6 | 68/62 |
| MCI | 260 | 76.0 ± 6.9 | 175/85 |
| NC | 130 | 76.7 ± 5.4 | 69/61 |
Table 9 displays the results of the WSRT conducted on CAHHO and the other 7 HHO variants across dimensions 10, 30, 50, and 100. From the table, it is more visually apparent that as the dimensionality increases, CAHHO exhibits a larger lead, indicating its advantage in higher-dimensional problems. For example, at dimension 10, CAHHO outperforms SSFSHHO on 17 functions. As the dimension increases to 30, CAHHO is superior on 26 functions compared to SSFSHHO. At dimension 100, CAHHO achieves a comprehensive victory on all 30 functions. Additionally, we observe that in comparisons with CMHHO, NCHHO, and other variants, CAHHO consistently shows a greater lead with increasing dimensions. This demonstrates that incorporating the crisscross search and AβHC strategies improves the optimization performance of the original HHO. This experiment provides favorable evidence for its potential in addressing tasks in high-dimensional problems.
Table 4. Optimal hyperparameter combinations obtained using the proposed CAHHO algorithm to optimize the ResNet18 model in different diagnosis tasks
| Diagnosis task | Number of channels | Learning rate |
|---|---|---|
| AD versus NC | 8 | 0.00513229 |
| MCI versus NC | 157 | 0.00001 |
| AD versus MCI | 44 | 0.0000925649 |
Figure 8 shows the WSRT analysis results for CAHHO and other HHO variants at dimensions of 10, 30, 50, and 100. From the figure, it is evident that CAHHO’s advantage becomes more pronounced as the dimensionality increases. This further highlights CAHHO’s exceptional performance in high-dimensional spaces, emphasizing its stability and robustness.
Fig. 8 WSRT results of CAHHO and other HHO variants on dimensions 10, 30, 50, and 100 (figure omitted)
Figures 9, 10, 11, and 12 respectively show the curves of convergence and box plots of CAHHO compared to the other 7 HHO variants across dimensions 10, 30, 50, and 100. These figures visually depict the differences in optimization performance between CAHHO and other variants across various dimensions. CAHHO’s exploitation capability is reflected in the unimodal F1 function at dimensions 10, 30, and 50, as well as the unimodal F3 function at dimension 100. CAHHO consistently obtains better optimization results on unimodal functions, and the box plots further demonstrate CAHHO’s strong stability. In addition to unimodal functions, CAHHO consistently escapes the local optima that other algorithms get stuck in and converges to higher-quality solutions in more complex multimodal, hybrid, and composite functions across all dimensions. This indicates that CAHHO possesses strong exploratory ability and effectively balances exploration and exploitation. The box plots also indicate that CAHHO maintains high stability even in higher dimensions or more complex functions. These findings further illustrate the effectiveness of the crisscross search and AβHC strategies in enhancing HHO.
Fig. 9 Comparison of the curves of convergence and box plots for CAHHO and other HHO variants in 10 dimensions (figure omitted)
Fig. 10 Comparison of the curves of convergence and box plots for CAHHO and other HHO variants in 30 dimensions (figure omitted)
Fig. 11 Comparison of the curves of convergence and box plots for CAHHO and other HHO variants in 50 dimensions (figure omitted)
Fig. 12 Comparison of the curves of convergence and box plots for CAHHO and other HHO variants in 100 dimensions (figure omitted)
Comparison experiments with advanced algorithms
To further validate the performance of CAHHO, comparisons were made with 9 other advanced improvement algorithms, including an enhanced WOA (SCLWOA) (Ma et al. 2023), the SCA with communication and quality enhancement (CCEQSCA) (Yu et al. 2023), fuzzy self-tuning PSO (FST_PSO) (SoltaniMoghadam et al. 2019), a chaos-enhanced MFO (CMFO) (Li et al. 2019), an A-C parametric WOA (ACWOA) (Elhosseini et al. 2019), an efficient boosted GWO (OBLGWO) (Heidari et al. 2019c), the SSA with chaos-induced and mutation-driven schemes (CMSSA) (Zhang et al. 2019), an improved opposition-based SCA (OBSCA) (Abd Elaziz et al. 2017), and SCA with DE (SCADE) (Nenavath and Jatoth 2018). Table 10 presents the Avg and Std results of CAHHO and other advanced algorithms. The specific results in Table 10 show that CAHHO achieves first place on 15 functions, second on 10 functions, and consistently maintains smaller Std values on all functions. This indicates that CAHHO maintains a significant advantage even when compared with other excellent improvement algorithms, demonstrating its outstanding performance.
Table 5. Performance comparison of the proposed model with existing methods for AD versus NC
| References | Methods | Accuracy |
|---|---|---|
| Aderghal et al. (2017) | CNN | 0.8594 |
| Aderghal et al. (2016) | CNN | 0.828 |
| So et al. (2019) | Multi-layer perceptron (MLP) | 0.85 |
| Liu et al. (2020) | CNN | 0.889 |
| Choi et al. (2020) | CNN | 0.923 |
| Zhang et al. (2021) | CNN | 0.92 |
| Tufail et al. (2022) | CNN | 0.8921 |
| Jang and Hwang (2022) | Transformer | 0.9321 |
| Li et al. (2022) | CNN + Transformer | 0.939 |
| Kushol et al. (2022) | Transformer | 0.882 |
| Gao et al. (2023) | Transformer | 0.905 |
| Xin et al. (2023) | CNN + Transformer | 0.939 |
| Wang et al. (2024) | Transformer | 0.924 |
| Chen et al. (2025) | CNN + Transformer | 0.9765 |
| PSO (Kennedy and Eberhart 1995) | ResNet18-PSO-numCh_LR | 0.9023 |
| WOA (Mirjalili and Lewis 2016) | ResNet18-WOA-numCh_LR | 0.8985 |
| This study | Proposed ResNet18-CAHHO-numCh_LR | 0.93077 |
Table 11 provides a visual representation of the WSRT and FT results for CAHHO and the other 9 advanced algorithms. The FT shows that CAHHO achieves first place with an ARV of 1.833333, demonstrating a significant lead over the other algorithms. The data in the “R+/R−/R=” column represent the results of the WSRT, indicating that CAHHO exhibits strong performance and clearly demonstrates its advantages compared to other advanced algorithms.
Figure 13 presents an intuitive comparison of the WSRT results between CAHHO and nine other advanced algorithms, utilizing both bar and line graphs. It is evident from the figure that CAHHO outperforms all other algorithms. Specifically, CAHHO leads OBSCA in 28 functions, ACWOA in 29 functions, and FST_PSO, CMFO, and SCADE in 30 functions. These algorithms are all recent and outstanding improvement algorithms, further demonstrating the excellent performance of CAHHO.
Fig. 13 WSRT result of CAHHO and other advanced algorithms (figure omitted)
Figure 14 offers a comprehensive and clear depiction of the convergence curves and box plots for CAHHO in comparison with other advanced algorithms across 10 benchmark functions. For the unimodal functions F1 and F3, CAHHO achieves a significant breakthrough, showcasing its strong exploitation capability. In the multimodal functions F6 and F10, CAHHO maintains a substantial lead. In the more complex hybrid functions F12, F13, and F16, as well as the composite functions F23, F29, and F30, CAHHO is able to find superior solutions within the feasible domain when other methods are trapped in local optima, showcasing its powerful exploratory ability. It is worth noting that in the box plots, CAHHO consistently maintains the smallest box, indicating its strong stability.
Fig. 14 Convergence curves and box plots of CAHHO and other advanced algorithms (figure omitted)
Wall-clock time analysis of CAHHO
In this section, we compared the time consumed by CAHHO, HHO, PSO, BA, WOA, SCA, MFO, FA, SSFSHHO, CMHHO, NCHHO, CTHHO, GCHHO, GSHHO, DHHOM, SCLWOA, CCEQSCA, FST_PSO, CMFO, ACWOA, OBLGWO, CMSSA, OBSCA, and SCADE from initialization to termination. The experiments were conducted using 30 search agents with a maximum of 300,000 evaluations. To clearly contrast the wall-clock time consumed by various optimization algorithms on CEC2017 functions, a proportional representation of wall-clock time is depicted in Fig. 15.
Fig. 15 Comparison of wall-clock time consumption for different optimization algorithms (figure omitted)
The results illustrated in Fig. 15 indicate that CAHHO exhibits higher overall wall-clock time consumption compared to other algorithms. This suggests that CAHHO possesses a relatively higher time complexity than these comparative algorithms. Nevertheless, comprehensive evaluation of algorithmic performance necessitates consideration of factors beyond time consumption, including convergence rate, solution quality, and applicability to real-world problems. Therefore, despite CAHHO potentially requiring extended computational time, future research could leverage parallel computing techniques to reduce the overall time requirements.
Qualitative analysis of CAHHO
A qualitative examination of the proposed CAHHO is presented in this subsection to highlight its performance in exploration and exploitation. The population size N is set to 20, the dimension dim is set to 30, and the maximum number of iterations (MaxIter) is set to 1000. Figure 16 illustrates the qualitative analysis results of CAHHO on unimodal, multimodal, hybrid, and composition functions. A representative set of functions was selected from the 30 IEEE CEC 2017 benchmark functions, and their 3D characteristics are displayed in the first column (a). The 3D plots clearly show the features of the selected functions. The second column (b) of the figure shows the 2D spatial distribution of the historical search trajectories of individuals in HHO, while the third column (c) shows the same for CAHHO. The distribution of black dots indicates the search trajectories of the individual members, while red dots indicate the ideal solutions. These columns show the position distribution of every member of the total population during the iteration process. The contrast between (b) and (c) shows that, for unimodal, multimodal, hybrid, and composition functions, the positions of individuals in HHO are more scattered compared to CAHHO. In contrast, CAHHO’s individuals are significantly clustered around the global optimal solution, directly demonstrating CAHHO’s superior performance over HHO. The trajectory of CAHHO in the first dimension across iterations is shown in the fourth column (d), which sheds light on the variation of individuals’ positions in this dimension. Initially, the vibration amplitude of the individuals is large, but as the search progresses, the vibration amplitude gradually decreases and stabilizes. Additionally, the plots show that, regardless of the function type, CAHHO quickly locks onto the global optimal solution and converges rapidly. This indicates that CAHHO exhibits strong performance and high robustness when facing different optimization problems. The fifth column (e) represents the convergence curves of CAHHO and HHO, with the red curve representing CAHHO and the blue curve representing HHO. From the unimodal function F1, it is evident that CAHHO has a stronger exploitation capability compared to HHO. Furthermore, in more complex functions, CAHHO shows a significant advantage. This further demonstrates that CAHHO balances exploration and exploitation better than HHO and exhibits superior performance.
Fig. 16 (a) Three-dimensional visualization of test functions; (b) two-dimensional spatial mapping of HHO; (c) two-dimensional position layout of CAHHO; (d) trajectories of the CAHHO agent in the first dimension; (e) convergence curves of CAHHO and HHO (figure omitted)
Data and proposed model
Data collection
The AD neuroimaging initiative (ADNI) database is a multicenter, multidisciplinary collaborative project aimed at studying and understanding AD and related disorders (Jack et al. 2008a). This database includes cognitively normal elderly individuals, patients with MCI, and AD patients. Since its inception, ADNI has undergone multiple research phases and has become an essential resource for related research worldwide. Its data is publicly shared for use by researchers, and detailed information and data can be accessed on its official website: http://adni.loni.usc.edu/ADNI. All data used in our study have been anonymized to protect patient privacy. Furthermore, we have implemented robust data security measures to prevent unauthorized access and data breaches.
For this study, MRI imaging data from 520 subjects were extracted from the ADNI database, including 130 AD, 260 MCI, and 130 NC subjects, maintaining a 1:2:1 ratio. This ratio was chosen to ensure a balanced representation across different stages of cognitive decline, providing a robust comparison between NC, MCI, and AD. It should be noted that two scans were selected for each subject. Consequently, a total of 1,040 MRI images were obtained and then split into a training set and a separate test set in a 3:1 ratio. To prevent data leakage, it was ensured that data from the same subject did not appear in both the training and test sets. The participant information is detailed in Table 3.
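Such a subject-level split can be enforced with a grouped splitter. The sketch below uses scikit-learn’s GroupShuffleSplit on hypothetical index arrays (520 subjects with two scans each, 3:1 split) so that both scans of a subject always land on the same side; the paper does not specify its splitting tool, so this is one possible realization.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

subject_ids = np.repeat(np.arange(520), 2)   # one group id per subject, two scans each
scans = np.arange(1040).reshape(-1, 1)       # placeholder indices for the 1,040 MRI scans

# 3:1 train/test split that never separates a subject's two scans
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(scans, groups=subject_ids))
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])  # no subject leakage
```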
Data preprocessing
A standard data preprocessing procedure is conducted based on the characteristics of the collected MRI images. The data preprocessing workflow is depicted in Fig. 17, which employs the ADNI-pipeline method proposed in Jack et al. (2008b) for data preprocessing, consisting of three main steps: (1) post-acquisition correction, (2) B1-intensity variation non-uniformity correction, and (3) intensity non-uniformity correction specifically tailored for 3 T MR images. Subsequently, the data are processed using CAT12 (computational anatomy toolbox), implemented within SPM12. In this study, voxel-based morphometry (VBM) analysis was performed using the CAT12 toolbox, encompassing normalization, skull stripping, segmentation, and smoothing (Farokhian et al. 2017). It is important to note that VBM with CAT12 was utilized to partition each MRI into gray matter, white matter, and cerebrospinal fluid. Finally, the experiment focuses on gray matter images rich in information, aiming to reduce computational complexity and processing time. The output dimensions of the gray matter images are 121 × 145 × 121 for subsequent analysis. Data augmentation techniques, including random flipping along the x, y, and z axes with a probability of 0.5, random 90-degree rotation along the x and y axes with a probability of 0.5, and random cropping to a specified size of 80 × 90 × 80, are employed to increase the diversity of the training data. Normalization and intensity adjustment involve normalizing the image intensity values to a standard range and adjusting intensity scaling to enhance contrast. ROI selection and resizing are performed as follows: foreground cropping to focus on brain tissue, central spatial cropping to a size of 112 × 121 × 112, and resizing to a uniform size of 96 × 96 × 96. These steps not only preserve essential structural brain features but also eliminate unnecessary individual differences, ensuring that the model can extract and learn more critical features, thereby improving its accuracy and reliability in AD diagnosis.
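The paper does not name the implementation library for these augmentation and cropping steps, and the relative ordering of the crop and resize operations is our reading of the text; assuming MONAI-style transforms, the training-time pipeline might be composed as follows.

```python
from monai.transforms import (
    CenterSpatialCrop, Compose, CropForeground, RandFlip,
    RandRotate90, RandSpatialCrop, Resize, ScaleIntensity,
)

# Illustrative training pipeline mirroring the steps described above
train_transforms = Compose([
    CropForeground(),                             # keep only brain tissue (foreground)
    CenterSpatialCrop(roi_size=(112, 121, 112)),  # central spatial crop
    Resize(spatial_size=(96, 96, 96)),            # uniform volume size
    ScaleIntensity(),                             # normalize intensities to a standard range
    RandFlip(prob=0.5, spatial_axis=0),           # random flip along x
    RandFlip(prob=0.5, spatial_axis=1),           # random flip along y
    RandFlip(prob=0.5, spatial_axis=2),           # random flip along z
    RandRotate90(prob=0.5, spatial_axes=(0, 1)),  # random 90-degree rotation in the x-y plane
    RandSpatialCrop(roi_size=(80, 90, 80), random_size=False),  # random crop augmentation
])
```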
Fig. 17 Data preprocessing procedure (figure omitted)
The proposed ResNet18-CAHHO model
The number of channels (numCh) and the learning rate (LR) are two critical hyperparameters that significantly influence model performance. The numCh determines the output feature maps in each convolutional layer, and a higher number of channels generally enhances feature extraction capabilities, allowing the model to capture more image details. However, it also increases computational complexity and the number of parameters. Therefore, optimizing numCh can help achieve the optimal balance between feature extraction capability and computational resources, thereby improving the model’s accuracy and efficiency. Similarly, the LR controls the step size during parameter updates in model training. A higher LR can accelerate convergence but tends to cause instability and oscillations, while a lower LR ensures stable training but slows down convergence. Thus, optimizing LR can find the optimal step size, enabling the model to converge quickly and stably to the optimal solution. The innovation of this method lies in combining the CAHHO algorithm with the ResNet18 model to automatically optimize these two hyperparameters, thereby enhancing model performance and improving diagnostic accuracy and stability, as illustrated in Fig. 18.
Fig. 18 Flowchart of the optimized ResNet18-CAHHO model (figure omitted)
From the figure, it can be seen that the initialization range for numCh is 8 to 256, and the initialization range for LR is 0.00001 to 0.1. During the training phase, the CAHHO algorithm is used to optimize the ResNet18 model. Specifically, by continuously adjusting numCh and LR during the CAHHO optimization process, the optimal numCh and LR parameters are obtained. Compared to traditional methods, such as PSO, DE or the original HHO, which use fixed or simplistic strategies for hyperparameter adjustment, CAHHO’s ability to combine a crisscross search mechanism with AβHC allows for more efficient and robust hyperparameter tuning. These parameters ensure that the ResNet18 model achieves optimal performance on the training set, specifically maximizing accuracy. Subsequently, the optimized ResNet18 model is applied to the test set for model evaluation, outputting diagnostic results to identify whether patients belong to the AD, NC, or MCI categories. By evaluating performance on the test set, the model's generalization ability and diagnostic accuracy are validated.
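In code terms, the coupling reduces to wrapping model training in a fitness function over the two hyperparameters; CAHHO then minimizes it within the stated bounds. In the sketch below, `train_and_evaluate` is a hypothetical helper (not from the paper) that builds ResNet18 with the given channel count, trains it with Adam at the given learning rate, and returns the achieved accuracy.

```python
import numpy as np

# Search bounds stated above: numCh in [8, 256], LR in [0.00001, 0.1]
LB = np.array([8.0, 1e-5])
UB = np.array([256.0, 1e-1])

def fitness(agent: np.ndarray) -> float:
    """Maps a CAHHO search agent (numCh, LR) to a value to minimize."""
    num_ch = int(round(np.clip(agent[0], LB[0], UB[0])))  # channel count must be an integer
    lr = float(np.clip(agent[1], LB[1], UB[1]))
    acc = train_and_evaluate(num_ch=num_ch, lr=lr)        # hypothetical training routine
    return 1.0 - acc                                      # CAHHO minimizes, so invert accuracy

# best_agent, best_f = cahho(fitness, LB, UB, dim=2)      # reuse the CAHHO sketch from Sect. 3
```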
AD diagnostic experiments and results
Experimental setup
In this experiment, we trained the DL models on an NVIDIA GeForce RTX 4090 GPU and a 13th-generation Intel(R) Core(TM) i9-13900KF processor under the Ubuntu 22.04 operating system, using the PyTorch 2.0.1 framework with Python 3.10.9. Training used the Adam optimizer, a batch size of 12, 100 epochs, and the cross-entropy loss function. Model performance was evaluated with accuracy, sensitivity, specificity, AUC, and the ROC curve, as detailed below.
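A minimal PyTorch sketch of this training configuration follows; `model`, `train_loader`, and `learning_rate` are placeholders for the 3D ResNet18, the augmented ADNI data loader, and the CAHHO-selected learning rate, not the authors' code.

```python
import torch
import torch.nn as nn

# Stated configuration: Adam optimizer, batch size 12 (set in the
# DataLoader), 100 epochs, cross-entropy loss.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(100):
    model.train()
    for volumes, labels in train_loader:   # batches of 12 MRI volumes
        volumes, labels = volumes.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(volumes), labels)
        loss.backward()
        optimizer.step()
```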
This study comprehensively evaluates the proposed model using several metrics: accuracy, sensitivity, specificity, area under the curve (AUC), and receiver operating characteristic (ROC) curves. The ROC curve plots the true positive rate against the false positive rate across threshold values, giving a graphical view of the trade-off between sensitivity and specificity at different operating points; a curve that hugs the upper-left corner indicates superior performance, signifying high sensitivity and a low false positive rate over a range of thresholds. The AUC summarizes the ROC curve as a single score, with a higher AUC indicating a better ability to discriminate between positive and negative instances. The remaining metrics are computed from the confusion matrix, which compares the predicted labels with the true labels. For binary classification, the confusion matrix is defined as
$$\begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix},$$
where rows correspond to the actual class (positive, then negative) and columns to the predicted class (positive, then negative).
In the confusion matrix, True Positive (TP) denotes the number of instances correctly predicted as positive, True Negative (TN) denotes the number of instances correctly predicted as negative, False Positive (FP) denotes the number of instances incorrectly predicted as positive, and False Negative (FN) denotes the number of instances incorrectly predicted as negative. The performance metrics are then calculated from the confusion matrix as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{23}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{24}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{25}$$
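Assuming scikit-learn is available, Eqs. (23)-(25) and the AUC can be computed directly from the test-set outputs; `y_true`, `y_pred`, and `y_score` are placeholders for the ground-truth labels, hard predictions, and positive-class probabilities.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# tn/fp/fn/tp follow scikit-learn's layout for binary labels [0, 1];
# y_true, y_pred, y_score stand in for the model's test-set outputs.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (23)
sensitivity = tp / (tp + fn)                    # Eq. (24)
specificity = tn / (tn + fp)                    # Eq. (25)
auc = roc_auc_score(y_true, y_score)            # threshold-free summary
```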
Ablation experiments
In this paper, we conducted a comparative analysis of the performance of different models in the diagnostic tasks of AD versus NC, MCI versus NC, and AD versus MCI. We employed ResNet18 as the baseline model and optimized it using the proposed CAHHO algorithm. Specifically, we optimized numCh and LR of ResNet18 and evaluated the optimized models. ResNet18-CAHHO-numCh denotes the ResNet18 model with the number of channels optimized by the proposed CAHHO, ResNet18-CAHHO-LR denotes the ResNet18 model with the learning rate optimized by CAHHO, and ResNet18-CAHHO-numCh_LR denotes the ResNet18 model with both the number of channels and the learning rate optimized by CAHHO.
AD versus NC diagnosis task
As shown in Table 12, a comparative analysis was conducted on the performance of different models in the task of diagnosing AD and NC. The experimental results indicate that ResNet18, as the original DL benchmark model, achieved an accuracy of 0.54615, a sensitivity of 0.3, a specificity of 0.75714, and an AUC value of 0.51143 in the diagnostic task. This suggests that the model performs relatively poorly in distinguishing between AD and NC, particularly in terms of sensitivity. However, the performance of ResNet18-CAHHO-numCh improved significantly, with an accuracy of 0.63077, a sensitivity of 0.51667, a specificity of 0.72857, and an AUC value of 0.63905. Compared to the benchmark model, ResNet18-CAHHO-numCh showed a marked improvement in all metrics, especially in sensitivity. Additionally, the ResNet18-CAHHO-LR model demonstrated excellent sensitivity, reaching 0.88333; however, its specificity decreased to 0.28571, resulting in an accuracy of only 0.56154 and an AUC value of 0.62643. Despite this, the model exhibited a strong ability to detect AD with high sensitivity. More importantly, the ResNet18-CAHHO-numCh_LR model achieved the best performance across all metrics, with an accuracy of 0.93077, a sensitivity of 0.96667, a specificity of 0.9, and an AUC value of 0.93738. Additionally, Fig. 19 presents a radar chart that visualizes the performance of different models across various metrics in the task of diagnosing AD and NC. This radar chart provides an intuitive comparison of the comprehensive performance of each model across different evaluation metrics, further validating the conclusion that the ResNet18-CAHHO-numCh_LR model outperforms the others in all aspects.
Fig. 19 Performance comparison radar chart of different models in AD versus NC diagnostic task
The comparison of different models in the diagnostic task of AD and NC is presented in Fig. 20. Firstly, from the training loss and validation loss graphs, it can be seen that the ResNet18 model has significantly higher losses during both the training and validation phases, especially with noticeable fluctuations in the initial and middle stages, while the models optimized for the number of channels and learning rate using the CAHHO algorithm show lower and more stable losses. Secondly, in the training accuracy and validation accuracy graphs, the ResNet18-CAHHO-numCh_LR model exhibits the best performance, with its training accuracy and validation accuracy rapidly increasing and stabilizing at a high level, ultimately exceeding 0.9 in validation accuracy, which is far superior to the other models. In contrast, the baseline model ResNet18 shows large fluctuations in accuracy throughout the training process without significant improvement, and its validation accuracy remains low, around 0.5. Additionally, while the ResNet18-CAHHO-numCh and ResNet18-CAHHO-LR models show some improvement in accuracy, they still do not match the performance of the ResNet18-CAHHO-numCh_LR model, which optimizes both the number of channels and learning rate simultaneously.
Fig. 20 Training and validation performance comparison of different models in AD versus NC diagnostic task
As shown in Fig. 21, we analyzed the performance of different models in diagnosing AD and NC using confusion matrices. Firstly, the confusion matrix for the baseline model ResNet18 shows that it performs well in recognizing AD, correctly predicting 53 AD cases, but poorly in recognizing NC, correctly predicting only 18 NC cases. Overall, the model has a high misclassification rate for NC, misclassifying 42 NC cases as AD. Secondly, the ResNet18-CAHHO-numCh model shows some improvement, correctly predicting 51 AD cases and 31 NC cases, while misclassifying 29 NC cases as AD, indicating an enhancement in specificity. Furthermore, the ResNet18-CAHHO-LR model displays significant bias in its predictions, correctly predicting only 20 AD cases but performing relatively well in recognizing NC, correctly predicting 53 NC cases, yet misclassifying 50 AD cases as NC, indicating poor sensitivity. Finally, the ResNet18-CAHHO-numCh_LR model performs the best, correctly predicting 63 AD cases and 58 NC cases, with only 2 NC cases and 7 AD cases misclassified, demonstrating excellent performance in both sensitivity and specificity.
Fig. 21 Confusion matrices of different models in AD versus NC diagnostic task
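Matrices such as those in Fig. 21 can be reproduced directly from the stored predictions. A minimal scikit-learn sketch, where `y_true` and `y_pred` are placeholders and the label ordering is an assumption:

```python
from sklearn.metrics import ConfusionMatrixDisplay

# Renders the confusion matrix for one binary task as a heatmap;
# display_labels must match the dataset's label encoding.
ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                        display_labels=["AD", "NC"])
```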
Therefore, the aforementioned experimental results indicate that the combined optimization of channel numbers and learning rate using the proposed CAHHO can significantly enhance the overall performance of the model, particularly in its ability to distinguish between AD and NC.
MCI versus NC diagnosis task
As shown in the experimental results in Table 13, we compared the performance of different models in the diagnostic task of MCI and NC. The baseline ResNet18 achieved an accuracy of 0.65306, a sensitivity of 0, a specificity of 1, and an AUC of 0.58134. The ResNet18-CAHHO-numCh model improved the accuracy to 0.71939, sensitivity to 0.61765, specificity to 0.77344, and AUC to 0.73300, showing significant performance gains. The ResNet18-CAHHO-LR model achieved only a marginal gain in sensitivity (0.02941), with a specificity of 0.99219 and an accuracy of 0.65816, while its AUC fell to 0.41728, below the baseline. The ResNet18-CAHHO-numCh_LR model exhibited the best performance overall, with an accuracy of 0.80102, a sensitivity of 0.76471, a specificity of 0.82031, and an AUC of 0.83858; although its specificity is slightly lower than that of the baseline ResNet18, its overall performance is significantly enhanced. To provide a more intuitive comparison, Fig. 22 presents a performance radar chart of the models across the evaluation metrics in the MCI versus NC task.
Fig. 22 Performance comparison radar chart of different models in MCI versus NC diagnostic task
As shown in Fig. 23, we compared the performance of different models in the diagnostic task of MCI and NC. Firstly, from the training loss and validation loss graphs, it can be seen that the ResNet18 model has significantly higher losses during both the training and validation phases, especially with noticeable fluctuations in the initial and middle stages, while the models optimized for the number of channels and learning rate using the CAHHO algorithm show lower and more stable losses. Secondly, in the training accuracy and validation accuracy graphs, the ResNet18-CAHHO-numCh_LR model exhibits the best performance, with its training accuracy and validation accuracy rapidly increasing and stabilizing at a high level, ultimately reaching a validation accuracy close to 0.8, far superior to the other models. In contrast, the baseline model ResNet18 shows large fluctuations in accuracy throughout the training process without significant improvement, and its validation accuracy remains low, around 0.6. Additionally, while the ResNet18-CAHHO-numCh and ResNet18-CAHHO-LR models show some improvement in accuracy, they still do not match the performance of the ResNet18-CAHHO-numCh_LR model.
Fig. 23 Training and validation performance comparison of different models in MCI versus NC diagnostic task
As shown in Fig. 24, we analyzed the performance of different models in diagnosing MCI and NC using confusion matrices. The baseline ResNet18 model correctly predicted 128 MCI cases but failed to identify any NC cases, misclassifying 68 NC cases as MCI. In contrast, the ResNet18-CAHHO-numCh model showed improvement by correctly predicting 99 MCI and 42 NC cases, thereby reducing NC misclassifications to 26. Similarly, the ResNet18-CAHHO-LR model correctly predicted 127 MCI and 2 NC cases, although it still misclassified 66 NC cases as MCI. Notably, the best performance was achieved by the ResNet18-CAHHO-numCh_LR model, with only 16 NC and 23 MCI cases misclassified.
Fig. 24 Confusion matrices of different models in MCI versus NC diagnostic task
In summary, the dual optimization of the number of channels and the learning rate using the improved CAHHO algorithm significantly enhances the performance of the ResNet18 model in diagnosing MCI, with the ResNet18-CAHHO-numCh_LR model showing superior performance across various metrics, thereby highlighting the effectiveness and superiority of the dual optimization strategy.
AD versus MCI diagnosis task
In this study, we compared the performance of different models in diagnosing AD and MCI. As shown in Table 14, although the original ResNet18 model exhibited outstanding sensitivity, its very low specificity (0.01538) resulted in an overall AUC of only 0.5771. The ResNet18-CAHHO-numCh model achieved a slight improvement in specificity (0.03077), but its AUC remained low (0.51538). The ResNet18-CAHHO-LR model, however, showed significant improvements in accuracy (0.77436), specificity (0.61538), and AUC (0.81704), demonstrating a more balanced performance. The ResNet18-CAHHO-numCh_LR model achieved the best performance across all metrics, with an accuracy of 0.80513, sensitivity of 0.88462, specificity of 0.64615, and an AUC of 0.84675. This indicates that the dual optimization of channels and learning rate significantly enhances the overall performance of the ResNet18 model in the AD versus MCI diagnosis task. As shown in Fig. 25, the radar chart clearly demonstrates the comprehensive advantage of the ResNet18-CAHHO-numCh_LR model across all performance metrics.
Fig. 25 Performance comparison radar chart of different models in AD versus MCI diagnostic task
Figure 26 compares the training and validation performance of different models for AD versus MCI. The ResNet18 model's training and validation losses are significantly higher than those of the optimized models, indicating unstable, poorly converged training. In contrast, the optimized models, particularly ResNet18-CAHHO-numCh_LR, maintained lower training and validation losses, demonstrating good convergence and generalization. In terms of accuracy, the ResNet18-CAHHO-numCh_LR model performed best: its training accuracy steadily increased to nearly 0.8, and its validation accuracy showed a consistent upward trend, stabilizing around 0.8. By comparison, the original ResNet18 model, despite initially higher training accuracy, exhibited large fluctuations and lower overall validation accuracy, pointing to overfitting. Both the ResNet18-CAHHO-numCh and ResNet18-CAHHO-LR models performed markedly better than the original ResNet18. The ResNet18-CAHHO-numCh model maintained relatively stable training accuracy, and although its validation accuracy fluctuated slightly, it generally outperformed the original model. The ResNet18-CAHHO-LR model's training and validation accuracies fluctuated considerably in the early stages but gradually stabilized, ultimately showing good performance across all metrics.
Fig. 26 Training and validation performance comparison of different models in AD versus MCI diagnostic task
As shown in Fig. 27, the confusion matrices illustrate the diagnostic results of various models for AD and MCI. The original ResNet18 model correctly classified only 1 AD sample, with 64 AD samples misclassified as MCI. The ResNet18-CAHHO-numCh model showed slight improvement, correctly classifying 2 AD samples but still misclassifying 63. The ResNet18-CAHHO-LR model showed significant improvement, correctly classifying 40 AD samples and reducing the misclassification of AD samples as MCI to 25. The best performance was observed with the ResNet18-CAHHO-numCh_LR model, which correctly classified 42 AD samples, reduced the misclassification of AD samples as MCI to 23, and correctly classified 115 MCI samples with only 15 misclassified as AD.
Fig. 27 Confusion matrices of different models in AD versus MCI diagnostic task
Figure 28 presents the ROC curves for different models in the diagnostic tasks of AD versus NC, MCI versus NC and AD versus MCI. Firstly, the original ResNet18 model performs poorly in all classification tasks, with ROC curves close to the diagonal, indicating poor classification performance. In contrast, the ResNet18-CAHHO-numCh shows some improvement in the ROC curves, but the enhancement is limited. Furthermore, the ResNet18-CAHHO-LR performs well in the AD versus MCI classification task, with ROC curves significantly above the diagonal, although its performance in other tasks remains insufficient. Notably, for distinguishing between AD and MCI, the original ResNet18 outperforms the ResNet18-CAHHO-numCh, as indicated by its higher ROC curve, demonstrating better classification performance. Finally, the ResNet18-CAHHO-numCh_LR model achieves the best performance across all classification tasks. Its ROC curves are significantly higher than those of the other models, approaching the ideal top-left corner, indicating excellent classification capabilities. In conclusion, the dual optimization of the number of channels and the learning rate using the CAHHO algorithm significantly enhances the performance of the ResNet18 model in diagnosing AD and MCI, with the ResNet18-CAHHO-numCh_LR model showing the most outstanding performance. Therefore, the ResNet18-CAHHO proposed in this paper specifically refers to the ResNet18-CAHHO-numCh_LR model.
Fig. 28 ROC curves of different models in AD versus NC, MCI versus NC and AD versus MCI
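Curves of this kind can be regenerated from the saved test-set scores. A minimal sketch for one binary task, with `y_true` and `y_score` as placeholders for ground-truth labels and positive-class probabilities:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Sweep the decision threshold to obtain the ROC curve, then annotate
# it with the AUC; the dashed diagonal marks chance-level performance.
fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, y_score):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```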
Optimal hyperparameter selection results
By optimizing the ResNet18 model with the proposed CAHHO algorithm, the optimal hyperparameter combinations for the different diagnostic tasks were obtained, as shown in Table 4. For the AD versus NC classification task, the algorithm selected an optimal number of channels of 8 and an optimal learning rate of 0.00513229. The relatively low channel count reflects the comparative simplicity of this task, where few channels suffice to learn discriminative features, and the moderate learning rate helps the model converge efficiently. In contrast, for the MCI versus NC task, the optimal number of channels was 157, significantly higher than for AD versus NC, indicating the greater difficulty of this task: the model must learn more complex and richer features to distinguish MCI patients from normal individuals. The optimal learning rate here was 0.00001, a low value likely reflecting the slower, more careful convergence this harder task requires. Finally, for the AD versus MCI task, the algorithm determined an optimal number of channels of 44 with a learning rate of 0.0000925649, reflecting the challenge of separating AD from MCI given their subtle differences in imaging data.
In summary, these experimental results clearly demonstrate that the optimal hyperparameter combinations for the model vary significantly depending on the difficulty of the classification tasks. For simpler tasks, a lower number of channels and a moderate learning rate can yield good results, whereas more challenging tasks require increased model capacity and exploration of a broader parameter space.
Comparison with existing methods
To assess the effectiveness of the proposed ResNet18-CAHHO-numCh_LR in AD recognition, we performed a comparative analysis against various DL models, including CNNs, Transformers, and hybrid approaches, on the ADNI dataset, as shown in Table 5. Since different studies employ distinct data processing protocols, this comparison serves as an indicative reference. Nevertheless, ResNet18-CAHHO-numCh_LR achieved an accuracy of 0.93077, surpassing most CNN-based methods. It also performed competitively against Transformer-based approaches such as Kushol et al. (2022) (0.882), Gao et al. (2023) (0.905), and Wang et al. (2024) (0.924). Furthermore, ResNet18-CAHHO-numCh_LR exhibited comparable performance to Li et al. (2022) (0.939) and Xin et al. (2023) (0.939), both of which integrate CNN and Transformer architectures, although Chen et al. (2025) (0.9765) reported the highest accuracy among the compared methods. Moreover, we conducted a direct comparison with other advanced optimization techniques applied to ResNet18 hyperparameter tuning. Notably, PSO-CAHHO-numCh_LR and WOA-CAHHO-numCh_LR performed worse than CAHHO but still outperformed standard CNN-based approaches. These results highlight the effectiveness and robustness of our optimization strategy in improving AD classification performance and demonstrate its potential advantages in AD diagnosis.
Discussion
Hyperparameter analysis
Figure 29 illustrates the distribution of the number of channels and learning rates optimized using the proposed CAHHO algorithm for the ResNet18 model in the diagnostic tasks of AD versus NC, MCI versus NC, and AD versus MCI during the training process. By observing the distribution of the number of channels and learning rates under different tasks, we can gain a deeper understanding of the impact of these hyperparameters on model performance.
Fig. 29 Distribution of the number of channels and learning rates in AD versus NC, MCI versus NC and AD versus MCI
Firstly, in terms of the distribution of the number of channels, for the AD versus NC task, the number of channels is primarily concentrated in the lower range (below 50). This indicates that fewer channels are sufficient to capture the key features distinguishing AD from NC, thereby reducing the model’s complexity and computational cost. In contrast, for the MCI versus NC task, the distribution of the number of channels is more dispersed, suggesting that more channels are needed to capture the subtle and complex features, reflecting the complexity of distinguishing MCI from NC. For the AD versus MCI task, although the distribution of the number of channels is more dispersed than in the AD versus NC task, it still tends to be in the lower range. This implies that the model needs a certain level of complexity to capture the features distinguishing AD from MCI but does not require an excessively high number of channels.
Secondly, the distribution of learning rates shows that in all tasks, learning rates are primarily concentrated in the lower range of 0 to 0.02, especially in the AD versus NC task. Low learning rates contribute to more stable model training, preventing overfitting and instability. However, in the MCI versus NC and AD versus MCI tasks, while low learning rates remain predominant, there are also data points with higher learning rates. This suggests that for these more complex tasks, the model sometimes requires higher learning rates to accelerate the learning process and quickly adjust model parameters to accommodate the complex feature variations.
Furthermore, the joint distribution of the number of channels and learning rates reveals the parameter selection strategies of the model in different tasks. In the AD versus NC task, data points are mainly concentrated in the low channels and low learning rates region, indicating that this combination effectively captures the key features distinguishing AD from NC while ensuring stable training. In the MCI versus NC task, the combinations of channels and learning rates are more diverse, with a broader distribution of data points. This shows that the model needs to try various parameter combinations to find the optimal feature representation method for effectively distinguishing MCI from NC. In the AD versus MCI task, although data points are relatively concentrated in the low learning rates region, the selection of the number of channels is more diverse. This reflects that, when handling the AD versus MCI task, the model needs a broader exploration of different combinations of channels and learning rates to achieve the best classification performance.
In conclusion, these experimental results demonstrate that optimizing the number of channels and learning rates using the proposed CAHHO algorithm can significantly enhance the performance of the ResNet18 model in various diagnostic tasks. Particularly, the combination of low learning rates and appropriate numbers of channels shows stable and effective training results across different tasks. Meanwhile, different tasks have varying requirements for the number of channels and learning rates, indicating that the parameter settings need to be flexibly adjusted based on the specific task characteristics during model optimization. This improves the model’s generalization ability and classification performance, providing important theoretical and practical guidance for further model optimization and improvement.
Visualization analysis
Figure 30 presents the Gradient-Weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al. 2017) visualization results of the proposed ResNet18-CAHHO-numCh_LR model for diagnostic tasks involving AD versus NC, MCI versus NC, and AD versus MCI. The figure displays the brain MRI images corresponding to the actual and predicted (pred) labels, along with the heatmaps generated by Grad-CAM.
Fig. 30 Key brain regions identified by the proposed model in AD versus NC, MCI versus NC, and AD versus MCI visualized by Grad-CAM
The Grad-CAM visualization results indicate that, when distinguishing AD from NC, the improved ResNet18 model focuses primarily on medial temporal lobe structures closely associated with memory and spatial navigation, such as the hippocampus (Nedelska et al. 2012) and the parahippocampal gyrus (Kesslak et al. 1991). These regions typically exhibit early and significant atrophy in AD patients, making them important imaging markers for the disease. The model also highlights parietal regions involved in perceptual processing and attention regulation, such as the inferior parietal lobule (Greene and Killiany 2010; Lu et al. 2025), reflecting pathological changes in the later stages of AD. In the MCI versus NC classification, the proposed model, besides attending to the aforementioned medial temporal and parietal regions, pays particular attention to frontal areas, especially the dorsolateral prefrontal cortex associated with executive function and working memory (Barbey et al. 2013); abnormalities in these areas may underlie the cognitive impairment seen in MCI patients. Moreover, the model's attention in MCI cases is more dispersed than in AD cases, reflecting the heterogeneity and transitional character of MCI. Finally, when distinguishing AD from MCI, the model extends its attention to areas such as the temporoparietal junction (Meyer et al. 2019), which is involved in multisensory information integration. This points to heterogeneity in the pathological basis of AD and MCI, suggesting that AD may involve more extensive cortical damage while MCI exhibits more localized, transitional changes.
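For readers who wish to reproduce such maps, the sketch below outlines Grad-CAM for a 3D CNN using forward and backward hooks, following Selvaraju et al. (2017). The target layer `model.layer4` and the tensor shapes are our assumptions about the ResNet18 variant, not the authors' code; `volume` is a preprocessed MRI tensor of shape (1, 1, D, H, W).

```python
import torch
import torch.nn.functional as F

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

h1 = model.layer4.register_forward_hook(save_activation)
h2 = model.layer4.register_full_backward_hook(save_gradient)

logits = model(volume)
cls = logits.argmax(dim=1).item()
logits[0, cls].backward()          # gradients w.r.t. the predicted class

# Global-average-pool the gradients into channel weights, combine with
# the activation maps, and keep only positively contributing voxels.
weights = gradients["value"].mean(dim=(2, 3, 4), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=volume.shape[2:], mode="trilinear",
                    align_corners=False)          # upsample to input size
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]
h1.remove(); h2.remove()
```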
In summary, this Grad-CAM visualization study demonstrates that the improved ResNet18 model can automatically learn key imaging markers for AD and MCI, focusing primarily on brain regions closely related to cognitive functions such as memory, spatial navigation, and executive function. By generating visual explanations of key brain regions through Grad-CAM technology, the model provides intuitive supplementary information for clinicians, thereby enhancing the credibility of the diagnosis. This ability to identify critical brain regions enables the model to effectively diagnose AD and MCI patients. However, some misdiagnoses of NC and MCI samples highlight the model’s limitations in certain situations, providing important insights for future model improvement and optimization. These results underscore the importance of further optimizing the model to better distinguish between normal aging and pathological cognitive impairment, as well as the need for more research on MCI heterogeneity and early lesion identification.
Implication of the study
The proposed ResNet18-CAHHO model not only enhances the efficiency of deep learning-based AD diagnosis but also improves interpretability through visualization techniques. In the future, we plan to collaborate further with clinical experts to conduct more in-depth validations of the Grad-CAM visualization results, thereby consolidating the practical diagnostic value of our approach. It is worth noting that the proposed method has the ability to automatically optimize hyperparameters, which is expected to adapt to different hospital and equipment environments in the future. By embedding strict data privacy protection measures, it will meet medical regulatory requirements, providing strong support for early AD intervention. By automatically identifying key imaging biomarkers, this model aids clinicians in the early-stage diagnosis, which is critical for timely intervention and effective disease management.
Limitation of the study
In terms of limitations, the constraints of this study can be categorized into theoretical and practical aspects. On the theoretical side, the model may face challenges in handling complex patterns such as data noise, nonlinear relationships among features, and the integration of multimodal data, especially in cases of class imbalance, due to its reliance on a deep learning framework. Moreover, the proposed framework is currently optimized for ResNet18, and further validation is required to assess its adaptability to other deep learning architectures, such as CNN variants and transformer-based models. Although this study has conducted comparative experiments with classical metaheuristic algorithms, HHO variants, and other advanced optimization methods, additional comparisons with recently proposed optimization algorithms, such as the Fitness Dependent Optimizer (FDO) (Abdullah and Ahmed 2019), Child Drawing Development Optimization (CDDO) (Abdulhameed and Rashid 2022), Donkey and Smuggler Optimization (DSO) (Shamsaldin et al. 2019), Ant Nesting Algorithm (ANA) (Hama Rashid et al. 2021), FOX Algorithm (FOX) (Mohammed and Rashid 2023), Learner Performance-based Behavior (LPB) (Rahman and Rashid 2021), Goose Algorithm (Goose) (Hamad and Rashid 2024), Lagrange Elementary Optimization (LEO) (Aladdin and Rashid 2023), and Shrike Optimization Algorithm (SHOA) (AbdulKarim and Rashid 2024), would further strengthen the evaluation of CAHHO's effectiveness. Due to computational constraints and the extensive time required for implementing and fine-tuning these algorithms, they have not been included in the current study. However, in future work, we plan to integrate a selection of these algorithms for further comparative analysis to provide a more comprehensive benchmarking of CAHHO's performance in deep learning hyperparameter optimization tasks. From a practical perspective, although CAHHO enhances hyperparameter optimization, its computational cost is relatively high compared to simpler optimization techniques, necessitating further improvements through parallelization and hardware acceleration. Additionally, the model's performance may vary across different clinical settings, warranting additional empirical validation to enhance its applicability and accuracy.
Conclusions and future works
This study successfully proposes a DL framework based on the improved CAHHO and the ResNet18 model, named ResNet18-CAHHO, for the early diagnosis of AD. The CAHHO algorithm effectively improves performance in both the exploration and exploitation phases through the crisscross search and the AβHC mechanisms. Moreover, comparisons with the traditional HHO and various other meta-heuristic algorithms demonstrated that CAHHO has superior global search capability and faster convergence speed when solving optimization problems. Through comparative experiments, we demonstrated that the CAHHO algorithm has significant advantages in optimizing hyperparameters of the DL model and identified the optimal combination of model channel numbers and learning rates for different diagnostic tasks. Consequently, experimental results show that the optimized ResNet18-CAHHO model exhibits excellent performance across various AD diagnostic tasks. Furthermore, Grad-CAM visualization analysis confirmed that the model can automatically learn key brain imaging markers closely related to AD and MCI, thereby providing strong support for the diagnosis of AD.
While the ResNet18-CAHHO model has shown impressive results, the quest for diagnostic excellence continues. Therefore, future work will focus on enhancing feature representation to detect even subtler early-stage cognitive decline indicators. Additionally, the model's generalization will be rigorously tested across diverse datasets to ensure its robustness and applicability. Specifically, we plan to extend the evaluation to additional datasets, including the Open Access Series of Imaging Studies (OASIS), to further investigate the model's generalization capability and ensure its applicability in real-world clinical settings. Furthermore, the integration of multimodal data, including genetic and clinical information, PET scans, and clinical scores, will be explored to enrich the diagnostic insights and further enhance the model's accuracy in diagnosing AD. We also consider incorporating self-supervised learning and attention mechanisms to enhance the model's focus on key brain regions, thereby reducing misclassification. In addition, we plan to explore hybrid approaches that combine Grad-CAM with other explainability methods, such as Shapley additive explanations (SHAP) or local interpretable model-agnostic explanations (LIME), to assess whether they can provide additional insights without introducing excessive computational complexity. Moreover, we will investigate model-agnostic interpretability frameworks specifically tailored for medical imaging to enhance clinical applicability. Expanding the model to other neurodegenerative diseases is also a promising direction for future research. The CAHHO framework could be adapted to diagnose and monitor diseases such as Parkinson's disease, Huntington's disease, and frontotemporal dementia, which present similar challenges in early diagnosis and progression tracking. Given that early diagnosis of AD can significantly improve patients' quality of life, it is recommended that policymakers consider integrating such efficient computational models into healthcare systems to support clinical decision-making and public health policies.
Acknowledgements
This work was supported by the Key Project of the Natural Science Foundation of Zhejiang Province under Grant No. LZ24F010007, and the National Natural Science Foundation of China under Grant No. 62271177.
Author contributions
Jinhua Sheng designed the project and supervised the overall research; Qian Zhang collected data, performed experiments and analysis; Qiao Zhang co-designed the research; Ze Yang, Yu Xin, Binbing Wang, and Rong Zhang participated in data collection or analysis. Qian Zhang and Jinhua Sheng wrote the manuscript. All authors approved the final manuscript.
Data availability
The data supporting this research is derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and can be accessed from the ADNI database (adni.loni.usc.edu) upon registration and adherence to the data usage agreement. Additional information can be provided by corresponding authors upon reasonable request.
Declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical statement
This study was approved by the institutional review board (IRB) at Hangzhou Dianzi University (IRB-2020001) and the ethics committee at Beijing Hospital (2022BJYYEC-375-01).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Abd Elaziz, M; Oliva, D; Xiong, S. An improved opposition-based sine cosine algorithm for global optimization. Expert Syst Appl; 2017; 90, pp. 484-500.
Abdulhameed, S; Rashid, TA. Child drawing development optimization algorithm based on child’s cognitive development. Arab J Sci Eng; 2022; 47,
AbdulKarim, HK; Rashid, TA. In Search of excellence: SHOA as a competitive shrike optimization algorithm for multimodal problems. IEEE Access; 2024; 12, pp. 98407-98425.
Abdullah, JM; Ahmed, T. Fitness dependent optimizer: inspired by the bee swarming reproductive process. IEEE Access; 2019; 7, pp. 43473-43486.
Aderghal K, Boissenin M, Benois-Pineau J, Catheline G, Afdel K (2016) Classification of sMRI for AD diagnosis with convolutional neuronal networks: a pilot 2-D+ study on ADNI. Multimedia modeling, lecture notes in computer science, p 690–701
Aderghal K, Benois-Pineau J, Afdel K, Gwenaëlle C (2017) FuseMe: classification of sMRI images by fusion of deep CNNs in 2D+ ε projections. In: Proceedings of the 15th international workshop on content-based multimedia indexing
Aladdin AM, Rashid TA (2023) LEO: Lagrange elementary optimization. arXiv preprint arXiv:2304.05346
Al-Betar, MA. β-hill climbing: an exploratory local search. Neural Comput Appl; 2017; 28,
Al-Betar, MA; Aljarah, I; Awadallah, MA; Faris, H; Mirjalili, S. Adaptive β-hill climbing for optimization. Soft Comput; 2019; 23,
Alnowaiser, K; Saber, A; Hassan, E; Awad, WA. An optimized model based on adaptive convolutional neural network and grey wolf algorithm for breast cancer diagnosis. PLoS ONE; 2024; 19,
Barbey, AK; Koenigs, M; Grafman, J. Dorsolateral prefrontal contributions to human working memory. Cortex; 2013; 49,
Basheera, S; Sai Ram, MS. Convolution neural network–based Alzheimer's disease classification using hybrid enhanced independent component analysis based segmented gray matter of T2 weighted magnetic resonance imaging with clinical valuation. Alzheimer's Dementia Transl Res Clin Interv; 2019; 5, pp. 974-986.
Bayram, B; Kunduracioglu, I; Ince, S; Pacal, I. A systematic review of deep learning in MRI-based cerebral vascular occlusion-based brain diseases. Neuroscience; 2025; 568, pp. 76-94.
Bochinski E, Senst T, Sikora T (2017) Hyper-parameter optimization for convolutional neural network committees based on evolutionary algorithms. In: 2017 IEEE international conference on image processing (ICIP). IEEE
Chen, J; Wang, Y; Zeb, A; Suzauddola, MD; Wen, Y. Multimodal mixing convolutional neural network and transformer for Alzheimer’s disease recognition. Expert Syst Appl; 2025; 259, 125321.
Choi, B-K; Madusanka, N; Choi, H-K; So, J-H; Kim, C-H; Park, H-G et al. Convolutional neural network-based MR image analysis for Alzheimer’s disease classification. Curr Med Imaging; 2020; 16,
Darwish, A; Ezzat, D; Hassanien, AE. An optimized model based on convolutional neural networks and orthogonal learning particle swarm optimization algorithm for plant diseases diagnosis. Swarm Evol Comput; 2020; 52, 100616.
Dehkordi AA, Sadiq AS, Mirjalili S, Ghafoor KZ (2021) Nonlinear-based Chaotic Harris Hawks optimizer: algorithm and internet of vehicles application. Appl Soft Comput 109
Derrac, J; García, S; Molina, D; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput; 2011; 1,
Elhosseini, MA; Haikal, AY; Badawy, M; Khashan, N. Biped robot stability based on an A-C parametric whale optimization algorithm. J Comput Sci; 2019; 31, pp. 17-32.
El-kenawy, E-SM; Khodadadi, N; Mirjalili, S; Abdelhamid, AA; Eid, MM; Ibrahim, A. Greylag Goose Optimization: nature-inspired optimization algorithm. Expert Syst Appl; 2024; 238, 122147.
El-Kenawy, E-SM; Eid, MM; Abualigah, L. Machine learning in public health forecasting and monitoring the Zika virus. J Artif Intell Metaheuristics; 2024; 1,
Farokhian, F; Beheshti, I; Sone, D; Matsuda, H. Comparing CAT12 and VBM8 for detecting brain morphological abnormalities in temporal lobe epilepsy. Front Neurol; 2017; 8, 428.
Fetanat, M; Stevens, M; Jain, P; Hayward, C; Meijering, E; Lovell, NH. Fully Elman neural network: a novel deep recurrent neural network optimized by an improved Harris Hawks algorithm for classification of pulmonary arterial wedge pressure. IEEE Trans Biomed Eng; 2021; 69,
Gao, X; Cai, H; Liu, M. A hybrid multi-scale attention convolution and aging transformer network for Alzheimer's disease diagnosis. IEEE J Biomed Health Inform; 2023; 27,
García-Gutiérrez, F; Alegret, M; Marquié, M; Muñoz, N; Ortega, G; Cano, A et al. Unveiling the sound of the cognitive status: machine learning-based speech analysis in the Alzheimer’s disease spectrum. Alzheimer's Res Therapy; 2024; 16,
Greene, SJ; Killiany, RJ. Subregions of the inferior parietal lobule are affected in the progression to Alzheimer's disease. Neurobiol Aging; 2010; 31,
Hama Rashid, DN; Rashid, TA; Mirjalili, S. ANA: ant nesting algorithm for optimizing real-world problems. Mathematics; 2021; 9,
Hamad, RK; Rashid, TA. GOOSE algorithm: a powerful optimization tool for real-world engineering challenges and beyond. Evol Syst; 2024; 15,
Hassan, E; Saber, A; Elbedwehy, S. Knowledge distillation model for Acute Lymphoblastic Leukemia detection: exploring the impact of nesterov-accelerated adaptive moment estimation optimizer. Biomed Signal Process Control; 2024; 94, 106246.
Hassan, E; El-Rashidy, N; Talaa, FM. Review: mask R-CNN models. Nile J Commun Comput Sci; 2022; 3,
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition
Heidari, AA; Mirjalili, S; Faris, H; Aljarah, I; Mafarja, M; Chen, H. Harris hawks optimization: algorithm and applications. Future Gener Comput Syst; 2019; 97, pp. 849-872.
Heidari AA, Abbaspour RA, Chen HL (2019) Efficient boosted Grey Wolf Optimizers for global search and kernel extreme learning machine training. Appl Soft Comput 81
Hsu, JL; Wei, YC; Toh, CH; Hsiao, IT; Lin, KJ; Yen, TC et al. Magnetic resonance images implicate that glymphatic alterations mediate cognitive dysfunction in Alzheimer disease. Ann Neurol; 2023; 93,
Hu, J; Gui, W; Heidari, AA; Cai, Z; Liang, G; Chen, H et al. Dispersed foraging slime mould algorithm: continuous and binary variants for global optimization and wrapper-based feature selection. Knowl-Based Syst; 2022; 237, 107761.
Jack, CR, Jr; Bernstein, MA; Fox, NC; Thompson, P; Alexander, G; Harvey, D et al. The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. J Magn Reson Imaging off J Int Soc Magn Reson Med; 2008; 27,
Jack, CR; Bernstein, MA; Fox, NC; Thompson, P; Alexander, G; Harvey, D et al. The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. J Magn Reson Imaging; 2008; 27,
Jang J, Hwang D (2022) M3T: three-dimensional medical image classifier using multi-plane and multi-slice transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Jia, H; Lang, C; Oliva, D; Song, W; Peng, X. Dynamic Harris Hawks optimization with mutation mechanism for satellite image segmentation. Remote Sens; 2019; 11,
Jovanovic, L; Damaševičius, R; Matic, R; Kabiljo, M; Simic, V; Kunjadic, G et al. Detecting Parkinson’s disease from shoe-mounted accelerometer sensors using convolutional neural networks optimized with modified metaheuristics. PeerJ Comput Sci; 2024; 10, e2031.
Karaman, A; Karaboga, D; Pacal, I; Akay, B; Basturk, A; Nalbantoglu, U et al. Hyper-parameter optimization of deep learning architectures using artificial bee colony (ABC) algorithm for high performance real-time automatic colorectal cancer (CRC) polyp detection. Appl Intell; 2023; 53,
Karaman, A; Pacal, I; Basturk, A; Akay, B; Nalbantoglu, U; Coskun, S et al. Robust real-time polyp detection system design based on YOLO algorithms by optimizing activation functions and hyper-parameters with artificial bee colony (ABC). Expert Syst Appl; 2023; 221, 119741.
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN'95-international conference on neural networks. IEEE
Kesslak, JP; Nalcioglu, O; Cotman, CW. Quantification of magnetic resonance scans for hippocampal and parahippocampal atrophy in Alzheimer's disease. Neurology; 1991; 41,
Kushol R, Masoumzadeh A, Huo D, Kalra S, Yang YH (2022) Addformer: Alzheimer's disease detection from structural MRI using fusion transformer. In: 2022 IEEE 19th international symposium on biomedical imaging (ISBI)
LaTorre A, Pena JM (2017) A comparison of three large-scale global optimizers on the CEC 2017 single objective real parameter numerical optimization benchmark. In: 2017 IEEE congress on evolutionary computation, CEC 2017 - proceedings
Li, HW; Liu, JY; Chen, L; Bai, JB; Sun, YY; Lu, K. Chaos-enhanced moth-flame optimization algorithm for global optimization. J Syst Eng Electron; 2019; 30,
Li C, Cui Y, Luo N, Liu Y, Bourgeat P, Fripp J et al (2022) Trans-ResNet: integrating transformers and CNNs for Alzheimer’s disease classification. In: 2022 IEEE 19th international symposium on biomedical imaging (ISBI).
Liu, M; Li, F; Yan, H; Wang, K; Ma, Y; Shen, L et al. A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer’s disease. Neuroimage; 2020; 208, 116459.
Logan R, Williams BG, Ferreira da Silva M, Indani A, Schcolnicov N, Ganguly A et al (2021) Deep convolutional neural networks with ensemble learning and generative adversarial networks for Alzheimer’s disease image data classification. Front Aging Neurosci 13
Lu F, Ma Q, Shi C, Yue W (2025) Changes in the parietal lobe subregion volume at various stages of Alzheimer’s disease and the role in cognitively normal and mild cognitive impairment conversion. JIN 24(1)
Ma, H; Xiao, L; Hu, Z; Heidari, AA; Hadjouni, M; Elmannai, H et al. Comprehensive learning strategy enhanced chaotic whale optimization for high-dimensional feature selection. J Bionic Eng; 2023; 20,
McKhann, G; Drachman, D; Folstein, M; Katzman, R; Price, D; Stadlan, EM. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of department of health and human services task force on Alzheimer's disease. Neurology; 1984; 34,
Meng, A-B; Chen, Y-C; Yin, H; Chen, S-Z. Crisscross optimization algorithm and its application. Knowl-Based Syst; 2014; 67, pp. 218-229.
Meyer, F; Wehenkel, M; Phillips, C; Geurts, P; Hustinx, R; Bernard, C et al. Characterization of a temporoparietal junction subtype of Alzheimer's disease. Hum Brain Mapp; 2019; 40,
Mirjalili, S. Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl-Based Syst; 2015; 89, pp. 228-249.
Mirjalili, S. SCA: a Sine Cosine Algorithm for solving optimization problems. Knowl-Based Syst; 2016; 96, pp. 120-133.
Mirjalili, S; Lewis, A. The Whale Optimization Algorithm. Adv Eng Softw; 2016; 95, pp. 51-67.
Mirjalili, S; Mirjalili, SM; Lewis, A. Grey Wolf Optimizer. Adv Eng Softw; 2014; 69, pp. 46-61.
Mirjalili, S; Mirjalili, SM; Yang, X-S. Binary bat algorithm. Neural Comput Appl; 2014; 25,
Mirjalili, S; Gandomi, AH; Mirjalili, SZ; Saremi, S; Faris, H; Mirjalili, SM. Salp Swarm Algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng Softw; 2017; 114, pp. 163-191.
Mohammed, H; Rashid, T. FOX: a FOX-inspired optimization algorithm. Appl Intell; 2023; 53,
Nanni, L; Brahnam, S; Salvatore, C; Castiglioni, I; Initiative, ADN. Texture descriptors and voxels for the early diagnosis of Alzheimer’s disease. Artific Intell Med; 2019; 97, pp. 19-26.
Nazir, S; Patel, S; Patel, D. Assessing hyper parameter optimization and speedup for convolutional neural networks. Int J Artific Intell Mach Learn; 2020; 10,
Nedelska, Z; Andel, R; Laczó, J; Vlcek, K; Horinek, D; Lisy, J et al. Spatial navigation impairment is proportional to right hippocampal volume. Proc Natl Acad Sci; 2012; 109,
Nenavath, H; Jatoth, RK. Hybridizing sine cosine algorithm with differential evolution for global optimization and object tracking. Appl Soft Comput; 2018; 62, pp. 1019-1043.
Pacal, I. A novel Swin transformer approach utilizing residual multi-layer perceptron for diagnosing brain tumors in MRI images. Int J Mach Learn Cybern; 2024; 15,
Pacal, I; Celik, O; Bayram, B; Cunha, A. Enhancing EfficientNetv2 with global and efficient channel attention mechanisms for accurate MRI-based brain tumor classification. Clust Comput; 2024; 27,
Pacal, I; Ozdemir, B; Zeynalov, J; Gasimov, H; Pacal, N. A novel CNN-ViT-based deep learning model for early skin cancer diagnosis. Biomed Signal Process Control; 2025; 104, 107627.
Rahman, CM; Rashid, TA. A new evolutionary algorithm: learner performance based behavior algorithm. Egypt Inf J; 2021; 22,
Ricci, G. Social aspects of dementia prevention from a worldwide to national perspective: a review on the international situation and the example of Italy. Behav Neurol; 2019; 2019, 8720904.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision
Shamsaldin, AS; Rashid, TA; Al-Rashid Agha, RA; Al-Salihi, NK; Mohammadi, M. Donkey and smuggler optimization algorithm: a collaborative working approach to path finding. J Comput des Eng; 2019; 6,
Shan, W; He, X; Liu, H; Heidari, AA; Wang, M; Cai, Z et al. Cauchy mutation boosted Harris hawk algorithm: optimal performance design and engineering applications. J Comput des Eng; 2023; 10,
Sheng J, Zhang Q, Zhang Q, Wang L, Yang Z, Xin Y, et al (2024) A hybrid multimodal machine learning model for detecting Alzheimer's disease. Comput Biol Med, p 108035
So, J-H; Madusanka, N; Choi, H-K; Choi, B-K; Park, H-G. Deep learning for Alzheimer’s disease classification using texture features. Curr Med Imaging; 2019; 15,
SoltaniMoghadam, S; Tatar, M; Komeazi, A. An improved 1-D crustal velocity model for the Central Alborz (Iran) using particle swarm optimization algorithm. Phys Earth Planet Inter; 2019; 292, pp. 87-99.
Song, S; Wang, P; Heidari, AA; Wang, M; Zhao, X; Chen, H et al. Dimension decided Harris hawks optimization with Gaussian mutation: balance analysis and diversity patterns. Knowl-Based Syst; 2021; 215, 106425.
Storn, R; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim; 1997; 11, pp. 341-359.
Syaifullah, AH; Shiino, A; Kitahara, H; Ito, R; Ishida, M; Tanigaki, K. Machine learning for diagnosis of AD and prediction of MCI progression from brain MRI using brain anatomical analysis using diffeomorphic deformation. Front Neurol; 2021; 11, 576029.
Tufail, AB; Anwar, N; Othman, MTB; Ullah, I; Khan, RA; Ma, Y-K et al. Early-stage Alzheimer’s disease categorization using PET neuroimaging modality and convolutional neural networks in the 2D and 3D domains. Sensors; 2022; 22,
Vaithinathan, K; Parthiban, L. A novel texture extraction technique with T1 weighted MRI for the classification of Alzheimer’s disease. J Neurosci Methods; 2019; 318, pp. 84-99.
Wang, J. A deep learning approach for atrial fibrillation signals classification based on convolutional and modified Elman neural network. Futur Gener Comput Syst; 2020; 102, pp. 670-679.
Wang, M; Wang, JS; Li, XD; Zhang, M; Hao, WK. Harris Hawk optimization algorithm based on Cauchy distribution inverse cumulative function and tangent flight operator. Appl Intell; 2022; 52,
Wang, M; Gong, Q; Chen, H; Gao, G. Optimizing deep transfer networks with fruit fly optimization for accurate diagnosis of diabetic retinopathy. Appl Soft Comput; 2023; 147, 110782.
Wang Y, Chen K, Wang H (2024) Adapt: Alzheimer diagnosis through adaptive profiling transformers. arXiv preprint arXiv:2401.06349
Wen, J; Thibeau-Sutre, E; Diaz-Melo, M; Samper-González, J; Routier, A; Bottani, S et al. Convolutional neural networks for classification of Alzheimer's disease: overview and reproducible evaluation. Med Image Anal; 2020; 63, 101694.
Wolpert, DH; Macready, WG. No free lunch theorems for optimization. IEEE Trans Evol Comput; 1997; 1,
Xin, J; Wang, A; Guo, R; Liu, W; Tang, X. CNN and swin-transformer based efficient model for Alzheimer’s disease diagnosis with sMRI. Biomed Signal Process Control; 2023; 86, 105189.
Xu, X; Lin, L; Sun, S; Wu, S. A review of the application of three-dimensional convolutional neural networks for the diagnosis of Alzheimer’s disease using neuroimaging. Rev Neurosci; 2023; 34,
Yang, X-S. Nature-inspired algorithms and applied optimization; 2017; Berlin, Springer:
Yassen, MA; Abdel-Fattah, MG; Ismail, I; El-Kenawy, E-SM; Moustafa, HE-D. An AI-based system for predicting renewable energy power output using advanced optimization algorithms. J Artif Intell Metaheuristics; 2024; 8,
Yu, H; Zhao, Z; Zhou, J; Heidari, AA; Chen, H. Sine cosine algorithm with communication and quality enhancement: performance design for engineering problems. J Comput des Eng; 2023; 10,
Zhang, Q; Chen, H; Heidari, AA; Zhao, X; Xu, Y; Wang, P et al. Chaos-induced and mutation-driven schemes boosting salp chains-inspired optimizers. IEEE Access; 2019; 7, pp. 31243-31261.
Zhang, Z; Gao, L; Jin, G; Guo, L; Yao, Y; Dong, L et al. THAN: task-driven hierarchical attention network for the diagnosis of mild cognitive impairment and Alzheimer's disease. Quant Imaging Med Surg; 2021; 11,
Zhang, Q; Huang, A; Shao, L; Wu, P; Heidari, AA; Cai, Z et al. A machine learning framework for identifying influenza pneumonia from bacterial pneumonia for medical decision making. J Comput Sci; 2022; 65, 101871.
Zhang, Q; Sheng, J; Zhang, Q; Wang, L; Yang, Z; Xin, Y. Enhanced Harris hawks optimization-based fuzzy k-nearest neighbor algorithm for diagnosis of Alzheimer's disease. Comput Biol Med; 2023; 165, 107392.