INTRODUCTION
Autism spectrum disorder (ASD) is a prevalent neurodevelopmental condition typically characterised by challenges in social interaction and communication and by restricted, repetitive interests [1]. Over the past half-century, ASD has evolved from a narrowly defined, rare childhood disorder to a widely publicised and studied lifelong condition. Its reported prevalence increases year by year, imposing a heavy burden on individuals, families, and society [2]. At present, the diagnosis of ASD relies primarily on doctors' subjective experience, dialogue with the patient, and scores on relevant rating scales. The high level of expertise required and the potential influence of subjective factors in the diagnostic process can result in under-diagnosis or misdiagnosis of ASD. To address this challenge, a fully automated computer-assisted system for diagnosing ASD would help promote early intervention and effective treatment [3].
With the progress of medical imaging technology, Magnetic Resonance Imaging (MRI) [4] has emerged as a vital tool for studying brain mechanisms and pathology due to its low cost and non-invasive nature [5, 6], and it has been extensively employed in the diagnosis of neurological disorders. MRI comes in several forms, the most common being structural MRI (sMRI) and functional MRI (fMRI). In contrast to sMRI, fMRI [7] captures the haemodynamic changes resulting from neuronal activity across the whole brain over multiple time points. As a result, fMRI has found widespread application in studying brain dysfunction disorders, such as schizophrenia [8], ASD [9], attention deficit/hyperactivity disorder [10], and Alzheimer's disease [11]. Among the various types of fMRI, this study focuses on resting-state fMRI (rs-fMRI), which does not require subjects to perform tasks or respond to stimuli. It is therefore comfortable and quiet for subjects, simple to administer, and relatively easier to collect than task-state fMRI (ts-fMRI).
With the advent of high-performance computing devices and the availability of vast amounts of data, deep learning offers robust technical support for large-scale data analysis, pattern recognition, and classification tasks. Deep learning methods have gained widespread application in ASD diagnostic tasks and have shown remarkable achievements. For example, Wang et al. in Ref. [12] utilised fMRI data from individual subjects and calculated three functional connectivity (FC) matrices using three distinct brain atlases. They then employed a stacked denoising autoencoder to learn high-level feature representations, followed by a multi-layer perceptron (MLP) for ASD diagnosis; this algorithm achieved an accuracy of 74.52% on 949 subjects. In Ref. [13], the brain was first partitioned into 116 regions of interest (ROIs) using the automated anatomical labeling (AAL) atlas. A multi-feature selection algorithm was then proposed by incorporating dynamic FC, elastic net, and manifold regularisation, and ASD diagnosis was performed with a multi-kernel support vector machine (SVM). To enhance ASD diagnosis performance, Ref. [14] employed a multi-head attention encoder and a temporal consolidation module to extract spatio-temporal features, taking the time series captured in the fMRI data as input. The authors of Ref. [15] proposed an ASD diagnostic algorithm based on an enhanced convolutional neural network (ECNN), which significantly improved the performance of the ASD classifier. ECNN is composed of two stacked temporal convolutional networks (both utilising causal convolutions and dilated convolutions), making it suitable for sequence data requiring large receptive fields in the temporal domain. In Ref. [16], Kang et al. explored diverse brain functional features from multiple perspectives. By performing feature selection and ensemble learning on dynamic spatio-temporal features, low-level brain FC features, and high-level brain FC features of fMRI data, they effectively alleviated the issues arising from data heterogeneity. This method achieved an accuracy of 72% in ASD diagnosis tasks involving multi-site samples.
For traditional convolutional approaches, translation invariance cannot be maintained on non-Euclidean data. To extend convolution to non-Euclidean structures such as graphs, researchers developed the graph convolutional network (GCN). A GCN combines the features of a node with those of its neighbouring nodes through a graph convolutional layer, which generates new features for that node and enables feature extraction on graph-structured data. In Ref. [17], a scalable method for semi-supervised learning on graph-structured data was proposed using GCN. Currently, GCN is widely applied in mental health prediction, particularly in the field of ASD diagnosis, and its potential has been demonstrated in previous works. For instance, Li et al. proposed a universal ensemble model that incorporates hierarchical GCN and transfer learning in Ref. [18]. This model leveraged hierarchical GCN to extract network feature embeddings for various topological structures, while also utilising transfer learning across related domains of neurological disorders to capture inherent correlations. The work in Ref. [18] effectively addressed the limitations imposed by small sample sizes in medical datasets and demonstrated outstanding performance in diagnosing ASD and Alzheimer's disease. Cao et al. proposed a graph structure-aware long short-term memory (GSA-LSTM) framework utilising dynamic brain network representations in Ref. [19]; GSA-LSTM incorporates GCN into LSTM to extract spatio-temporal embeddings of time-varying graphs, which enables better classification performance. In Ref. [20], Chu et al. presented an ASD diagnostic algorithm based on multi-scale graph representation learning. This algorithm first establishes multi-scale graphs for each subject using multiple brain atlases, then automatically learns graph representation vectors at different scales through graph convolution operations, and obtains the final ASD diagnosis through a softmax layer. In diagnostic experiments on 184 subjects, this model achieved an accuracy of 79.5%. In Ref. [21], Zhao et al. proposed an ASD diagnostic algorithm utilising a self-attention mechanism and GCN. This algorithm uses the time series and FC of fMRI data to construct both low-order and high-order FC networks; by employing multi-view feature enhancement and integrating self-attention pooling layers, more discriminative features are extracted, leading to improved ASD diagnosis performance.
The use of GCN in brain disease diagnosis has become a hot topic. However, challenges are posed by the diverse array of MRI scanning protocols, participant states, recruitment strategies (such as age, intelligence quotient [IQ], disease severity, and treatment history), and other variables. Consequently, data collected at different locations, times, and environments may exhibit substantial variability. Therefore, in addition to brain imaging data, researchers have incorporated phenotypic information (such as age, gender, acquisition site, or IQ of subjects) into automated ASD diagnosis research, an approach commonly known as multimodal data-based ASD diagnosis [22, 23]. Although brain disease prediction methods are still evolving, GCN-based ASD diagnosis methods face several challenges when dealing with complex medical image data. Many existing methods are constrained by shallow architectures that focus on a single graph representation, which not only restricts analysis to a single scale but also prevents the full utilisation of distinctive features and valuable hidden information among different subjects. Additionally, most algorithms have not adequately exploited multimodal data, which hinders improvements in model performance. To explore a more effective objective diagnosis solution for ASD, we analyse fMRI data using GCN and propose a new GCN framework for ASD diagnosis, called the variable multi-graph and multimodal data-based deep graph convolutional network (VMM-DGCN), to address the aforementioned challenges. The primary contributions of our study are summarised as follows:
-
We propose a novel end-to-end ASD diagnosis model, VMM-DGCN, which combines the strengths of CNNs for local feature extraction with those of DeepGCN for global feature extraction, enabling a better representation of graph information.
-
A variable multi-graph construction strategy is proposed, which not only effectively captures the multi-scale feature representation of each subject but also integrates non-imaging information into the feature representation at each scale. With this strategy, we construct multiple population graphs to adequately extract the associated information among subjects.
-
Compared to existing methods, VMM-DGCN achieves a better diagnostic performance in ASD diagnosis tasks. Furthermore, in classification tasks where the relationships between data points are unclear, it also serves as a generalisable semi-supervised learning approach.
The paper is structured as follows. Section 2 presents a detailed overview of the proposed VMM-DGCN model. Section 3 describes the experimental setup, including data preprocessing, model training, and evaluation metrics. In Section 4, the performance of VMM-DGCN is demonstrated and analysed according to the results. Finally, Section 5 provides the conclusion of this study.
METHOD
The flowchart in Figure 1 depicts the overall process of the proposed ASD automatic diagnosis algorithm. This method consists of four main steps: low-level feature extraction, variable multi-graph construction, multi-scale DeepGCNs, and classification. In the subsequent sections, the proposed algorithm will be explained in detail.
[IMAGE OMITTED. SEE PDF]
Low-level feature extraction
In this study, we denote the N subjects as S1,S2,⋯,SN. Firstly, the brain is partitioned into different ROIs based on the Harvard–Oxford (HO) atlas [24]. From each ROI, a set of time series is extracted and subsequently normalised. Secondly, the FC matrix is generated by calculating the Pearson correlation coefficient (PCC) between the ROIs, which is given by the following equation:
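The equation itself is omitted in the extracted text; the standard PCC between the mean time series $x_i$ and $x_j$ of ROIs $i$ and $j$ over $T$ time points is

$$r_{ij} = \frac{\sum_{t=1}^{T}\big(x_i(t)-\bar{x}_i\big)\big(x_j(t)-\bar{x}_j\big)}{\sqrt{\sum_{t=1}^{T}\big(x_i(t)-\bar{x}_i\big)^2}\sqrt{\sum_{t=1}^{T}\big(x_j(t)-\bar{x}_j\big)^2}}$$

where $\bar{x}_i$ denotes the temporal mean of $x_i$. Flattening the upper triangle of the resulting FC matrix into a low-level feature vector per subject is an assumption consistent with the flattened features referred to later in the paper.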
Variable multi-graph module
To address data variation due to different locations, times, and environmental factors, we construct a variable multi-graph module. This module builds sparse graphs that effectively represent the characteristics of each subject by integrating imaging data with phenotypic data (a part of the non-imaging information), including site and gender. In the sparse graphs, the similarity between pairs of subjects can be discovered, which is beneficial for ASD diagnosis. The ASD diagnosis task is then addressed as a node classification problem within a population graph composed of N nodes and a collection of weighted edges. An undirected graph can be denoted as G(V, E, A), where V represents the collection of nodes with |V| = N, E represents the collection of edges connecting pairs of nodes, and the connection strength between two nodes is described by the adjacency matrix A. In our study, three subgraphs are constructed, and their node features are obtained from the multi-scale representations of the imaging data. The adjacency matrices of these subgraphs are generated by taking into account both the similarity of node features and the phenotypic information of the subjects; consequently, the node representations and adjacency matrices of the three subgraphs differ. The following sections explain the construction of node features and edges in the population graph.
To better extract the features relevant to fMRI and ASD diagnosis, one-dimensional CNNs with various kernel sizes are employed to conduct multi-scale feature extraction on the low-level FC feature vectors. Since the obtained feature representations are high-dimensional and may contain many irrelevant features for the ASD diagnosis task, the SVM recursive feature elimination (SVM-RFE) [25] strategy is employed to reduce the dimension of the FC features. Specifically, for each multi-scale feature, the top 50% of features are retained as the final feature representation for each subject. By performing the aforementioned steps, three-scale feature representations, namely Fea1, Fea2, and Fea3, are obtained for each subject. Fea1 is obtained by applying SVM-RFE to the low-level feature. Fea2 is obtained by applying SVM-RFE to the low-level feature passed through a one-dimensional convolution filter with a kernel size of 7. Fea3 is obtained by applying SVM-RFE to the low-level feature passed through a one-dimensional convolution filter with a kernel size of 11. Fea1, Fea2, and Fea3 are used as the node feature representations for subgraph 1, subgraph 2, and subgraph 3, respectively.
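The sketch below illustrates this three-scale extraction under stated assumptions: single-channel 1-D convolutions (channel counts are not given in the paper) with the stated kernel sizes 7 and 11, followed by SVM-RFE keeping the top 50% of features per scale; scikit-learn's RFE with a linear SVM stands in for the SVM-RFE strategy of Ref. [25].

```python
import torch
import torch.nn as nn
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

class MultiScaleExtractor(nn.Module):
    """Three-scale feature extraction: Fea1 is the raw low-level FC vector;
    Fea2/Fea3 pass it through 1-D convolutions with kernel sizes 7 and 11.
    Single channels and same-padding are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.conv7 = nn.Conv1d(1, 1, kernel_size=7, padding=3)
        self.conv11 = nn.Conv1d(1, 1, kernel_size=11, padding=5)

    def forward(self, fc):                  # fc: (n_subjects, n_features)
        x = fc.unsqueeze(1)                 # (n_subjects, 1, n_features)
        fea1 = fc                           # scale 1: raw low-level features
        fea2 = self.conv7(x).squeeze(1)     # scale 2: kernel size 7
        fea3 = self.conv11(x).squeeze(1)    # scale 3: kernel size 11
        return fea1, fea2, fea3

def svm_rfe_half(features, labels):
    """SVM-RFE keeping the top 50% of features (the paper's setting).
    features/labels are numpy arrays; convert torch tensors first."""
    selector = RFE(SVC(kernel="linear"), n_features_to_select=0.5)
    return selector.fit_transform(features, labels)
```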
In a graph, the relationships between nodes are represented by edges, which play a crucial role in modelling. We establish these connections by evaluating the similarity among subjects. Besides the similarity of node features derived from imaging data, we also incorporate phenotypic information M = {Mh}, such as gender, site, and age, into the calculation of edges. Consequently, the graph's adjacency matrix is constructed to represent the connectivity patterns between nodes, which is given by the following equation:
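The adjacency expression itself is omitted in the extracted text. A reconstruction consistent with the construction steps described below (and therefore an assumption, not a quotation) is

$$A = C \odot W$$

where $C$ is the binarised connectivity matrix obtained from the fused feature and phenotype similarities, and $W$ is the weight matrix of the variable multi-graph module defined via Figure 2.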
[IMAGE OMITTED. SEE PDF]
In the specific construction process, we first construct the similarity matrix Sim1 through node embedding features, which is given by the following equation:
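The form of Sim1 is omitted in the extracted text. A common choice in population-graph studies, and a plausible form here, is a Gaussian kernel over the distance between node features (both the kernel and the width $\sigma$ are assumptions):

$$\mathrm{Sim1}(i,j) = \exp\!\left(-\frac{\lVert F_i - F_j\rVert^2}{2\sigma^2}\right)$$

where $F_i$ and $F_j$ denote the node embedding features of subjects $i$ and $j$.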
Next, we utilise phenotype information (such as site and gender) to construct a node similarity measure Sim2 for obtaining more valuable clinical support. Assuming there are H types of phenotype information, then the Sim2 can be expressed as follows:
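The expression for Sim2 is omitted in the extracted text; a reconstruction consistent with the description is

$$\mathrm{Sim2}(i,j) = \sum_{h=1}^{H} \phi\big(M_h(i), M_h(j)\big)$$

where, for categorical information such as site and gender, $\phi$ returns 1 when the two subjects' values match and 0 otherwise.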
For quantifiable phenotype information such as age and IQ, the function ϕ(⋅) in Equation (5) becomes a unit step function with a threshold θ, which is given by the following equation:
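The equation is omitted in the extracted text; reconstructed from the description, the thresholded form is

$$\phi\big(M_h(i), M_h(j)\big) = \begin{cases} 1, & \lvert M_h(i) - M_h(j)\rvert < \theta \\ 0, & \text{otherwise} \end{cases}$$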
Finally, by fusing the two matrices through a Hadamard product, the similarity matrix C′ is obtained as follows:
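The equation is omitted in the extracted text; from the description, the fusion is simply

$$C' = \mathrm{Sim1} \odot \mathrm{Sim2}$$

where $\odot$ denotes the element-wise (Hadamard) product.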
To further improve the computational efficiency of the population graph, we enhance the sparsity of the graph by excluding weaker associations (Fi,Fj) in the matrix C′. Specifically, the connection values in C′ exceeding a threshold value of 0.4 are set to 1, while the remaining connection values are set to 0, and then the binarised connectivity matrix C is obtained, accordingly.
As shown in Figure 2, the definition of the weight matrix W in the variable multi-graph module is presented as follows:
After the above steps, three subgraphs with different node feature representations and different adjacency matrices are constructed, denoted as Subgraph1, Subgraph2, and Subgraph3, respectively. In addition, to obtain the best ASD diagnostic performance, the site and gender information of each subject are selected for constructing the graph's edges, based on the experiments reported later.
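A compact sketch of the edge-construction pipeline follows, under stated assumptions: Sim1 is taken as a Gaussian kernel (the exact form is omitted in the extracted text), Sim2 counts agreement on site and gender, the two are fused by a Hadamard product, and the result is binarised at the paper's threshold of 0.4. The weight matrix W of Figure 2 is not reproduced here since its definition is unavailable.

```python
import numpy as np

def build_population_edges(feats, site, gender, thr=0.4, sigma=1.0):
    """Sketch of the variable multi-graph edge construction.
    feats: (N, d) node features for one subgraph scale;
    site, gender: (N,) phenotype arrays.
    The Gaussian form of Sim1 and sigma are assumptions."""
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(axis=-1)
    sim1 = np.exp(-d2 / (2.0 * sigma ** 2))                # feature similarity
    sim2 = (site[:, None] == site[None, :]).astype(float)  # same acquisition site
    sim2 += (gender[:, None] == gender[None, :])           # same gender
    c_prime = sim1 * sim2                                  # Hadamard fusion (C')
    c = (c_prime > thr).astype(float)                      # binarised connectivity C
    np.fill_diagonal(c, 0.0)                               # no self-loops (assumption)
    return c
```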
Multi-scale DeepGCN module
Spectral graph convolution
By the convolution theorem, convolution of signals in the node domain corresponds to multiplication in the spectral domain. The essence of GCN lies in defining graph convolutions in the spectral domain by utilising the Laplacian matrix and the graph Fourier transform. Consider a graph G with N nodes; the normalised Laplacian matrix of G is defined as follows:
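The definition is omitted in the extracted text; the standard form, with $A$ the adjacency matrix and $D$ the diagonal degree matrix, is

$$L = I_N - D^{-1/2} A D^{-1/2}$$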
GCN extends convolution operations from grid-based data to graph-structured data via the graph Fourier transform. Let $g_\theta(\Lambda)$ denote the filter function matrix over the eigenvalues $\Lambda$ with parameter $\theta$, and let $x$ denote the feature signal of the nodes in the graph G. The operation of graph convolution can be described as follows:
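The equation is omitted in the extracted text; the standard spectral graph convolution, with $L = U \Lambda U^{\top}$ the eigendecomposition of the normalised Laplacian, is

$$g_\theta \star x = U\, g_\theta(\Lambda)\, U^{\top} x$$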
Considering the computational complexity of the eigendecomposition, $g_\theta(\Lambda)$ is approximated by a truncated $K$th-order Chebyshev polynomial expansion $T_k(x)$, which can be expressed as $g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})$. Here we define $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I_N$, where $\lambda_{\max}$ is the largest eigenvalue of $L$, and the Chebyshev coefficients $\theta_k\ (k = 0, 1, \ldots, K)$ form a learnable parameter vector. The Chebyshev polynomials obey the recurrence $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$ [26]. The rescaled graph Laplacian can be set to $\tilde{L} = 2L/\lambda_{\max} - I_N$, so that $T_k(\tilde{\Lambda})$ in the spectral domain corresponds to $T_k(\tilde{L})$ acting on node signals, and the convolution of the graph signal x can be simplified as follows:
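The simplified equation is omitted in the extracted text; reconstructed in its standard ChebNet form, it reads

$$g_\theta \star x \approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x$$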
DeepGCN model
Different from traditional convolutional neural networks, graph convolutional networks utilise propagation between layers to update node features. With an increasing number of Chebyshev convolution (ChebConv) layers, node features aggregate information from progressively more distant neighbours, which expands the network's receptive field. This expansion helps obtain more accurate and comprehensive node characteristics. Consequently, we construct DeepGCN by stacking multiple ChebConv layers to extract deeper features from the graph. To facilitate multi-channel transmission between the inputs and outputs of each layer in DeepGCN, we define the propagation rule between layers as follows:
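The propagation rule itself is omitted in the extracted text; a reconstruction consistent with stacked multi-channel ChebConv layers (the standard ChebNet layer form, offered here as an assumption) is

$$H^{(l+1)} = \sigma\!\left(\sum_{k=0}^{K} T_k(\tilde{L})\, H^{(l)}\, \Theta_k^{(l)}\right)$$

where $H^{(l)}$ is the node feature matrix at layer $l$, $\Theta_k^{(l)}$ are learnable weight matrices, and $\sigma(\cdot)$ is the activation function.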
The DeepGCN framework in this paper, depicted in Figure 3, comprises an input layer, multiple hidden layers, and an output layer. To mitigate the issue of vanishing gradients, we place ReLU activation functions after the input layer and each hidden layer. The features from the output layer are subsequently utilised as the input for the ASD diagnosis classifier.
[IMAGE OMITTED. SEE PDF]
In Figure 3, to overcome the over-fitting and over-smoothing problems in GCN, the DropEdge strategy [27] is applied at the beginning of each DeepGCN layer during network training. DropEdge serves as a data augmentation technique that stochastically removes edges from the input graph at each training iteration: it modifies the adjacency matrix A by setting V × p randomly chosen non-zero elements to zero, where V is the number of edges and p the drop probability. DropEdge is not used during network testing. Furthermore, since three different subgraphs (Subgraph1, Subgraph2, and Subgraph3) are constructed by combining features of different scales with their corresponding adjacency matrices, three corresponding DeepGCNs are constructed accordingly. The numbers of hidden-layer neurons in DeepGCN1, DeepGCN2, and DeepGCN3 are 32, 128, and 64, respectively.
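A minimal sketch of one DeepGCN branch follows, assuming recent versions of PyTorch Geometric (the library named later in the paper) for ChebConv and dropout_edge. The layer count, Chebyshev order, and DropEdge probability match Table 2; everything else (edge weights, width handling) is an illustrative assumption.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import ChebConv
from torch_geometric.utils import dropout_edge

class DeepGCN(torch.nn.Module):
    """One DeepGCN branch: stacked ChebConv layers (K = 3, 10 layers,
    per Table 2) with ReLU, and DropEdge (p = 0.3) applied at the start
    of each layer during training only."""
    def __init__(self, in_dim, hidden, n_layers=10, K=3, p_edge=0.3):
        super().__init__()
        dims = [in_dim] + [hidden] * n_layers
        self.convs = torch.nn.ModuleList(
            [ChebConv(dims[i], dims[i + 1], K=K) for i in range(n_layers)])
        self.p_edge = p_edge

    def forward(self, x, edge_index):
        for conv in self.convs:
            # DropEdge: randomly remove edges each iteration (no-op at eval)
            ei, _ = dropout_edge(edge_index, p=self.p_edge,
                                 training=self.training)
            x = F.relu(conv(x, ei))
        return x
```

The three branches of the paper would be instantiated with hidden widths 32, 128, and 64 on Subgraph1, Subgraph2, and Subgraph3, respectively.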
Classification
Before classification, the node features output by the three DeepGCNs are first combined to obtain the fused feature representation for each subject. Then, an MLP is employed as the classifier to carry out the final classification between ASD and typical control (TC). In particular, the MLP comprises two 1 × 1 convolutional layers with 128 and Ck channels, respectively; batch normalisation and dropout regularisation are applied to each convolutional layer. The final convolutional layer is followed by a softmax function that yields a normalised probability vector over the Ck disease classes for each node. The output layer thus produces a two-dimensional representation for each node, whose two values respectively indicate the probability that the subject belongs to the TC class and the probability that the subject belongs to the ASD class.
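A hedged sketch of this MLP head: two 1 × 1 convolutions (128 then Ck channels) with batch normalisation and dropout, ending in a softmax over Ck = 2 classes. Treating the node dimension as the convolution length, and the exact placement of the regularisation, are implementation assumptions.

```python
import torch

class MLPClassifier(torch.nn.Module):
    def __init__(self, in_dim, n_classes=2, p_drop=0.2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv1d(in_dim, 128, kernel_size=1),    # 1x1 conv, 128 channels
            torch.nn.BatchNorm1d(128),
            torch.nn.ReLU(),
            torch.nn.Dropout(p_drop),
            torch.nn.Conv1d(128, n_classes, kernel_size=1), # 1x1 conv, C_k channels
        )

    def forward(self, fused):                  # fused: (n_nodes, in_dim)
        x = fused.t().unsqueeze(0)             # (1, in_dim, n_nodes)
        logits = self.net(x).squeeze(0).t()    # (n_nodes, n_classes)
        return torch.softmax(logits, dim=-1)   # per-node [p_TC, p_ASD]
```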
The population graph consists of labelled and unlabelled data; therefore, motivated by Kipf et al. [17], we employ a semi-supervised learning strategy to classify ASD patients and healthy subjects. Semi-supervised learning lies between unsupervised and supervised learning, representing a special case of weak supervision. In the experiments of this study, the graph model is constructed over all collected data, with certain nodes labelled while the labels of the others remain hidden. The loss function is computed on the labelled nodes, and parameter updates are performed through error backpropagation. During training, the framework leverages information from both labelled and unlabelled nodes to gain more insights; during testing, the unlabelled nodes are classified based on the softmax output.
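A sketch of one semi-supervised training step follows: the loss is computed only on labelled nodes via a boolean mask (train_mask and the model wrapper combining the branches and classifier above are hypothetical names), while message passing still draws on the whole population graph.

```python
import torch
import torch.nn.functional as F

# model, x, edge_index, y follow the sketches above (assumed names);
# train_mask is a boolean tensor marking the labelled nodes.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-5)
model.train()
optimizer.zero_grad()
probs = model(x, edge_index)                       # (n_nodes, 2) softmax output
loss = F.nll_loss(torch.log(probs[train_mask] + 1e-12), y[train_mask])
loss.backward()                                    # gradients from labelled nodes only
optimizer.step()
```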
EXPERIMENTAL SETTING
Dataset and preprocessing
The fMRI data utilised in this paper are obtained from the ABIDE I dataset [28], a publicly available multi-site data repository that is widely used in other ASD diagnostic papers. For each subject in the dataset, the Preprocessed Connectomes Project has preprocessed the fMRI data, from which time series for individual brain regions can be obtained based on various atlases. For more detailed information, please refer to their official website: . In our study, the Configurable Pipeline for the Analysis of Connectomes (CPAC) [29] is utilised to preprocess the rs-fMRI data for experiments. CPAC consists of seven main preprocessing steps: slice timing correction, head motion correction, registration, intensity normalisation, nuisance signal regression, filtering, and spatial normalisation. CPAC is an effective tool for removing missing data and suppressing noise in the data.
The ABIDE I dataset includes data from 1112 subjects collected at 17 international sites. The dataset is carefully checked by three experts through visual inspection to exclude subjects with incomplete brain coverage, significant head movement, or scanning distortions. As a result, 871 of the 1112 subjects are selected as the experimental subjects of this paper. Table 1 summarises the demographic information used in this study.
TABLE 1 Demographic information of subjects (mean ± standard deviation).
Site | ASD age | ASD gender (males/females) | TC age | TC gender (males/females) |
CALTECH | 24.0 ± 7.6 | 4/1 | 28.2 ± 12.2 | 6/4 |
CMU | 26.0 ± 5.4 | 4/2 | 27.8 ± 4.4 | 3/2 |
KKI | 10.7 ± 1.3 | 9/3 | 10.1 ± 1.2 | 15/6 |
LEUVEN | 17.0 ± 4.1 | 23/3 | 18.4 ± 5.0 | 26/4 |
MAX_MUN | 28.4 ± 13.2 | 16/3 | 25.2 ± 8.4 | 26/1 |
NYU | 14.8 ± 7.1 | 64/10 | 15.8 ± 6.2 | 72/26 |
OHSU | 11.4 ± 2.2 | 12/0 | 10.2 ± 1.0 | 13/0 |
OLIN | 17.1 ± 3.3 | 11/3 | 16.9 ± 3.6 | 12/2 |
PITT | 18.3 ± 7.0 | 21/3 | 18.7 ± 6.7 | 22/4 |
SBL | 34.0 ± 6.6 | 12/0 | 33.6 ± 6.8 | 14/0 |
SDSU | 15.3 ± 1.8 | 8/0 | 14.0 ± 1.9 | 13/6 |
STANFORD | 10.2 ± 1.6 | 9/3 | 9.8 ± 1.7 | 9/4 |
TRINITY | 17.0 ± 3.2 | 19/0 | 17.1 ± 3.8 | 25/0 |
UCLA | 13.1 ± 2.4 | 42/6 | 12.7 ± 2.1 | 32/25 |
UM | 12.9 ± 2.5 | 38/9 | 15.4 ± 3.4 | 55/18 |
USM | 23.6 ± 8.4 | 43/0 | 20.9 ± 8.3 | 24/0 |
YALE | 13.1 ± 3.0 | 14/8 | 13.6 ± 2.1 | 11/8 |
Total | 17.1 ± 8.0 | 349/54 | 16.8 ± 7.2 | 378/90 |
The parameters of model training
We assess the ASD diagnostic performance of VMM-DGCN by 10-fold cross-validation on ABIDE I. Our model is implemented using the PyTorch open-source machine learning library and leverages the torch_geometric library to perform Chebyshev spectral graph convolution operations [30]. The training details of the proposed model are presented in Table 2. Furthermore, the model is trained end-to-end with the Adam optimiser [31] and the cross-entropy loss function, which is defined as follows:
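The loss equation is omitted in the extracted text; the standard cross-entropy over the labelled node set $\mathcal{V}_L$ is

$$\mathcal{L} = -\sum_{i \in \mathcal{V}_L} \sum_{c=1}^{C_k} y_{i,c} \ln \hat{y}_{i,c}$$

where $y_{i,c}$ is the one-hot ground-truth label and $\hat{y}_{i,c}$ the predicted probability of node $i$ for class $c$.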
TABLE 2 Model training details.
Name | Detail |
Operating system | Windows 11 |
RAM | 32 GB |
CPU | Intel(R) Core(TM) i7 |
GPU | NVIDIA RTX A4000 |
Learning rate | 0.01 |
Order of Chebyshev polynomials | 3 |
Chebyshev convolution layers | 10 |
Epochs | 300 |
Weight decay | 5e−5 |
Dropout | 0.2 |
Edge dropout | 0.3 |
Hidden neuron number of DeepGCN1/DeepGCN2/DeepGCN3 | 32/128/64 |
Performance evaluation
In this paper, we utilise objective evaluation metrics, namely accuracy, sensitivity, specificity, and F1 score, to quantify the performance of the proposed model. Accuracy represents the proportion of correct classifications over all samples. Sensitivity, also known as recall, quantifies the accuracy in identifying positive samples, whereas specificity quantifies the accuracy in identifying negative samples. The F1 score combines precision and recall, offering a comprehensive assessment of the classifier's performance. Higher values of these metrics indicate superior classification performance. The definitions of these evaluation metrics are as follows:
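The definitions are omitted in the extracted text; the standard forms in terms of true/false positives and negatives (TP, FP, TN, FN) are

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}, \quad \text{where } \text{Precision} = \frac{TP}{TP + FP}$$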
EXPERIMENTS AND RESULTS
Comparison of different model variants
To evaluate the effectiveness of the proposed VMM-DGCN, we compare it with three model variants; the architectures of these variants are shown in Figure 4. (1) Model variant 1: based on the flattened features extracted by the low-level feature extraction module, the population graph is constructed directly from SVM-RFE-screened features without CNN, and the final classification result is obtained by DeepGCN. (2) Model variant 2: the low-level features pass through CNNs of different scales to construct the multi-scale features of the corresponding subject, which are concatenated into a one-dimensional feature vector used to establish the population graph; DeepGCN is then utilised for the final classification. (3) Model variant 3: from the multi-scale features extracted by CNNs, the multi-scale subgraphs are constructed separately. Advanced features of each subgraph are then learnt through a 5-layer GCN; the advanced features of the corresponding node in each subgraph are concatenated into a one-dimensional feature vector for each subject, and the final classification result is obtained using another 5-layer GCN. (4) Our model: after constructing multi-scale subgraphs, three DeepGCN frameworks are used for deep feature extraction, and the deeper features are concatenated for final classification. To keep the schematic diagrams concise, the SVM-RFE step before population-graph construction is omitted from the diagrams of all four models. The corresponding experimental results are summarised in Table 3 and illustrated in Figure 5.
[IMAGE OMITTED. SEE PDF]
TABLE 3 The classification results of different models.
Method | Accuracy | Sensitivity | Specificity | F1 score |
Model variant 1 | 90.70% | 91.51% | 91.37% | 91.34% |
Model variant 2 | 85.65% | 87.11% | 87.26% | 86.74% |
Model variant 3 | 90.59% | 91.85% | 90.75% | 91.13% |
Our model | 91.62% | 92.14% | 92.49% | 92.24% |
[IMAGE OMITTED. SEE PDF]
According to the data presented in Table 3, our model outperforms the other three model variants in all objective metrics, which indicates that all components of the framework jointly contribute to VMM-DGCN's stable performance. Model variant 2 achieves the worst performance because it uses the same graph convolution coefficients for all features, ignoring the differences between features at different scales. In our model, the multi-scale DeepGCN effectively improves ASD diagnostic performance by assigning different filters to the subgraphs.
To further demonstrate the effectiveness of VMM-DGCN, the receiver operating characteristic (ROC) curve and area under the curve (AUC) values are utilised to assess the performance of these models. The ROC curve illustrates the model's ability to differentiate between ASD and TC by plotting the false positive rate (1 − specificity) on the x-axis and the true positive rate (sensitivity) on the y-axis; it measures the model's effectiveness across different thresholds and intuitively represents the TPR-FPR trade-off for classification. Moreover, a larger AUC value indicates a stronger classification effect, and the closer the ROC curve lies to the top-left corner, the better the classification performance. Figure 5 displays the ROC curves, AUC values, and confusion matrices of VMM-DGCN and the other variant models on the ABIDE I dataset. In the confusion matrices, the predicted labels lie on the horizontal axis and the true labels on the vertical axis; each matrix reports the proportions of correctly and incorrectly classified samples. As depicted in Figure 5, VMM-DGCN obtains a higher AUC value than the other model variants and higher accuracy in the confusion matrix, which means that the proposed method has a stronger ability to identify ASD.
Comparison with other algorithms
To demonstrate the superiority of VMM-DGCN, we conduct a comprehensive comparison between VMM-DGCN and some existing methods, including the non-graph deep learning-based ASD diagnosis algorithm, the GCN-based ASD diagnosis algorithm, and the multimodal data-based ASD diagnosis algorithm. All these algorithms are trained and tested using the ABIDE I dataset, and the results in corresponding papers are adopted for comparison. Each algorithm is described in detail below.
-
Non-graph deep learning-based ASD diagnosis algorithm
SSAE proposed in Ref. [32] is an ASD diagnosis algorithm based on a semi-supervised autoencoder. SSAE uses the FC matrix of rs-fMRI as input and combines an unsupervised autoencoder with a supervised classification network. It is trained to optimise both reconstruction error and classification loss simultaneously to achieve ASD diagnosis.
HDLFCA proposed in Ref. [33] is an ASD diagnosis algorithm based on a hybrid deep learning framework. HDLFCA builds the network inputs based on temporal dynamics of brain activity and connectivity of brain regions. In HDLFCA, convolutional recurrent neural networks and deep neural networks are combined for ASD diagnosis.
MC-NFE proposed in Ref. [34] is an ASD diagnosis algorithm based on nested feature extraction. MC-NFE first performs subcluster clustering in each category and then extracts informative multi-site FC features for ASD/TC classification through nested singular value decomposition strategies.
-
GCN-based ASD diagnosis algorithm
FC-HAT proposed in Ref. [35] is an ASD diagnosis algorithm based on the GCN hypergraph attention network. By incorporating a dynamic hypergraph generation stage and a hypergraph attention gathering stage, FC-HAT utilises an end-to-end approach to obtain the optimal hypergraph and node embedding for each brain network. Subsequently, a classification network is employed to effectively diagnose ASD.
MVS-GCN proposed in Ref. [36] is an ASD diagnosis algorithm based on multi-view GCN guided by prior knowledge of brain structure. MVS-GCN removes noise connections through graph structure learning to obtain multiple clean brain network views and then captures the intrinsic correlations between multiple views through a multi-task learning framework to learn graph representations for ASD diagnosis.
GraphCGC-Net proposed in Ref. [37] is an ASD diagnosis algorithm based on a comprehensive three-stage (i.e. graph clustering, generation, and classification) graph learning model. GraphCGC-Net uses a supervised multi-graph clustering approach to eliminate noisy connections and enhance critical connections. Subsequently, it constructs realistic brain-like networks by maintaining both global consistent distribution and local topological measures through the graph generation stage. Finally, GraphCGC-Net runs ASD diagnosis based on these brain networks.
-
Multimodal data-based ASD diagnosis algorithm
LG-GNN proposed in Ref. [38] is an ASD diagnosis algorithm based on local-global graph neural networks (GNNs). LG-GNN recognises biomarkers of brain regions well through local GNN. It also incorporates phenotypic information and subject associations into the classification structure through global GNN. Finally, the ASD diagnosis task is performed through local-to-global learning.
GP-GCN proposed in Ref. [39] is an ASD diagnosis algorithm based on individual-aware downsampling. GP-GCN employs unsupervised graph pooling to efficiently reduce the dimensionality of the brain's structural representation and obtains a sparse brain network, which can extract higher-level features well. Then, GP-GCN embeds each subject into the population graph and constructs edges based on phenotype information. Finally, GP-GCN employs two layers of GCN for ASD diagnosis.
MAGE proposed in Ref. [40] is an ASD automatic diagnostic algorithm based on multi-atlas and ensemble learning techniques. MAGE first utilises multiple atlases to extract diverse feature representations for each subject and then converts these feature representations into two-dimensional vectors containing positive and negative probabilities. Subsequently, MAGE employs a stacked ensemble learning approach to combine these two-dimensional vectors with non-imaging information and performs ASD diagnosis by using a classification network.
EV-GCN proposed in Ref. [41] is a population-based ASD diagnosis algorithm. EV-GCN uses multimodal data to automatically construct a population graph with variational edges. Also, this model is optimised together with spectral convolutional networks to achieve ASD diagnosis.
The experimental results, displayed in Table 4, clearly indicate that the proposed algorithm achieves the highest value in all the objective evaluation indices among the compared existing algorithms. First of all, VMM-DGCN demonstrates a remarkable improvement in accuracy and AUC when compared to non-graph deep learning algorithms. The accuracy is boosted by a range of 4.42%–23.2%, while the AUC shows a notable enhancement of 4.74%–26.43%. These results highlight the effectiveness of GCN in extracting meaningful features from graph-structured data. To highlight the distinct benefits of VMM-DGCN, we conduct a comprehensive comparison between VMM-DGCN and the most advanced GCN-based ASD diagnosis algorithms. Similarly, VMM-DGCN performs better than other GCN-based ASD diagnosis algorithms. For example, compared to GraphCGC-Net, VMM-DGCN achieves a 21.17% improvement in accuracy and a 22.98% improvement in AUC. Furthermore, VMM-DGCN demonstrates impressive average accuracy and AUC values of 91.62% and 95.74%, respectively, surpassing other ASD diagnosis algorithms based on multimodal data. In summary, the proposed algorithm can better extract deep embedding features of fMRI data and reasonably fuse multimodal data, thus achieving enhanced graph embedding learning and better ASD diagnosis performance.
TABLE 4 Results of comparison with existing algorithms.
Classes | Algorithms | Accuracy | Sensitivity | Specificity | AUC | F1 score |
Non-graph deep learning methods | SSAE | 87.20% | 89.90% | 80.30% | 91.00% | - |
HDLFCA | 72.40% | 74.20% | 70.50% | 79.20% | 73.20% | |
MC-NFE | 68.42% | 70.05% | 63.64% | 69.31% | - | |
GCN methods | FC-HAT | 70.90% | 70.00% | 72.30% | - | - |
MVS-GCN | 69.38% | 69.81% | 64.45% | 69.01% | - | |
GraphCGC-Net | 70.45% | 70.47% | - | 72.76% | 70.39% | |
Multimodal data methods | LG-GNN | 81.75% | 83.22% | 80.99% | 85.22% | 82.96% |
GP-GCN | 87.62% | 86.76% | 88.36% | 92.00% | - | |
MAGE | 75.86% | 79.24% | 71.53% | 83.14% | - | |
EV-GCN | 80.83% | - | - | 84.98% | 81.24% | |
Our method | VMM-DGCN | 91.62% | 92.14% | 92.49% | 95.74% | 92.24% |
Table 4 also shows that ASD diagnosis algorithms based on non-graph deep learning can perform strongly; for example, the accuracy of SSAE reaches 87.20%. Furthermore, it is noteworthy that ASD diagnostic algorithms relying on graph representation learning of single-modal data typically achieve poorer recognition performance than non-graph deep learning algorithms. This is primarily due to the complex structure of the brain network, which restricts feature updates during graph representation learning. Another reason is that the effectiveness of GCN depends on a sufficiently large training dataset, which affects the learning of embeddings. In addition, compared to ASD diagnostic algorithms that use only imaging features from a single modality, multimodal data can more fully capture the correlations between subjects. This integration usually improves model performance, which demonstrates the significance of integrating multimodal data.
Analysis of other model influencing factors
Analysis of different atlases
Brain network partitioning can be categorised into two main types: predefined structural segmentation atlases and functional atlases. The HO atlas, for instance, is a structural segmentation atlas created from anatomical markers such as sulci and gyri, whereas the Craddock 200 (CC200) atlas, which uses a spatially constrained spectral clustering algorithm to generate 200 functionally uniform regions, is a functional atlas. To assess the influence of different brain atlases on the proposed model, comparative tests are conducted using the proposed algorithm on various atlases. Besides utilising the HO atlas to derive feature representations for each subject, we also use five other widely used brain atlases to extract FC features between paired brain regions: the Eickhoff-Zilles (EZ) atlas, the AAL atlas, the Talairach and Tournoux (TT) atlas, the CC200 atlas, and the Dosenbach 160 atlas. For more detailed information on these five brain atlases, please refer to the following link: .
Figure 6 shows the ASD diagnostic performance of VMM-DGCN on the above brain atlases. Notably, the proposed algorithm exhibits the highest performance across all evaluation metrics with the HO atlas. In particular, compared to the EZ atlas, the accuracy of HO increases by 4.94%, sensitivity by 3.85%, specificity by 5.29%, AUC by 4.52%, and F1 score by 4.76%. In addition, compared with CC200, Dosenbach 160, TT, and AAL, the accuracy of HO improves by 3.10%, 4.25%, 1.61%, and 3.44%, respectively. These results suggest that, under the proposed model, the pairwise correlations between ROIs of the HO atlas contain more discriminative patterns than those of the other atlases. The subsequent experiments in this paper are therefore conducted primarily with the HO atlas.
[IMAGE OMITTED. SEE PDF]
Analysis of phenotype information
Due to different data acquisition devices, parameters, diagnostic protocols, and evaluation protocols across sites, the ABIDE I dataset is highly heterogeneous. In population graph-based studies, non-imaging information plays a crucial role in diagnostic performance by assigning higher weights to edges connecting nodes with similar information. In this section, ablation studies are conducted on the ABIDE I dataset to assess the contribution of different phenotypic information to the diagnosis. The results of the experiments are summarised in Table 5.
TABLE 5 Effects of different combinations of phenotypic information.
Phenotypic information | Accuracy | Sensitivity | Specificity | AUC | F1 score |
Site | 87.26% | 88.04% | 88.56% | 91.14% | 88.09% |
Gender | 77.61% | 77.68% | 81.92% | 79.59% | 79.62% |
Age | 75.09% | 77.09% | 76.54% | 75.87% | 76.69% |
Site + gender | 91.62% | 92.14% | 92.49% | 95.74% | 92.24% |
Site + age | 86.57% | 87.58% | 87.75% | 89.41% | 87.46% |
Site + gender + age | 85.88% | 87.62% | 86.06% | 89.68% | 86.66% |
Site + gender + FIQ | 89.21% | 91.96% | 87.60% | 92.08% | 89.62% |
Site + gender + age + FIQ | 86.45% | 86.24% | 88.72% | 88.52% | 87.39% |
As shown in Table 5, single items of information, namely collection site, gender, and age, achieve accuracy rates of 87.26%, 77.61%, and 75.09%, respectively. Notably, the accuracy obtained from site information is significantly higher than from gender or age information alone, which reflects the high heterogeneity between sites: because data are collected at various sites using different imaging protocols and scanners, site information emerges as the most influential factor in enhancing ASD diagnostic performance. Furthermore, including gender information in addition to site information improves the ASD diagnosis results, which suggests that gender information is helpful in ASD diagnostic tasks. However, the accuracy decreases when age information is added to the site information, and FIQ information does not contribute to improving the diagnostic performance. Therefore, in our experiments, site and gender information are utilised as metrics and integrated into the graph representation to improve the accuracy of ASD diagnosis.
Analysis of the impact of different graph convolution operations on diagnostic performance
To assess the efficacy of the ChebConv operation in VMM-DGCN, we replace ChebConv with alternative graph convolution operators and conduct experimental analyses using various popular graph convolutions, including GCNConv [17], TAGConv [42], SGConv [43], and ARMAConv [44]. To ensure a fair comparison, all experimental settings remain consistent except for the graph convolution operation. The results in Table 6 highlight the superior performance of ChebConv on all objective evaluation metrics except for a slightly lower specificity, which strongly indicates the effectiveness of the ChebConv operator.
TABLE 6 Performance comparison of different graph convolution operations in VMM-DGCN.
Operator | Accuracy | Sensitivity | Specificity | AUC | F1 score |
GCNConv | 87.83% | 88.87% | 89.01% | 93.39% | 88.74% |
SGConv | 84.73% | 88.46% | 82.80% | 88.77% | 85.33% |
TAGConv | 84.16% | 82.43% | 89.36% | 86.97% | 85.63% |
ARMAConv | 91.16% | 91.22% | 92.70% | 94.78% | 91.82% |
ChebConv | 91.62% | 92.14% | 92.49% | 95.74% | 92.24% |
ChebConv is a spatially localised convolutional method that applies its convolutional kernels in the same manner across all locations. This allows ChebConv to capture local structures well and perform effective graph learning while maintaining spatial consistency. Hence, the ChebConv employed in this study demonstrates superior diagnostic performance across all metrics, with the exception of specificity. On the other hand, ARMAConv also exhibits good diagnostic performance and obtains the highest specificity value, possibly because ARMAConv utilises adaptive polynomial filters that can better adapt to different graph structures.
Analysis of the impact of the number of subgraphs and DropEdge strategy on diagnosis results
The impact of the number of subgraphs and of the DropEdge strategy on ASD diagnostic performance is assessed. The number of subgraphs is varied from 1 to 5, with and without the DropEdge strategy. Accuracy, AUC, and F1 score are used as evaluation metrics in a 10-fold cross-validation experiment, and the cross-validated averages of these metrics are compared. Figure 7 visually illustrates the influence of the number of subgraphs and the DropEdge strategy on the diagnosis of ASD.
[IMAGE OMITTED. SEE PDF]
From Figure 7, it can be observed that the accuracy of models with DropEdge initially improves and then decreases as the number of subgraphs increases, whereas models without DropEdge consistently improve. A possible explanation is that node features obtained from too many scales introduce redundant information, which can negatively impact ASD diagnosis performance. For the same number of subgraphs, the models incorporating the DropEdge strategy outperform those without it. Specifically, the model with 3 subgraphs and DropEdge achieves the best results: its average accuracy, AUC, and F1 score increase by approximately 3.56%, 4.19%, and 3.21%, respectively, compared to the non-DropEdge model with 3 subgraphs.
Feature visualisation
To make the success of VMM-DGCN more intuitive, t-distributed stochastic neighbour embedding (t-SNE) [45] is utilised to reduce the FC features to two dimensions. Figure 8 visualises the original FC features and the embedded features learnt by VMM-DGCN on the ABIDE I dataset. Figure 8a displays the original features obtained from the dataset: nodes of different classes are close to each other, overlapping, and randomly distributed without clear boundaries, which poses challenges for effective ASD diagnosis. Figure 8b presents the node embeddings obtained by summing the outputs of the three DeepGCNs. Compared to the distribution of the original FC features, there is a noticeable increase in the gap between the ASD and TC groups, which indicates that VMM-DGCN enhances the separability of the features. Therefore, VMM-DGCN learns more discriminative features between ASD and TC, and this outcome provides strong evidence of its effectiveness in capturing these dissimilarities.
[IMAGE OMITTED. SEE PDF]
CONCLUSION
In this study, a deep GCN called VMM-DGCN is proposed for ASD diagnosis based on variable multi-graph and multimodal data. VMM-DGCN designs convolutional filters with varying kernel sizes to effectively capture FC-based multi-scale feature representations for each subject. Then, based on a variable multi-graph construction strategy, the multi-scale feature representations and multimodal data are simultaneously incorporated into the ASD diagnosis framework to improve diagnostic performance. Experimental results on the ABIDE I dataset indicate that VMM-DGCN achieves superior performance compared to other models, and extensive ablation experiments further validate its effectiveness. However, due to the limitations of public datasets, we utilise only the functional imaging modality without considering structural imaging related to brain disorders, which could be explored as a future research direction. Also, the proposed algorithm may perform differently for people of different ages, so it is worth investigating whether the method is suitable for both children and adults. Moreover, deep learning methods are commonly viewed as black boxes, which makes physiological interpretation of disease diagnoses challenging; exploring disease-relevant biomarkers and enhancing the interpretability of the model will therefore be a key focus of our future research. Finally, the proposed algorithm is a computer-aided diagnostic method for ASD. By combining it with doctors' professional knowledge, ASD can be detected early and accurately, and diagnoses can draw not only on medical imaging but also on the patient's history and clinical signs to provide a more personalised diagnosis and treatment plan.
ACKNOWLEDGEMENTS
This work was supported in part by National Natural Science Foundation of China under Grant 62172139, Natural Science Foundation of Hebei Province under Grant F2022201055, Project Funded by China Postdoctoral under Grant 2022M713361, Science Foundation Science Research Project of Hebei Province under Grant CXY2024031, Natural Science Interdisciplinary Research Program of Hebei University under Grant DXK202102, Open Project Program of the National Laboratory of Pattern Recognition (NLPR) under Grant 202200007. This work was also supported by the High-Performance Computing Center of Hebei University.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in ABIDE I dataset at [].
American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders,
Kodak, T., Bergmann, S.: Autism spectrum disorder: characteristics, associated behaviors, and early intervention. Pediatr. Clin. 67(3), 525–535 (2020). https://dx.doi.org/10.1016/j.pcl.2020.02.007
Wu, F., et al.: Simplifying graph convolutional networks. (2019). arXiv:1902.07153
Bianchi, F.M., et al.: Graph neural networks with convolutional ARMA filters. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3496–3507 (2022)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
© 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").
Abstract
Diagnosing individuals with autism spectrum disorder (ASD) accurately faces great challenges in clinical practice, primarily due to the data's high heterogeneity and limited sample size. To tackle this issue, the authors constructed a deep graph convolutional network (GCN) based on variable multi‐graph and multimodal data (VMM‐DGCN) for ASD diagnosis. Firstly, the functional connectivity matrix was constructed to extract primary features. Then, the authors constructed a variable multi‐graph construction strategy to capture the multi‐scale feature representations of each subject by utilising convolutional filters with varying kernel sizes. Furthermore, the authors brought the non‐imaging information into the feature representation at each scale and constructed multiple population graphs based on multimodal data by fully considering the correlation between subjects. After extracting the deeper features of the population graphs using the deep GCN (DeepGCN), the authors fused the node features of multiple subgraphs to perform node classification tasks for typical control and ASD patients. The proposed algorithm was evaluated on the Autism Brain Imaging Data Exchange I (ABIDE I) dataset, achieving an accuracy of 91.62% and an area under the curve value of 95.74%. These results demonstrated its outstanding performance compared to other ASD diagnostic algorithms.
Details
1 College of Electronic and Information Engineering, Hebei University, Baoding, China, Machine Vision Technology Innovation Center of Hebei Province, Baoding, China, The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 College of Electronic and Information Engineering, Hebei University, Baoding, China, Machine Vision Technology Innovation Center of Hebei Province, Baoding, China
3 The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
4 School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China