Background

Resting-state functional MRI (rs-fMRI) measures the blood oxygen level-dependent (BOLD) signal in regions of interest (ROIs) throughout the entire brain of a subject at rest (i.e., not during a cognitive task) [1–4]. The BOLD signal for each ROI is a time series whose low-frequency fluctuations are correlated with those of other ROIs. These correlations can be used to represent the functional connectivity (FC) between ROIs in a weighted brain network [5–9]. Differential changes, or rewiring, of FC between subjects with mood disorders and healthy controls may reveal neural mechanisms of disease. Feature selection and machine learning (ML) methods that use FC measures as predictors have potential as fMRI-based biomarkers and for disease classification [10]. A wide variety of ML algorithms, including support vector machines (SVMs), XGBoost, random forests, and deep learning, have been widely applied to rs-fMRI data to detect and better understand the mechanisms of mood disorders, including major depressive disorder (MDD) [11]. Various biomarkers and measures of FC between brain regions have been used as fMRI-based ML predictors, including correlation, mutual information, amplitude of low-frequency fluctuation (ALFF), and regional homogeneity (ReHo). ML classification of MDD with rs-fMRI has been promising, but using ML for diagnosis is likely premature [11]. Feature selection, which is the focus of the current study, is not directly diagnostic but may provide valuable insights into the biological mechanisms of MDD.

As the input variables for feature selection, we use preexisting ROIs based on brain atlases constructed by experts using anatomical and functional information. Feature engineering methods, such as independent component analysis (ICA) to define brain networks and ROIs [12], may capture additional variation but may be more difficult to interpret. Similarly, multivoxel pattern analysis (MVPA) [13] uses an SVM to build classifiers without assumptions about the organization of the brain, but the distributed collection of voxel associations may also be difficult to interpret and difficult to generalize between datasets.

We use Pearson’s correlation coefficients across the full time series between pairs of brain regions as the predictors for feature selection. Correlation is a convenient way to quantify FC and represent temporal synchrony of ROI activity. A common way to identify important ROIs from correlation FC is to perform a seed-based analysis, where the global correlation between a given seed region and all other brain regions is computed [14]; then, this centrality quantity can be tested for all ROIs for association with an outcome such as MDD. In the present study, we use centrality in a different way with our nearest-neighbor projected distance regression (NPDR) approach [15]. We apply NPDR to correlation-based predictors and then apply network theory to determine the cumulative effect of the differential correlations for each ROI. We also integrate this centrality approach with random forest (rf) in a new centrality rf (c-rf).

This machine learning feature selection study for correlation-based features is outlined as follows. We review the relevant ideas behind the penalized and nonpenalized NPDR feature selection method and describe the new distance metrics for compatibility with correlation-based features. We emphasize that our goal is to understand not only the importance of pairs of variables for predicting outcomes but also the importance of individual variables in the network. Next we describe a new simulation strategy for correlation-based data for classification tasks, and then we describe a real correlation-based dataset in the form of a previous rs-fMRI study of MDD. We compare the performance of the feature selection methods using the new simulation tool and the real rs-fMRI study of MDD, and we discuss the implications of some of the brain ROIs found to be associated with MDD.

Methods

Our goal is to identify individual variables or pairs of variables that are important for predicting a given outcome variable. However, we assume that predictor information is given only in the form of pairwise correlations. The type of data we have in mind is correlation between brain regions in rs-fMRI studies. For each subject (Fig 1A), we average the voxel time series in each ROI of a brain atlas (Fig 1B) and then compute the correlation between all pairs of the n ROI time series (Fig 1C). The upper triangle of each subject's correlation matrix is stretched into a row vector to create a dataset in which the n(n-1)/2 predictor variables are pairwise correlations between ROIs (Fig 1D). Standard feature selection methods can then be applied to these data to determine the importance of ROI pairs. Beyond ROI pairs, the current feature selection approach aims to disentangle the many important pairs to identify important individual ROIs.

Nearest-neighbor projected-distance regression (NPDR) is a machine learning feature selection algorithm that is able to detect statistical interactions using nearest neighbors in a high-dimensional space [15]. Before describing the centrality-based and correlation-distance extensions of NPDR, we review the aspects of NPDR that are relevant for ranking the importance of variables for predicting a class variable. NPDR minimizes a contrastive loss function over pairs of samples $(i, j)$. The contrastive loss is based on an indicator of whether the sample pair $(i, j)$ is in the same class or in different classes according to the class variable y. The contrastive loss can be penalized by LASSO or Ridge, or it can be left unpenalized so that P-values can be computed. Rather than using the predictor (attribute) values directly in the regression, NPDR uses the difference (or projected distance onto the attribute matrix X) between subjects $i$ and $j$. The vector $d_{ij}(X)$ denotes the projected distance between $i$ and $j$ for all attributes in the set X.
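A minimal Python sketch of the Fig 1C-D step, assuming each subject's ROI time series are already available as lists of floats. The function names `pearson` and `upper_triangle_features` are hypothetical and are not taken from the authors' software; plain Python is used so the example has no dependencies.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle_features(roi_series):
    """Stretch one subject's ROI-ROI correlation matrix into a feature row.

    roi_series: list of n ROI time series (one list of floats per ROI).
    Returns (pairs, values): the n(n-1)/2 ROI index pairs (a, b) with a < b,
    in row-major upper-triangle order, and the matching correlations.
    """
    pairs = list(combinations(range(len(roi_series)), 2))
    values = [pearson(roi_series[a], roi_series[b]) for a, b in pairs]
    return pairs, values


# Toy example: three ROIs, four time points (illustrative values only).
rois = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
pairs, row = upper_triangle_features(rois)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(row)    # [1.0, -1.0, -1.0]
```

Stacking one such row per subject gives the dataset of Fig 1D. A series that is constant in time makes the correlation undefined (division by zero), which a real pipeline would need to guard against.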
In the current application, the attributes are Pearson correlations between pairs of ROIs. For centrality NPDR (c-NPDR), the projected distance (or diff) $d_{ij}(p)$ is the absolute difference between subjects $i$ and $j$ for one correlation attribute $p$ (the correlation between a pair of ROIs):

$$d_{ij}(p) = \left| \rho_{ip} - \rho_{jp} \right| \qquad (1)$$

where $\rho_{ip}$ is the correlation for subject $i$ between the pair of ROIs represented by $p$. Thus, if there are n ROIs, the NPDR design matrix consists of n(n-1)/2 attribute columns, one for each pair of ROIs. The NPDR-selected ROI pairs can then be used in any number of centrality algorithms to rank the importance of individual ROIs. For comparison, we use the following centralities: degree, betweenness, eigenvector, and Integrated Value of Influence (IVI) [16].

[Fig 1 omitted. See PDF.]

Regions of interest (ROIs) are composed of groups of voxels within the brain. Three ROIs (A) are used for illustration (green, blue, and red cubes/voxels), but the number of ROIs is typically on the order of 200. Each voxel has an associated time series, which is averaged within ROIs to create the green, red, and blue time series (B). From these time series, pairwise ROI correlations are calculated and stored in a matrix for each subject (C). The upper triangle of each subject's correlation matrix can be stretched into a sample vector, $s_i$, to form the rows of a dataset (D), where the predictors (columns) are ROI-ROI correlations.
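The sketch below illustrates the Eq. (1) projected distances that make up the c-NPDR design matrix, assuming the nearest-neighbor pairs have already been found (the neighbor search and the NPDR regression itself are not shown). `npdr_design_rows` is a hypothetical helper, and coding different-class pairs as 1 is an illustrative choice.

```python
def npdr_design_rows(X, y, pairs):
    """Eq. (1) projected distances for a list of sample pairs.

    X: feature rows, one per subject (ROI-pair correlations as in Fig 1D).
    y: class labels, one per subject.
    pairs: (i, j) sample pairs, e.g. nearest-neighbor pairs (the neighbor
        search itself is not shown here).
    Returns (D, z): D[k][p] = |X[i][p] - X[j][p]| for the k-th pair, and
        z[k] = 1 if the pair is in different classes, 0 if in the same class.
    """
    D = [[abs(a - b) for a, b in zip(X[i], X[j])] for i, j in pairs]
    z = [int(y[i] != y[j]) for i, j in pairs]
    return D, z


# Toy data: three subjects, three ROI-pair correlation attributes.
X = [[0.5, 0.2, -0.1],
     [0.3, 0.2, 0.4],
     [0.5, -0.2, 0.0]]
y = [1, 1, 0]
D, z = npdr_design_rows(X, y, [(0, 1), (0, 2)])
print(z)  # [0, 1]
```

Each row of `D` has the same n(n-1)/2 columns as the feature table; NPDR then regresses the pair indicator on these diffs, either unpenalized to obtain P-values or with a LASSO or Ridge penalty.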

The other NPDR-based method (correlation-diff-NPDR) for ranking the importance of ROIs from ROI-pair correlation data uses a more complex projected distance, $d^{\mathrm{CD}}_{ij}(r)$, but directly gives the importance of individual ROIs without centrality calculations [17]. The correlation-diff (CD), or correlation projected distance, for ROI $r$ is given by

$$d^{\mathrm{CD}}_{ij}(r) = \sum_{k \neq r} \left| \rho_{i,rk} - \rho_{j,rk} \right| \qquad (2)$$

where $\rho_{i,rk}$ is the correlation between ROIs $r$ and $k$ for subject $i$. Thus, the correlation-diff for ROI $r$ is the sum of the absolute differences in the correlations between $r$ and all other ROIs. If there are n ROIs, the NPDR design matrix for Eq. (2) will have n columns, as opposed to n(n-1)/2 for Eq. (1). Thus, NPDR with Eq. (2) yields importance scores for ROIs, while NPDR with Eq. (1) yields the importance of ROI pairs. In both cases (Eqs. 1 and 2), NPDR importance can be computed in terms of individual P-values, which we adjust for false discoveries, or in a multivariate model with LASSO or Ridge regression.
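A sketch of the correlation-diff projected distance of Eq. (2), taken here as the sum of absolute differences over k ≠ r and assuming symmetric correlation matrices stored as nested lists. The function names are hypothetical. The second function rebuilds the same number from Eq. (1) diffs by summing over every ROI pair that contains r.

```python
from itertools import combinations


def corr_diff(C_i, C_j, r):
    """Eq. (2): sum over k != r of |C_i[r][k] - C_j[r][k]| for ROI r."""
    return sum(abs(C_i[r][k] - C_j[r][k]) for k in range(len(C_i)) if k != r)


def corr_diff_from_pair_diffs(C_i, C_j, r):
    """The same quantity assembled from Eq. (1) diffs: add the diff of every
    upper-triangle ROI pair (a, b) that contains ROI r."""
    total = 0.0
    for a, b in combinations(range(len(C_i)), 2):
        if r in (a, b):
            total += abs(C_i[a][b] - C_j[a][b])
    return total


# Two subjects' symmetric 3 x 3 correlation matrices (toy values).
C_i = [[1.0, 0.5, 0.2], [0.5, 1.0, -0.3], [0.2, -0.3, 1.0]]
C_j = [[1.0, 0.1, 0.4], [0.1, 1.0, 0.3], [0.4, 0.3, 1.0]]
for r in range(3):
    print(r, corr_diff(C_i, C_j, r), corr_diff_from_pair_diffs(C_i, C_j, r))
```

For symmetric matrices the two functions agree, so the CD distance for ROI r behaves like a weighted degree of r over its incident ROI pairs. This is consistent with the Results observation that corr-diff and degree-based c-NPDR perform similarly, although it concerns the distances only, not the fitted importance scores.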

To threshold the results of correlation-diff-NPDR and c-NPDR, we use regularization and FDR-adjusted P-values. We use the LASSO penalty, also known as the L1 penalty, which is a regularization technique used in regression models to prevent overfitting and to improve the model's prediction accuracy and interpretability. For the nonpenalized methods, we use a P-value cutoff adjusted for multiple testing: ROI pairs with an adjusted P-value > 0.05 are removed from the network. For random forest importance, we use a cutoff of the top 200 ROI pairs (Fig 2) because there is no clear statistical threshold.
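The text specifies FDR-adjusted P-values but not the adjustment procedure; the sketch below assumes the Benjamini-Hochberg procedure and drops ROI pairs whose adjusted P-value exceeds 0.05. The pair names and P-values are toy values, and the helper names are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest P-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


def keep_significant_pairs(pair_names, pvals, alpha=0.05):
    """Drop ROI pairs whose adjusted P-value exceeds alpha."""
    adjusted = benjamini_hochberg(pvals)
    return [name for name, q in zip(pair_names, adjusted) if q <= alpha]


pair_names = [("R1", "R2"), ("R1", "R3"), ("R2", "R4"), ("R3", "R4")]
print(keep_significant_pairs(pair_names, [0.01, 0.04, 0.03, 0.20]))  # [('R1', 'R2')]
```

Here only the first pair survives: its adjusted value is 0.04, while the next two are both raised to about 0.053 by the step-down adjustment.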

[Fig 2 omitted. See PDF.]

On the left, correlation-diff-NPDR (Eq. 2) can directly rank the importance of ROIs using P-values or penalized regression coefficients. On the right, centrality-NPDR (c-NPDR, Eq. 1) and centrality random forest (c-rf) rank the importance of pairs of ROIs, and then the centralities of the resulting ROI-ROI networks are used to rank the importance of individual ROIs.

Correlation-diff-NPDR directly yields a list of significant individual ROIs, whereas the centrality methods need an additional step to map pair importance to individual importance. The c-NPDR and c-rf methods yield lists of important ROI pairs, so we apply centralities to the resulting edge lists to obtain a list of important individual ROIs (Fig 2). The significant ROI pairs are graphed as a network, where the nodes are ROIs and an edge connects two ROIs when their correlation affects the outcome variable (e.g., MDD). This interaction network is a way to visualize the importance of ROI nodes for MDD based on their connections and to visualize local structure. We quantify the importance of individual ROIs using common centralities: degree, eigenvector, betweenness, and IVI [16], which combines multiple centrality measures.
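A dependency-free sketch of the degree step applied to an edge list of selected ROI pairs. The function and the ROI names are hypothetical; betweenness, eigenvector, and IVI centralities are not shown.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank ROIs by degree in the network formed by the selected ROI pairs.

    edges: iterable of (roi_a, roi_b) pairs that passed the selection threshold;
    duplicate and reversed pairs are counted once. Returns (roi, degree) tuples
    sorted by decreasing degree, with ties broken by ROI name.
    """
    degree = Counter()
    for a, b in {tuple(sorted(edge)) for edge in edges}:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


selected = [("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R2", "R3"), ("R3", "R2")]
print(degree_ranking(selected))  # [('R1', 3), ('R2', 2), ('R3', 2), ('R4', 1)]
```

Raw degree gives the same ranking as normalized degree centrality. For the other centralities, a graph library such as NetworkX provides `betweenness_centrality` and `eigenvector_centrality` on a `networkx.Graph` built from the same edge list.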

We also implement a centrality random forest (c-rf), which applies the centrality approach to random forest importance of correlation pairs, and we compare c-rf with c-NPDR. In other words, we use the correlation predictor data (Fig 1D) to compute random forest permutation importance with 5000 trees, filter the correlation pairs to the top 200 to create a network, and then compute ROI centralities.
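The c-rf filtering step can be sketched as a top-k selection over pair importance scores, assuming the random forest permutation importances have already been computed (the 5000-tree forest is not shown). The scores below are toy values and `top_k_pairs` is a hypothetical name.

```python
def top_k_pairs(importance, k=200):
    """Return the k ROI pairs with the largest importance scores as an edge list.

    importance: dict mapping (roi_a, roi_b) -> importance score, for example
    random forest permutation importance computed elsewhere. Ties at the
    cutoff are broken by pair name so the result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]


scores = {("R1", "R2"): 0.30, ("R1", "R3"): 0.10, ("R2", "R3"): 0.30, ("R3", "R4"): 0.05}
print(top_k_pairs(scores, k=2))  # [('R1', 'R2'), ('R2', 'R3')]
```

The returned list is an edge list, so the same centrality computation used for c-NPDR networks can be applied to it.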

Simulation method and real data

Simulation approach.

We develop a random network approach to simulate correlation-based features, a fraction of which are functional or associated with case–control status (Fig 3). The application we have in mind is correlation between brain ROIs in resting-state fMRI studies, where correlation is calculated from the BOLD signal time-series. We do not simulate the time series but rather directly simulate the correlations and their differences between groups. Features or predictors are correlations between pairs of ROIs rather than ROIs themselves. We note that these simulations and feature selection methods can also be applied to other types of correlation-based data in other research domains.

[Fig 3 omitted. See PDF.]

A random network (Erdos-Renyi in this example) is generated over the regions of interest or ROIs (10 circles in the brain). For each sample, random correlations are mapped to connected regions of the network, and lower random correlations are mapped to unconnected regions. Pairs of regions are selected to be functional (associated with the outcome variable); they are indicated by green edges in the brain network and black dots in the heatmaps. Each heatmap represents a different sample. For the cases (left heatmaps), the selected functional pairs are perturbed to have a higher correlation, and for the controls (right heatmaps), the functional pairs are perturbed to have a lower correlation. The final heatmaps represent case–control datasets with correlation-based features containing noise and functional ROI pairs.

The simulation includes parameters that control the number of ROIs, the number of cases and controls, the number of functional ROIs (i.e., those associated with the outcome), the effect size, and the type of underlying random network for the brain. In the current study, we specify an Erdos-Renyi network, but the simulation can generate any network from the igraph library. Furthermore, the simulation software allows users to input their own network, for example, one based on real correlation data. Initial correlation matrices are generated for each sample based on the input network, where connected ROIs have higher random correlations than unconnected ROIs.
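A simplified Python sketch of the network and initial-correlation steps, assuming an Erdos-Renyi G(n, p) graph. The uniform correlation ranges for connected and unconnected ROI pairs are illustrative assumptions, not the values used by the authors' simulation software, and the helper names are hypothetical.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, rng):
    """G(n, p) random graph: keep each of the n(n-1)/2 possible ROI pairs
    independently with probability p. Edges are (a, b) tuples with a < b."""
    return {(a, b) for a, b in combinations(range(n), 2) if rng.random() < p}


def base_correlation_matrix(n, edges, rng, connected=(0.4, 0.8), unconnected=(-0.2, 0.2)):
    """One sample's initial correlation matrix: network-connected ROI pairs get
    higher random correlations than unconnected pairs.

    The uniform ranges are illustrative assumptions. The result is symmetric
    with a unit diagonal but is not guaranteed to be positive semidefinite.
    """
    C = [[1.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        low, high = connected if (a, b) in edges else unconnected
        C[a][b] = C[b][a] = rng.uniform(low, high)
    return C


rng = random.Random(0)
edges = erdos_renyi_edges(10, 0.1, rng)  # one shared network for all samples
samples = [base_correlation_matrix(10, edges, rng) for _ in range(4)]
```

Reusing one edge set for all samples keeps the same underlying network across subjects while the random correlations vary. Because each entry is drawn independently, the matrices are not guaranteed to be valid (positive semidefinite) correlation matrices; a fuller simulation may need an extra adjustment step.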

Functional nodes are chosen from the largest connected component (i.e., the largest group of nodes in which there is a path between every pair of nodes). Edges between the functional nodes (green edges in Fig 3) are subsequently used to create differential correlations between the cases and controls (black dots in the Fig 3 heatmaps). We use a parameter called “multiway” that controls how many edges we randomly select to generate differential correlations. For example, a multiway of 2 will use only a subset of the possible edges between functional nodes (only a subset of the possible edges will be green). If we set multiway to the maximum, then all possible edges between functional nodes will have differential correlations (green). We use multiway = 5 in this application. We generate replicate simulations to compare feature selection methods based on their ability to detect the simulated ground-truth functional ROIs. We use the F1 score to quantify the overlap between the top ROI features selected by a method and the simulated functional ROIs.
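The functional-node and perturbation steps can be sketched as follows, assuming integer nodes and tuple edges. A breadth-first search finds the largest connected component, functional ROIs are sampled from it, and functional pairs are shifted up for cases and down for controls. The “multiway” edge-selection rule is not implemented, the mapping from Cohen's d to the shift `delta` is not specified in the text, and all names are hypothetical.

```python
import random
from collections import deque


def largest_connected_component(n, edges):
    """Node set of the largest connected component of an undirected graph on nodes 0..n-1."""
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in range(n):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best


def choose_functional_nodes(n, edges, n_functional, rng):
    """Sample functional ROIs uniformly from the largest connected component.
    Raises ValueError if the component has fewer than n_functional nodes."""
    return set(rng.sample(sorted(largest_connected_component(n, edges)), n_functional))


def perturb(C, functional_pairs, delta):
    """Copy of correlation matrix C with each functional pair shifted by delta
    (delta > 0 for cases, delta < 0 for controls), clipped to [-1, 1]."""
    out = [row[:] for row in C]
    for a, b in functional_pairs:
        out[a][b] = out[b][a] = max(-1.0, min(1.0, C[a][b] + delta))
    return out


edges = {(0, 1), (1, 2), (3, 4)}
print(sorted(largest_connected_component(6, edges)))  # [0, 1, 2]
functional = choose_functional_nodes(6, edges, 2, random.Random(1))
C = [[1.0, 0.5], [0.5, 1.0]]
case_matrix = perturb(C, [(0, 1)], 0.2)
control_matrix = perturb(C, [(0, 1)], -0.2)
```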

Real data.

We compare feature selection methods on data from the Tulsa 1000 (T1000), a longitudinal study at the Laureate Institute for Brain Research that follows 1000 individuals, including healthy individuals and individuals with mood and other disorders [18]. We use rs-fMRI time series for 188 MDD subjects and 47 healthy controls (HCs) from T1000 (163 females and 72 males). RETROICOR correction for cardiac- and respiration-induced noise was applied to the time series, along with despiking and regression of low-frequency drifts, 12 motion parameters, and the local white matter average signal (ANATICOR). Subjects with RMS motion larger than 0.2 were excluded from the analysis.

We use the Automated Anatomical Labelling (AAL) Atlas with 87 ROIs and the Brainnetome Atlas with 246 ROIs to define consistent and interpretable mappings for selected features [19,20]. The Brainnetome Atlas parcellates the brain based on structural and connectivity features derived from neuroimaging data, particularly rs-fMRI and diffusion tensor imaging (DTI) data, which reveal both functional and structural connectivity patterns in the brain. For each atlas, we detrended the signals and averaged the time series of the voxels within each atlas ROI.
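The text does not state the detrending model, so this sketch assumes linear (first-order) least-squares detrending of each voxel series before averaging within an ROI. The function names are hypothetical.

```python
def linear_detrend(ts):
    """Subtract the least-squares straight line (intercept + slope in time) from a series."""
    n = len(ts)
    t_mean = (n - 1) / 2
    y_mean = sum(ts) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ts)) / s_tt
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(ts)]


def roi_time_series(voxel_series):
    """Detrend every voxel series in one ROI, then average across voxels at each time point."""
    detrended = [linear_detrend(v) for v in voxel_series]
    return [sum(values) / len(detrended) for values in zip(*detrended)]


# Two toy voxels in one ROI: a pure linear trend and a trend-plus-oscillation.
print(roi_time_series([[1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 0.0, 2.0]]))
```

Because least-squares detrending is a linear operation, detrending each voxel and then averaging gives the same ROI series as averaging first and then detrending.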

Results

We simulate 50 replicate datasets, each with 100 cases, 100 controls, and 100 ROIs. We select 10 functional ROIs, but their effects are detected through their correlations with each other in the correlation-predictor datasets. The underlying correlation networks are Erdos-Renyi random networks with connection probability p = 0.1. We use a medium effect size (Cohen's d = 0.5).

We apply six feature selection methods to the replicate simulations (Fig 4) and compare them based on their average ability to detect the 10 functional ROIs. Centrality random forest (c-rf) with degree centrality (red, Fig 4) has a mean F1 score similar to that of correlation-diff NPDR (corr-diff) with Ridge regression (green CD Ridge, Fig 4), and both are similar to centrality NPDR (c-npdr) with degree centrality (left blue). The NPDR-based methods, namely corr-diff and c-npdr with degree, exhibit slightly less variation than the c-rf method. The F1 scores for the centrality-based NPDR methods (all blue, Fig 4) depend on the centrality used: c-NPDR works best with degree, whereas IVI, betweenness, and eigenvector centralities are noticeably worse. The close similarity between c-npdr (Eq. 1) with degree and NPDR with corr-diff (Eq. 2) suggests that the corr-diff metric (Eq. 2) is mathematically related to degree centrality.

[Fig 4 omitted. See PDF.]

The F1 score is used to quantify whether the top 10 ROIs ranked by a given method are enriched for the top 10 simulated functional ROIs. Each violin plot represents 50 replicate simulations created using random network theory (Fig 3). Colors (indicated in the legend) represent three main types of ROI feature importance: degree centrality with random forest (c-rf, red), NPDR using the Eq. 2 metric and Ridge regression (corr-diff, green), and NPDR using the Eq. 1 metric followed by centrality (c-npdr, blue). For the c-npdr methods (blue), we use four centralities: c = degree, IVI, betweenness, and eigenvector. The corr-diff NPDR method (green) directly ranks ROIs rather than using centrality computed from ROI-pair scores, and its performance is similar to that of centrality random forest (red c-rf, c = degree) and centrality NPDR with degree (blue c-npdr, c = degree). The centrality-NPDR methods that do not use degree (three blue plots on the right) perform significantly worse (P < 0.05) than corr-diff NPDR, degree c-rf, and degree c-npdr (three plots on the left).
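A short sketch of the F1 computation described in the Fig 4 caption, assuming a method returns a ranked list of ROI IDs and the simulation records the set of functional ROIs. `f1_top_k` is a hypothetical helper.

```python
def f1_top_k(ranked_rois, functional_rois, k=10):
    """F1 score of the top-k ranked ROIs against the set of simulated functional ROIs."""
    selected = set(ranked_rois[:k])
    truth = set(functional_rois)
    tp = len(selected & truth)  # functional ROIs recovered in the top k
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy ranking: 7 of the 10 functional ROIs appear in the top 10.
print(f1_top_k(list(range(20)), [0, 1, 2, 3, 4, 5, 6, 50, 51, 52]))
```

When the number of selected ROIs equals the number of functional ROIs (10 and 10 here), precision equals recall, so F1 reduces to the fraction of functional ROIs recovered in the top 10.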

We apply four feature selection methods to the real rs-fMRI data to compare the important ROIs selected for MDD (Table 1). Because we do not have ground-truth ROIs for MDD, we compare the properties and selected ROIs across methods. The correlation-diff NPDR model with LASSO selects the fewest features because LASSO tends to eliminate correlated features (first column, Table 1). The other correlation-diff NPDR analysis (second column, Table 1) uses an FDR-adjusted P-value cutoff rather than LASSO, which results in more selected features, in part because more correlated features are retained. The centrality methods that use degree, NPDR (column 3) and random forest (column 4), use a manual threshold because degree centrality does not have a statistical threshold. Although the focus of this study is feature selection, the random forest out-of-bag classification accuracy using all correlation features is 79.1%.

[Table 1 omitted. See PDF.]

For the Brainnetome atlas (top, Table 1), NPDR correlation-diff LASSO (column 1) selects the most parsimonious list of ROIs, and these features are included in the longer lists of the other NPDR methods (columns 2 and 3). The random forest method selects a slightly different set of ROIs because it is not distance based and tends to find more main effects than interactions compared with the NPDR methods. We highlight the selected ROIs involving the MTG (middle temporal gyrus) because it is the top ROI found by NPDR correlation-diff (Table 1). MTG has been associated with MDD in previous studies [21,22]. The highest-scoring ROIs for the BNA atlas (top, Table 1) correspond to hubs in the NPDR network of ROI-pair scores (Fig 5A). The three hubs are the inferior temporal gyrus (ITG, Fig 5B), MTG (Fig 5C), and parahippocampal gyrus (PhG, Fig 5D). Each of these hubs is in a separate graph cluster, as determined by the Louvain method.

[Fig 5 omitted. See PDF.]

The nodes are sized by degree and colored according to Louvain network clustering. Three areas of the overall network (A) are highlighted around the three most important ROIs for MDD according to NPDR (see top of Table 1, column 1). These main ROIs are (B) ITG (inferior temporal gyrus), (C) MTG (middle temporal gyrus), and (D) PhG (parahippocampal gyrus). They are part of separate Louvain clusters (node colors).

For the AAL atlas (bottom section, Table 1), the three NPDR methods yield a consensus on the ROIs selected by the LASSO method. These regions include the dorsal and ventral default mode networks (DMNs) and the right executive control network (ECN). The random forest centrality method includes multiple blocks of correlated variables (bottom, Table 1) in regions such as the anterior salience, auditory, and dorsal DMN. The NPDR methods have reduced multicollinearity compared with the random forest-based method, and LASSO NPDR automatically selects a parsimonious set of ROIs.

The reduced multicollinearity in LASSO NPDR-selected features can also be seen in the Brainnetome atlas (top, Table 1), where the LASSO NPDR rank list includes MTG_R_4_2, while non-LASSO NPDR and random forest also include the left hemisphere ROI, MTG_L_4_2. The variance inflation factor (VIF) for the correlation-based variable MTG_L_4_2–SPL_L_5_4 is greater than 5, suggesting that the left MTG_4_2 ROI may be involved in collinearity with other ROIs. Hemispheric symmetry in functional connectivity may lead to collinearity, and stronger connectivity in one hemisphere may lead to hemisphere-specific collinearity, as in the case of MTG_L.
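One standard way to obtain variance inflation factors is from the inverse of the predictors' correlation matrix: VIF_j equals the j-th diagonal element of R^{-1}, which matches 1/(1 - R_j^2) from regressing predictor j on the others. The sketch below assumes R is the correlation matrix, across subjects, of the selected correlation-based features; it is not necessarily how the VIF above was computed, and the helper names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (for small, well-conditioned matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale pivot row to a leading 1
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlation(R):
    """VIF_j = [R^{-1}]_{jj}, where R is the correlation matrix among the predictors."""
    inv = invert(R)
    return [inv[j][j] for j in range(len(R))]


print(vif_from_correlation([[1.0, 0.9], [0.9, 1.0]]))
```

With two predictors correlated at 0.9, VIF = 1/(1 - 0.81) ≈ 5.26, just above the cutoff of 5 mentioned in the text.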

Analyses were performed using the Brainnetome (BNA) Atlas (top table) and the AAL atlas (bottom table). The feature selection methods used include correlation-diff NPDR with the LASSO penalty, correlation-diff-NPDR with the FDR adjusted P-value, centrality NPDR with adjusted P-value and degree, and centrality random forest with degree (top values chosen to match the length of the other methods). ROIs involving the MTG (middle temporal gyrus) are highlighted across methods (columns in top section).

Discussion

The middle temporal gyrus (MTG) was found by all feature selection methods to be important for predicting MDD (Table 1). The centrality random forest method identified only the left MTG, while the LASSO correlation-diff NPDR method identified only the right MTG. The other NPDR methods, including the centrality-based method, identified both the left and right MTG. MTG is critical for semantic memory processing, visual perception, and language processing [23], and studies have shown associations with MDD. For example, studies using structural and functional MRI have identified significant gray matter abnormalities in the right MTG in participants with treatment-resistant depression (TRD) and treatment-responsive depression (TSD) compared to healthy controls [21]. The reduced gray matter volume in the bilateral MTG is indicative of structural changes associated with MDD. Similarly, a previous study showed that the fractional amplitude of low-frequency fluctuation (fALFF) in the right and left MTG was greater in participants with MDD than in HCs [22].

All of the tested methods showed that the superior temporal gyrus (STG) and inferior temporal gyrus (ITG) were important for predicting MDD. In participants with anxious depression versus healthy controls, a previous study found increased fALFF values in the left STG [24]. Although the ITG has not previously been linked to MDD, a study of participants with chronic schizophrenia found gray matter volume reductions in the MTG and ITG [25]. This link to another psychiatric condition could indicate a broader role for the ITG in mood disorders.

Conclusions

The application of machine learning and feature selection algorithms to fMRI data is increasingly critical for understanding the biological mechanisms of disorders. New methods are needed that can account for interactions between variables and regions of interest. We extended NPDR, with its ability to detect interactions, to handle data where the predictors are correlations between pairs of ROIs. We applied these NPDR centrality methods and a random forest centrality approach to correlation predictor data from a real rs-fMRI dataset for MDD, and the consensus between these methods found MTG, ITG, and STG to be important MDD ROIs.

We also developed a new simulation approach to compare correlation-based feature selection methods. We found that centrality NPDR scores with degree (i.e., Eq. 1 c-npdr with c = degree) are similar to scores based on correlation-diff NPDR (Eq. 2). This suggests a mathematical connection between the correlation-diff metric (Eq. 2) and degree centrality. The correlation-diff metric might also be improved by incorporating properties of other centralities. Degree centrality with NPDR gave better simulation performance than betweenness or eigenvector centrality. However, this difference could be due to the nature of the random networks used in the data simulation. Different network types, such as Watts-Strogatz small-world networks, may simulate data in which betweenness plays a more important role. Future work will explore the effect of network simulation parameters on feature selection performance.

References

1. van den Heuvel MP, Hulshoff Pol HE. Exploring the brain network: a review on resting-state fMRI functional connectivity. Eur Neuropsychopharmacol. 2010;20(8):519–34. pmid:20471808

2. Salvador R, Suckling J, Coleman MR, Pickard JD, Menon D, Bullmore E. Neurophysiological architecture of functional magnetic resonance images of human brain. Cereb Cortex. 2005;15(9):1332–42. pmid:15635061

3. Iraji A, Calhoun VD, Wiseman NM, Davoodi-Bojd E, Avanaki MRN, Haacke EM, et al. The connectivity domain: Analyzing resting state fMRI data using feature-based data-driven and model-based methods. Neuroimage. 2016;134:494–507. pmid:27079528

4. Preti MG, Bolton TA, Van De Ville D. The dynamic functional connectome: State-of-the-art and perspectives. Neuroimage. 2017;160:41–54. pmid:28034766

5. Biswal BB, Mennes M, Zuo X-N, Gohel S, Kelly C, Smith SM, et al. Toward discovery science of human brain function. Proc Natl Acad Sci U S A. 2010;107(10):4734–9. pmid:20176931

6. Smith SM, Miller KL, Salimi-Khorshidi G, Webster M, Beckmann CF, Nichols TE, et al. Network modelling methods for FMRI. Neuroimage. 2011;54(2):875–91. pmid:20817103

7. Van Dijk KRA, Sabuncu MR, Buckner RL. The influence of head motion on intrinsic functional connectivity MRI. Neuroimage. 2012;59(1):431–8. pmid:21810475

8. Zuo X-N, Xing X-X. Test-retest reliabilities of resting-state FMRI measurements in human brain functional connectomics: a systems neuroscience perspective. Neurosci Biobehav Rev. 2014;45:100–18. pmid:24875392

9. Fornito A, Zalesky A, Breakspear M. Graph analysis of the human connectome: promise, progress, and pitfalls. Neuroimage. 2013;80:426–44. pmid:23643999

10. Du Y, Fu Z, Calhoun VD. Classification and prediction of brain disorders using functional connectivity: promising but challenging. Front Neurosci. 2018;12:525. pmid:30127711

11. Chen Y, Zhao W, Yi S, Liu J. The diagnostic performance of machine learning based on resting-state functional magnetic resonance imaging data for major depressive disorders: a systematic review and meta-analysis. Front Neurosci. 2023;17:1174080. pmid:37811326

12. Calhoun VD, de Lacy N. Ten key observations on the analysis of resting-state functional MR imaging data using independent component analysis. Neuroimaging Clin N Am. 2017;27(4):561–79. pmid:28985929

13. Al-Zubaidi A, Mertins A, Heldmann M, Jauch-Chara K, Münte TF. Machine learning based classification of resting-state fMRI features exemplified by metabolic state (Hunger/Satiety). Front Hum Neurosci. 2019;13:164. pmid:31191274

14. Joel SE, Caffo BS, van Zijl PCM, Pekar JJ. On the relationship between seed-based and ICA-based measures of functional connectivity. Magn Reson Med. 2011;66(3):644–57. pmid:21394769

15. Le TT, Dawkins BA, McKinney BA. Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding. Bioinformatics. 2020;36(9):2770–7. pmid:31930389

16. Salavaty A, Ramialison M, Currie PD. Integrated Value of Influence: An Integrative Method for the Identification of the Most Influential Nodes within Networks. Patterns (N Y). 2020;1(5):100052. pmid:33205118

17. Dawkins BA, Le TT, McKinney BA. Theoretical properties of distance distributions and novel metrics for nearest-neighbor feature selection. PLoS One. 2021;16(2):e0246761. pmid:33556091

18. Victor TA, Khalsa SS, Simmons WK, Feinstein JS, Savitz J, Aupperle RL, et al. Tulsa 1000: a naturalistic study protocol for multilevel assessment and outcome prediction in a large psychiatric sample. BMJ Open. 2018;8(1):e016620. pmid:29371263

19. Rolls ET, Huang C-C, Lin C-P, Feng J, Joliot M. Automated anatomical labelling atlas 3. Neuroimage. 2020;206:116189. pmid:31521825

20. Fan L, Li H, Zhuo J, Zhang Y, Wang J, Chen L, et al. The human brainnetome atlas: a new brain atlas based on connectional architecture. Cereb Cortex. 2016;26(8):3508–26. pmid:27230218

21. Ma C, Ding J, Li J, Guo W, Long Z, Liu F, et al. Resting-state functional connectivity bias of middle temporal gyrus and caudate with altered gray matter volume in major depression. PLoS One. 2012;7(9):e45263. pmid:23028892

22. Zhang Q, Li X, Yan H, Wang Y, Ou Y, Yu Y, et al. Associations between abnormal spontaneous neural activity and clinical variables, eye movements, and event-related potential indicators in major depressive disorder. Front Neurosci. 2023;16.

23. Loh D. Middle temporal gyrus. Radiology Reference Article. Radiopaedia.org. [cited 2024 Feb 5]. Available from: https://radiopaedia.org/articles/middle-temporal-gyrus

24. Zhao P, Wang X, Wang Q, Yan R, Chattun MR, Yao Z, et al. Altered fractional amplitude of low-frequency fluctuations in the superior temporal gyrus: a resting-state fMRI study in anxious depression. BMC Psychiatry. 2023;23(1):847. pmid:37974113

25. Onitsuka T, Shenton ME, Salisbury DF, Dickey CC, Kasai K, Toner SK, et al. Middle and inferior temporal gyrus gray matter volume abnormalities in chronic schizophrenia: an MRI study. Am J Psychiatry. 2004;161(9):1603–11. pmid:15337650

Citation: Kresock E, Dawkins B, Luttbeg H, Li Y, Kuplicki R, McKinney BA (2025) Centrality nearest-neighbor projected-distance regression (C-NPDR) feature selection for correlation-based predictors with application to resting-state fMRI study of major depressive disorder. PLoS ONE 20(3): e0319346. https://doi.org/10.1371/journal.pone.0319346

About the Authors:

Elizabeth Kresock

Roles: Methodology, Software, Writing – original draft, Writing – review & editing

Affiliation: Tandy School of Computer Science, The University of Tulsa, Tulsa, Oklahoma, United States of America

Bryan Dawkins

Roles: Methodology, Software

Affiliation: SomaLogic, Inc., Boulder, Colorado, United States of America

Henry Luttbeg

Roles: Software

Affiliation: Department of Mathematics, The University of Tulsa, Tulsa, Oklahoma, United States of America

Yijie (Jamie) Li

Roles: Investigation

Affiliation: Tandy School of Computer Science, The University of Tulsa, Tulsa, Oklahoma, United States of America

Rayus Kuplicki

Roles: Supervision

Affiliation: Laureate Institute for Brain Research, Tulsa, Oklahoma, United States of America

B. A. McKinney

Roles: Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing


Affiliations: Tandy School of Computer Science, The University of Tulsa, Tulsa, Oklahoma, United States of America, Department of Mathematics, The University of Tulsa, Tulsa, Oklahoma, United States of America

ORCID: https://orcid.org/0000-0002-9494-8833


References

1. van den Heuvel MP, Hulshoff Pol HE. Exploring the brain network: a review on resting-state fMRI functional connectivity. Eur Neuropsychopharmacol. 2010;20(8):519–34. pmid:20471808

2. Salvador R, Suckling J, Coleman MR, Pickard JD, Menon D, Bullmore E. Neurophysiological architecture of functional magnetic resonance images of human brain. Cereb Cortex. 2005;15(9):1332–42. pmid:15635061

3. Iraji A, Calhoun VD, Wiseman NM, Davoodi-Bojd E, Avanaki MRN, Haacke EM, et al. The connectivity domain: Analyzing resting state fMRI data using feature-based data-driven and model-based methods. Neuroimage. 2016;134:494–507. pmid:27079528

4. Preti MG, Bolton TA, Van De Ville D. The dynamic functional connectome: State-of-the-art and perspectives. Neuroimage. 2017;160:41–54. pmid:28034766

5. Biswal BB, Mennes M, Zuo X-N, Gohel S, Kelly C, Smith SM, et al. Toward discovery science of human brain function. Proc Natl Acad Sci U S A. 2010;107(10):4734–9. pmid:20176931

6. Smith SM, Miller KL, Salimi-Khorshidi G, Webster M, Beckmann CF, Nichols TE, et al. Network modelling methods for FMRI. Neuroimage. 2011;54(2):875–91. pmid:20817103

7. Van Dijk KRA, Sabuncu MR, Buckner RL. The influence of head motion on intrinsic functional connectivity MRI. Neuroimage. 2012;59(1):431–8. pmid:21810475

8. Zuo X-N, Xing X-X. Test-retest reliabilities of resting-state FMRI measurements in human brain functional connectomics: a systems neuroscience perspective. Neurosci Biobehav Rev. 2014;45:100–18. pmid:24875392

9. Fornito A, Zalesky A, Breakspear M. Graph analysis of the human connectome: promise, progress, and pitfalls. Neuroimage. 2013;80:426–44. pmid:23643999

10. Du Y, Fu Z, Calhoun VD. Classification and prediction of brain disorders using functional connectivity: promising but challenging. Front Neurosci. 2018;12:525. pmid:30127711

11. Chen Y, Zhao W, Yi S, Liu J. The diagnostic performance of machine learning based on resting-state functional magnetic resonance imaging data for major depressive disorders: a systematic review and meta-analysis. Front Neurosci. 2023;17:1174080. pmid:37811326

12. Calhoun VD, de Lacy N. Ten key observations on the analysis of resting-state functional MR imaging data using independent component analysis. Neuroimaging Clin N Am. 2017;27(4):561–79. pmid:28985929

13. Al-Zubaidi A, Mertins A, Heldmann M, Jauch-Chara K, Münte TF. Machine learning based classification of resting-state fMRI features exemplified by metabolic state (Hunger/Satiety). Front Hum Neurosci. 2019;13:164. pmid:31191274

14. Joel SE, Caffo BS, van Zijl PCM, Pekar JJ. On the relationship between seed-based and ICA-based measures of functional connectivity. Magn Reson Med. 2011;66(3):644–57. pmid:21394769

15. Le TT, Dawkins BA, McKinney BA. Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding. Bioinformatics. 2020;36(9):2770–7. pmid:31930389

16. Salavaty A, Ramialison M, Currie PD. Integrated Value of Influence: An Integrative Method for the Identification of the Most Influential Nodes within Networks. Patterns (N Y). 2020;1(5):100052. pmid:33205118

17. Dawkins BA, Le TT, McKinney BA. Theoretical properties of distance distributions and novel metrics for nearest-neighbor feature selection. PLoS One. 2021;16(2):e0246761. pmid:33556091

18. Victor TA, Khalsa SS, Simmons WK, Feinstein JS, Savitz J, Aupperle RL, et al. Tulsa 1000: a naturalistic study protocol for multilevel assessment and outcome prediction in a large psychiatric sample. BMJ Open. 2018;8(1):e016620. pmid:29371263

19. Rolls ET, Huang C-C, Lin C-P, Feng J, Joliot M. Automated anatomical labelling atlas 3. Neuroimage. 2020;206:116189. pmid:31521825

20. Fan L, Li H, Zhuo J, Zhang Y, Wang J, Chen L, et al. The human brainnetome atlas: a new brain atlas based on connectional architecture. Cereb Cortex. 2016;26(8):3508–26. pmid:27230218

21. Ma C, Ding J, Li J, Guo W, Long Z, Liu F, et al. Resting-state functional connectivity bias of middle temporal gyrus and caudate with altered gray matter volume in major depression. PLoS One. 2012;7(9):e45263. pmid:23028892

22. Zhang Q, Li X, Yan H, Wang Y, Ou Y, Yu Y, et al. Associations between abnormal spontaneous neural activity and clinical variables, eye movements, and event-related potential indicators in major depressive disorder. Front Neurosci. 2023;16.

23. Loh D. Middle temporal gyrus. Radiology Reference Article. Radiopaedia.org [Internet]. [cited 2024 Feb 5]. Available from: https://radiopaedia.org/articles/middle-temporal-gyrus

24. Zhao P, Wang X, Wang Q, Yan R, Chattun MR, Yao Z, et al. Altered fractional amplitude of low-frequency fluctuations in the superior temporal gyrus: a resting-state fMRI study in anxious depression. BMC Psychiatry. 2023;23(1):847. pmid:37974113

25. Onitsuka T, Shenton ME, Salisbury DF, Dickey CC, Kasai K, Toner SK, et al. Middle and inferior temporal gyrus gray matter volume abnormalities in chronic schizophrenia: an MRI study. Am J Psychiatry. 2004;161(9):1603–11. pmid:15337650

Word count: 5451


© 2025 Kresock et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


Background

Nearest-neighbor projected-distance regression (NPDR) is a metric-based machine learning feature selection algorithm that uses distances between samples and projected differences between variables to identify variables or features that may interact to affect the prediction of complex outcomes. Typical tabular bioinformatics data consist of separate variables of interest, such as genes or proteins. In contrast, resting-state functional MRI (rs-fMRI) data consist of a time series for each brain region of interest (ROI) in each subject, and these within-brain time series are typically transformed into correlations between pairs of ROIs. These pairwise correlations can then be used as inputs for feature selection or other machine learning methods. Straightforward feature selection would return the most significant ROI pairs; however, it would also be beneficial to know the importance of individual ROIs.
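The pairwise feature construction described above can be sketched in Python with NumPy. This is a minimal illustration, assuming each subject's data arrive as an array of time points by ROIs and that Pearson correlation is the connectivity measure; the function names `correlation_features` and `build_design_matrix` are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def correlation_features(ts):
    """Convert one subject's ROI time series into pairwise correlation features.

    ts: array of shape (n_timepoints, n_rois); column k is ROI k's BOLD signal.
    Returns (features, pairs): the upper-triangle Pearson correlations as a flat
    vector and the (i, j) ROI index pair behind each feature.
    """
    corr = np.corrcoef(ts, rowvar=False)                # (n_rois, n_rois) matrix
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)  # i < j; skip the diagonal of 1s
    pairs = list(zip(i_idx.tolist(), j_idx.tolist()))
    return corr[i_idx, j_idx], pairs

def build_design_matrix(subjects):
    """Stack per-subject correlation vectors into an (n_subjects, n_pairs) matrix."""
    return np.vstack([correlation_features(ts)[0] for ts in subjects])

# Usage with random stand-in data: 3 subjects, 200 time points, 5 ROIs.
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((200, 5)) for _ in range(3)]
X = build_design_matrix(subjects)
print(X.shape)  # (3, 10): 5 ROIs give 5 * 4 / 2 = 10 ROI pairs
```

Each column of the resulting matrix is one ROI pair, which is why a standard feature selection method applied to it ranks pairs rather than individual ROIs.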

Results

We extend NPDR to compute the importance of individual ROIs from correlation-based features. We introduce correlation-difference and centrality-based versions of NPDR. Centrality-based NPDR can be coupled with any centrality method, and with importance scores other than NPDR, such as random forest importance scores. We also develop a new simulation method based on random network theory that generates artificial correlation-based predictors whose correlations vary in ways that affect class prediction.
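One way to picture the centrality step is to treat pair-level importance scores (from NPDR, a random forest, or another method) as edge weights in an ROI-by-ROI network and then score each ROI with a network centrality. The sketch below is an illustration under stated assumptions, not the authors' C-NPDR implementation: clipping negative scores to zero and the two centralities shown (weighted degree, and PageRank computed by power iteration) are choices made here, and all function names are hypothetical.

```python
import numpy as np

def pair_scores_to_adjacency(scores, pairs, n_rois):
    """Place ROI-pair importance scores into a symmetric weighted adjacency matrix.

    Assumption for this sketch: negative scores are clipped to 0 so every
    entry can serve as a non-negative edge weight.
    """
    W = np.zeros((n_rois, n_rois))
    for score, (i, j) in zip(scores, pairs):
        weight = max(float(score), 0.0)
        W[i, j] = weight
        W[j, i] = weight
    return W

def strength_centrality(W):
    """Weighted degree: total edge weight incident to each ROI."""
    return W.sum(axis=1)

def pagerank_centrality(W, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank of each ROI on the weighted, undirected ROI network (power iteration)."""
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Row-stochastic transition matrix; ROIs with no edges jump uniformly.
    T = np.where(row_sums > 0, W / safe, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (T.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Usage: four ROIs; ROI 0 takes part in the three highest-scoring pairs.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
scores = [0.9, 0.8, 0.7, 0.1, -0.2, 0.0]
W = pair_scores_to_adjacency(scores, pairs, n_rois=4)
print(np.argsort(-strength_centrality(W)))  # [0 1 2 3]
print(np.argsort(-pagerank_centrality(W)))
```

Because the centrality only sees the adjacency matrix, swapping in a different pair-level scorer or a different centrality leaves the rest of the pipeline unchanged, which mirrors the modularity the abstract describes.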

Conclusions

We compared feature selection methods by how well they detected simulated functional ROIs, and we applied the new centrality NPDR approach to a resting-state fMRI study of participants with major depressive disorder (MDD) and healthy controls. We determined that the brain areas with the strongest network effect on MDD include the middle temporal gyrus, the inferior temporal gyrus, and the dorsal entorhinal cortex. The resulting feature selection and simulation approaches can be applied to other domains that use correlation-based features.
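The abstract does not state how detection of the simulated functional ROIs was scored. As a hypothetical example only, a simple recall-at-k measure checks how many of the known functional ROIs a method places among its k highest-ranked ROIs; `recall_at_k` is an illustrative name and is not part of the authors' code.

```python
import numpy as np

def recall_at_k(roi_scores, functional_rois, k=None):
    """Fraction of known (simulated) functional ROIs ranked in the top k.

    roi_scores: one importance score per ROI (higher = more important).
    functional_rois: indices of ROIs the simulation made class-relevant.
    k defaults to the number of functional ROIs.
    """
    functional = {int(i) for i in functional_rois}
    if not functional:
        raise ValueError("functional_rois must be non-empty")
    if k is None:
        k = len(functional)
    # Stable sort so ties keep their original ROI order.
    ranked = np.argsort(-np.asarray(roi_scores, dtype=float), kind="stable")
    hits = sum(1 for roi in ranked[:k] if int(roi) in functional)
    return hits / len(functional)

# Usage: ROIs 1 and 3 were simulated as functional; both rank in the top 2.
print(recall_at_k([0.1, 0.9, 0.3, 0.8, 0.2], functional_rois=[1, 3]))  # 1.0
```

Setting k to the number of functional ROIs makes precision and recall coincide, which keeps comparisons between methods to a single number per simulated dataset.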

Details

Title

Centrality nearest-neighbor projected-distance regression (C-NPDR) feature selection for correlation-based predictors with application to resting-state fMRI study of major depressive disorder

Author

Kresock, Elizabeth; Dawkins, Bryan; Luttbeg, Henry; Li, Yijie (Jamie); Kuplicki, Rayus; McKinney, B A

First page

e0319346

Section

Research Article

Publication year

2025

Publication date

Mar 2025

Publisher

Public Library of Science

e-ISSN

1932-6203

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1371/journal.pone.0319346

ProQuest document ID

3174736420
