A small number of abnormal brain connections

Full text

Turn on search term navigation

ARTICLE

Received 13 Aug 2015 | Accepted 8 Mar 2016 | Published 14 Apr 2016

DOI: 10.1038/ncomms11254 OPEN

A small number of abnormal brain connections predicts adult autism spectrum disorder

Noriaki Yahata1,2,3,*, Jun Morimoto4,*, Ryuichiro Hashimoto3,5,6,*, Giuseppe Lisi4,*, Kazuhisa Shibata3,7, Yuki Kawakubo8, Hitoshi Kuwabara9, Miho Kuroda8,10, Takashi Yamada3,5, Fukuda Megumi3,11,Hiroshi Imamizu3,12, Jos E. Nnez Sr13, Hidehiko Takahashi14, Yasumasa Okamoto15, Kiyoto Kasai16, Nobumasa Kato5, Yuka Sasaki3,7, Takeo Watanabe3,7 & Mitsuo Kawato3

Although autism spectrum disorder (ASD) is a serious lifelong condition, its underlying neural mechanism remains unclear. Recently, neuroimaging-based classiers for ASD and typically developed (TD) individuals were developed to identify the abnormality of functional connections (FCs). Due to over-tting and interferential effects of varying measurement conditions and demographic distributions, no classiers have been strictly validated for independent cohorts. Here we overcome these difculties by developing a novel machine-learning algorithm that identies a small number of FCs that separates ASD versus TD. The classier achieves high accuracy for a Japanese discovery cohort and demonstrates a remarkable degree of generalization for two independent validation cohorts in the USA and Japan. The developed ASD classier does not distinguish individuals with major depressive disorder and attention-decit hyperactivity disorder from their controls but moderately distinguishes patients with schizophrenia from their controls. The results leave open the viable possibility of exploring neuroimaging-based dimensions quantifying the multiple-disorder spectrum.

1 Department of Youth Mental Health, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan. 2 Diagnostic Imaging Program, Molecular Imaging Center, National Institute of Radiological Sciences, Chiba 263-8555, Japan. 3 Department of Decoded Neurofeedback, ATR Brain Information Communication Research Laboratory Group, Kyoto 619-0288, Japan. 4 Department of Brain Robot Interface, ATR Brain Information Communication Research Laboratory Group, Kyoto 619-0288, Japan. 5 Medical Institute of Developmental Disabilities Research, Showa University Karasuyama Hospital, Tokyo 157-8577, Japan. 6 Department of Language Sciences, Tokyo Metropolitan University, Tokyo 192-0397, Japan. 7 Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Rhode Island 02912, USA. 8 Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan. 9 Disability Services Ofce, The University of Tokyo, Tokyo 113-0033, Japan. 10 Child Mental Health-Care Center, Fukushima University, Fukushima 960-1296, Japan. 11 Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK. 12 Department of Psychology, Graduate School of Humanities and Sociology, The University of Tokyo, Tokyo 113-0033, Japan. 13 School ofSocial and Behavioral Sciences, Arizona State University, Phoenix, Arizona 85306, USA. 14 Department of Psychiatry, Kyoto University Graduate School of Medicine, Kyoto 606-8507, Japan. 15 Department of Psychiatry and Neurosciences, Graduate School of Biomedical Sciences, Hiroshima University, Hiroshima 734-8553, Japan. 16 Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to Y.S. (email: mailto:[email protected]

Web End [email protected] ) or to M.K. (email: mailto:[email protected]

Web End [email protected] ).

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 1

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

Autism spectrum disorder (ASD) is a major developmental disorder characterized by repetitive, restricted behaviour as well as decits in communication and reciprocal social

interactions1. ASD has attracted a great deal of attention of basic and clinical scientists in the hope that clarication of its underlying mechanisms will lead to the development of remedies for ASD as well as a better understanding of the neural substrates of important cognitive functions, including social behaviour2. Despite the signicance of the disorder, no effective biomarker has been developed. The medical diagnosis for ASD has been made largely based on narrative interactions between individuals and clinical professionals. With the exception of clear and typical cases, such diagnostic methods without any biological grounds could run the risk of producing a high variance in diagnosis3 and delaying the detection of abnormalities4.

Magnetic resonance imaging (MRI)-based characterization of ASD has been explored as a complement to the current behaviour-based diagnoses. While previous studies have identied a multitude of ASD-specic structural and functional abnormalities, none of them were actually implemented as a reliable biomarker. The most crucial reason for this disappointing situation5,6 may be the lack of its generalizabilitythe validity of the previously developed classiers has not been established in terms of the diversity of population demographics and the variety of data attributes713. These demographics and data attributes include different ethnicities, ages14, sex15, medication proles16, scanner specications17, imaging parameters18 and instructions to participants19. All of these aspects are known to affect the MRI data. Without proof of generalization, the classier can neither be regarded as practical in clinical applications, nor can neuroimaging features selected by the classier be regarded as the candidate neural substrates of ASD.

The issue surrounding the generalization of neuroimaging-based biomarkers for psychiatric disorders has attracted little attention in neuropsychiatry until recently20,21. The majority of the previous developments of ASD biomarkers were made based on a single-site data, leaving the generalizability issue out of reach713. This situation has held true in more recent investigations that incorporated multiple-site data2225, for which the generalizability issue was not examined for an independent validation cohort. There is one unsuccessful attempt in which the generalizability of a classier was tested on an independent validation cohort26. This study applied a classier, that was previously developed based on measures of structural MRI for the population of the UK7,8, to Japanese ASD individuals26. The classier exhibited more than 80% sensitivity and specicity for the UK training data. However, its performance was no better than chance level for the Japanese test cohort. The results indicate that the development of a reliable neuroimaging-based biomarker is extremely challenging.

To develop a generalizable classier, we must overcome the following two major difculties: over-tting and nuisance variables (NVs). First, particular conditions in data and model properties can cause the over-tting problem20 in which model tting to the training data can be so accurate that the associated errors become articially smaller compared with the inherent data variance. This inated prediction performance typically fails when the model is applied to independent data that are not used for determination of the model. Among other possibilities, determining a large number of model parameters using a relatively small data sample almost inevitably leads to the state of over-tting, which makes the generalization capability of the model extremely poor. For example, the identication of ASD-specic features in magnetic resonance images must necessarily entail a search over a few 104 to 105 voxels (or its squared number

for voxel-to-voxel functional connections (FCs)) using a data set that typically consists of not more than B103 individuals. In this case, the derived classication scheme falls almost inevitably to a state of over-tting, resulting in catastrophic generalization to the external data20. In essence, an excessive number of free parameters in the model introduces undesirable tting to a unique structure of the data, including inherent noise.

Second, any machine-learning algorithm used for classication is doomed to exploit NVs unique to a given sample data, and to erroneously select neuroimaging features that are correlated with the NVs. NVs include both site-specic conditions in image acquisition and properties in the sample population such as demographic attributes, medication status, and onset and duration of illness. However, the neuroimaging features correlated with these NVs are irrelevant to ASD itself in an independent validation cohort, and naturally for the general ASD population. To avoid biased extraction of ASD irrelevant features, it is thus essential that classier development proceed with a large population that is recruited at multiple sites and that all the possible NVs be identied, controlled and removed appropriately in the feature selection (FS) process.

Here, for the rst time we have developed an ASD classier using a multiple-site data set in Japan, and conrmed its generalization capability in two independent validation cohorts in the USA and Japan. We focus on abnormal functional connectivity (FC) in ASD as revealed by resting-state fcMRI (rs-fcMRI)6,2729. To suppress the over-tting and the effects of NVs, a unique combination of machine-learning algorithms automatically and objectively identies a small number of FCs related to the ASD-specic abnormality. The resulting ASD classier, based only on the identied FCs, achieves generalization capability across multiple imaging sites (site generalization).

We also examine a different type of generalizability of the classier towards other disorders (disorder generalization). It is generally believed that ASD shares aetiological and pathophysiological backgrounds with schizophrenia (SCZ) to a greater degree than with attention-decit hyperactivity disorder (ADHD) and major depressive disorder (MDD)3033. Recent genome-level studies reported that ASD shares a signicant degree of polygenic risk with SCZ, but not with ADHD or MDD30,31. An accumulating body of evidence by clinical, behavioural and neuronal phenotypes studies has shown a close relationship between ASD and SCZ32,33. To our knowledge, no study has quantied spectral relationships among multiple psychiatric disorders with respect to biological dimensions dened by intrinsic FCs. Here we examine the spectral relationships among four disorders, ASD, SCZ, ADHD and MDD, by means of a measure provided by the ASD classier. More concretely, we examine the extent to which the ASD classier is specic to ASD or extendable (generalizable) to other psychiatric disorders. This question can be rephrased as follows. Does the ASD classier discriminate only ASD individuals from TD control (that is, specic to ASD), or does it discriminate patients with general psychiatric disorder from their healthy controls (that is, generalized to other disorders)? At the individual level, the output of the ASD classier might provide a quantitative measure of ASD-ness along one of biological dimensions in psychiatric disorders (Supplementary Fig. 1). If the ASD classier shows specicity and/or generalizability only for a certain range of disorders, the ASD-ness may provide a useful biological dimension across the multiple-disorder spectrum34,35.

ResultsHighly accurate Japanese-population-based classier for ASD. We constructed a classier based on the FCs of each individual to

2 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254 ARTICLE

distinguish ASD from TD adults. The whole brain images were collected from three different sites in Japan (74 high-functioning adult ASDs and 107 adult TDs; see Methods for details). Age and sex were matched between ASD and TD for sites A and B. Site C collected only TD data (Supplementary Table 1). This unbalanced ASD/TD composition among the three sites was taken care of by a machine-learning algorithm as explained below. Each individual image was divided into 140 regions using a sulci-based anatomical atlas (extended Brainvisa Sulci Atlas)36. To obtain interregional FCs through rs-fcMRI, Pearson correlation coefcients were computed for all possible 9,730 pairs of these 140 regions from their mean fMRI time-series data. A machine-learning technique was then applied to the whole set of correlation matrices to optimally select a subset of FCs so that the best classication performance would be obtained. Specically, we applied the L1-norm regularized sparse canonical correlation analysis (L1-SCCA)37 to the data set to identify a subset of FCs relevant only to the neural substrates of ASD while factoring out the effects of noise and NVs associated with the data. In particular, the unbalanced ASD/TD composition was addressed appropriately by incorporating the site label as NV (see Methods subsection Selecting FCs as the ASD classier). We then employed the sparse logistic regression (SLR)38 to further perform dimension reduction to mitigate the over-tting and thereby to extract the essential FCs representing the core abnormal connectivity in ASD (see Methods and Supplementary Fig. 2).

The classication accuracy was evaluated by the leave-one-participant-out cross validation procedure (LOOCV, see Methods). At each iteration, the classier incorporated only15.30.7 out of 9,730 FCs (0.2% of the entire FCs). Supplementary Note 1 and Supplementary Fig. 3 show the robustness and stability of the identied FCs across the cross-validation procedure. The classier separated ASD- from TD-populations with an accuracy of 85% (Permutation test, P 0.001; see Supplementary Fig. 4)39. The corresponding area

under the curve (AUC) was 0.93, indicating high discriminatory ability. For this classier, the weighted linear summation (WLS or linear discriminant function) of the correlation values of the identied FCs predicted the diagnostic label of each individual. An individual with a positive and negative WLS was classied as ASD and TD, respectively. Figure 1a shows that the two WLS distributions of the ASD and TD populations from the Japanese data set were clearly separated by the threshold of WLS 0, to the

right (ASD) and to the left (TD). The sensitivity was 80% and specicity was 89%. This leads to a high diagnostic odds ratio (DOR) of 31.1, which indicates that the effect size is very large. We found that high classication accuracy was not only achieved for the entire data set, but also distributed equally among the three imaging sites in Japan (85% accuracy for all the sites AC; see Supplementary Fig. 5 and Supplementary Table 2).

Generalization of the classier for independent cohorts. We tested the generalizability of the classier using two independent validation cohorts obtained from the US ABIDE Project40 and from site B in Japan (see Methods). In the ABIDE data pool, we selected a subset of individuals that contained 44 high-functioning adults with ASD and demographically matched 44 TD adults, who were recruited from seven sites in this data pool (see Methods). For this US-independent validation cohort located across the Pacic from Japan, the present classier, trained only with Japanese labelled samples, achieved a high performance with an accuracy of 75% (AUC 0.76) and a DOR of 9.0. The

probability of obtaining this high performance by chance is extremely small as P 1.4 10 6 (one-sided binomial test) and

P 0.001 (permutation test, see Supplementary Fig. 4). We

emphasize that this performance was achieved when the classier was applied to the US ABIDE data set for the rst time without any retuning of the machine-learning procedure. Thus, this USA data set was a true and nal validation set. One of the reasons for the true validation test is that our algorithm does not allow any parameter tuning based on the validation cohort (see also Methods). As can be seen in the WLS distribution for the USA data shown in Fig. 1b, the degree of separation between the ASD and TD populations was almost comparable to that of the Japanese data set shown in Fig. 1a. This indicates that the present classier was successfully generalized across the Pacic to an independent validation cohort of more diverse races/ethnicities that were acquired under various imaging settings and conditions. These results indicate that although we developed a highly reliable classier by only using the training data obtained from Japan, it is sufciently universal to classify ASD/TD in the USA validation cohort.

Given the possibility that the identication of ASD-specic FCs in the classier could have been inuenced by ASD individuals factors irrelevant to the core ASD pathology, including secondary symptoms and the medication status, the next challenge was to examine how the ASD classier is resistant to such complications. By relaxing the selection criteria for ASD, we identied 19 additional individuals with ASD and demographically matched 19 additional TDs in the ABIDE data pool (see Methods and Supplementary Note 2). We appended these individuals into the main USA data set to form the extended ABIDE data set that consisted of 63 individuals with ASDs and 63 TDs. Repeating the

Classification based on WLS

TD ASD

Japan

ASD

Number of individuals

0 30 20 10 0 10 20 30

USA

ASD

0 30 20 10 0 10 20 30

Weighted linear summation

Figure 1 | Distribution of weighted linear summations (WLS) of functional connections used for the classication of ASD and TD.(a) The number of TD (white) and ASD (black) individuals in the Japanese data included in a specic WLS interval of width 5 is shown as a histogram (see also Supplementary Fig. 5). (b) WLS for the US ABIDE dataset in the same formats as a.

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 3

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

same analysis for this broadened data set, we found the results to be: AUC 0.74, accuracy 71% and DOR 6.4, which were

slightly worse than the original narrower data set.

We further conrmed the generalizability of the classier using the Japanese-independent validation cohort (AUC 0.77,

accuracy 70%). This cohort incorporated 27 individuals with

ASD and 27 demographically matched TDs. These participants were scanned more recently using a newer and different 3 T MR scanner at site B in Japan, compared to the original Japanese discovery cohort.

Characteristics of the 16 identied FCs in the classier. Figure 2 shows the spatial distribution of the 16 FCs that were automatically and objectively identied from the data for reliable classication of ASD and TD by the machine-learning algorithm. A detailed list of FC properties is provided in Table 1. Because the reliability of classication was generalized to the two independent cohorts, these FCs are thought to be much more trustworthy in characterizing neural substrates of ASD than the FCs that were simply selected in many previous studies by conventional statistical thresholding of ASD/TD differences within a limited data set41.

We identied the following three major characteristics of the 16 FCs in terms of their hemispheric distributions and attributions to known intrinsic functional networks (Fig. 3 and Table 1). First, regarding the hemispheric distribution of the FCs,

inter-hemispheric (69%) and right intra-hemispheric (31%) FCs dominated, whereas the left intra-hemispheric FCs were absent (one-sided binomial test, P 0.01). Second, regarding the

hemispheric distribution of the brain regions involved in the 16 FCs, there were signicantly more regions in the right hemisphere than in the left (one-sided binomial test, P 0.05). Third,

regarding the functional network attributes of the 32 brain regions comprising these 16 FCs (allowing for duplicates in the count), 41% (13 regions) belonged to the cingulo-opercular network14,42. This percentage was signicantly higher than 24%, the anatomically expected percentage, given 33 cingulo-opercular regions among a total of 140 regions ( 33/140 100) (one-sided

binomial test, P 0.02; see also Table 1).

The state of FC exhibiting the smaller (that is, more negative) and greater (more positive) mean correlation index in the ASD population than the TD control is termed under- and over-connectivity, respectively. In the 16 FCs incorporated in the classier, ASD exhibited under-connectivity in nine FCs (rASDorTD) and over-connectivity in seven FCs (rASD4rTD)

compared with TD. See Table 1 for the mean correlation values for the ASD and TD populations and Supplementary Fig. 6 for their distributions. A w2-test indicated that there was no signicant difference between the number of FCs exhibiting under- and over-connectivity (P 0.62). These results suggest

that neither the under- or over-connectivity hypothesis4345 alone can successfully describe the overall aberrancy in the FCs exhibited by the ASD population. One previous study observed

Posterior

Superior

2020

Left

20 88 77 6

Right

All 9,730 FCs considered

2525

2626

2727

2121

2828

2424

1212

Left

2323

1515

Right

2222

Superior

Inferior

1111

1414

1919

Left

Right

Posterior

2929

1313

1010

1717

1818

1616

Posterior

Superior

Inferior

b c

Superior

Inferior

1414

2222

2121

2222

2020

1515

2020

1515

2323

2121

20 55 21 23

1010

1818

1717

1818

1919

1010

2626

2828

1717

1616

33 4

2828

2929

2626

Left

Right

Posterior

1616

2727

2525

25 27

2727

2525

1313

2424

1313

1111

2424

1111

2929

1212

Inferior

Figure 2 | The 16 FCs identied for the ASD/TD classier. (ac) The 16 FCs viewed from (a) top, (b) posterior and (c) left. The inset displays all 9,730 FCs. The 29 terminal regions connected by the 16 FCs were numbered as follows: in the frontal lobe, the superior (1), middle (2), inferior (3, left; 47, right) gyri and rectus (8); in the temporal lobe, the superior (9), middle (10), inferior (11), parahippocampal (12) and fusiform (13) gyri; in the parietal lobe, the superior parietal lobule (14) and the postcentral gyrus (15); in the occipital lobe, the middle occipital gyrus (16), cuneus (17, left; 18, right) and the calcarine ssure (19); in the limbic system, the anterior (20), middle (2122), posterior (23) cingulate gyri and amygdala (24); in the basal ganglia, the caudate (25, left; 26, right), pallidum (27), thalamus (28); and cerebellum (29). See also Table 1 and Supplementary Movie 1.

4 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254 ARTICLE

Table 1 | Properties of the 16 interregional FCs used in the classication of the ASD and TD populations.

ID Terminal regions Mean FC Wt.Lat. Name Gyral region BA Net. rTD rASD

1 R Diagonal ramus of the lateral f. (4) Inferior frontal g. 44 SM 0.77 0.71 0.88

R Ascending ramus of the lateral f. (6) Inferior frontal g. 45 CO2 R Subcallosal s. (21) Middle cingulum 23 DM 0.39 0.22 1.95

R Calloso-marginal posterior f. (22) Middle cingulum 23 CO3 R Thalamus (28) Thalamus CO 0.30 0.10 2.62

L Subcallosal s. (23) Posterior cingulum 29 DM4 R Amygdala (24) Amygdala 34 CO 0.16 0.05 2.14

L Accumbens (25) Caudate CO5 R Rhinal s. (12) Parahippocampal g. 30 CO 0.11 0.04 2.11

R Olfactory s. (8) Rectus 11 DM6 R Median occipito-temporal lateral s. (11) Inferior temporal g. 20 CO 0.03 0.01 0.98

L Anterior inferior frontal s. (3) Inferior frontal g. 45 FP7 R Posterior terminal ascending branch of the superior temporal s. (10) Middle temporal g. 21 DM 0.09 0.21 1.26

R Internal occipito-temporal lateral s. (13) Fusiform 37 OC8 R Intermediate precentral s. (2) Middle frontal g. 46 FP 0.10 0.19 1.59

L Lobe occipital (16) Middle occipital g. 19 OC9 L Polar frontal s. (1) Superior frontal g. 9 DM 0.16 0.29 1.52

R Retro central transverse ramus of the lateral f. (15) Postcentral g. 3 SM10 R Caudate (26) Caudate CO 0.17 0.22 0.76 L Calloso-marginal anterior f. (20) Anterior cingulum 32 CO11 R Olfactory s. (8) Rectus 11 DM 0.10 0.02 1.78

L Parieto-occipital f. (17) Cuneus 18 DM12 L Pallidum (27) Pallidum CO 0.14 0.04 1.90

R Superior temporal s. (9) Superior temporal g. 22 CO13 M Vermis (29) Vermis CB 0.18 0.04 1.85

R Superior temporal s. (9) Superior temporal g. 22 CO14 L Superior parietal s. (14) Superior parietal g. 7 FP 0.19 0.06 0.99

R Anterior lateral f. (7) Inferior frontal g. 47 CO15 R Inferior precentral s. (5) Inferior frontal g. 44 SM 0.24 0.13 1.00

R Parieto-occipital f. (18) Cuneus 18 OC16 L Anterior inferior frontal s. (3) Inferior frontal g. 45 FP 0.24 0.16 1.74

R Calcarine f. (19) Calcarine 17 OC

CB, cerebellum; CO, cingulo-opercular; DM, default-mode; f., ssure; FC, functional connection; FP, fronto-parietal; g., gyrus; Lat, laterality; L, left; M, medial; Net, network; OC, occipital; R, right;s., sulcus; SM, sensorimotor; Wt., weight.

Listed here are the laterality and anatomical identication of the terminal regions (as dened in the Brainvisa Sulci Atlas), the associated gyral regions (as identied by the Anatomical Automatic Labeling), their Brodmanns areas, and associated networks, for each connection. The number in parenthesis appended to each gyral region represents the region identication in Fig. 2. Identication of the network is after Dosenbach et al.14. In addition, the mean correlations of the TD and ASD populations and the weighting coefcient in the SLR classier are shown. FCs 19 represent under-connectivity (r or ), whereas FCs 1016 represent over-connectivity (r 4r ). See also Supplementary Fig. 6.

a similar trend for the FCs selected simply by using a statistical threshold for the difference in FC between the two populations41. However, there is no guarantee that the FCs that survived should represent the neural substrates of ASD. This is because any statistical result using thresholds varies with a specic data set and the thresholds. Our result is the rst demonstration that neither the under- or over-connectivity hypothesis alone sufciently characterizes the FCs that can classify the ASD population from the TD population in independent validation cohorts (Supplementary Fig. 6 and Table 1). Furthermore, the present ndings are at odds with the previous distance-dependent abnormality hypothesis that proposed disrupted long-range and enhanced short-range connectivity in ASD46. Specically, we found that the mean distance of the FCs was not signicantly correlated with enhancement in the FCs in ASD (that is, correlation between the distance of an FC and its rASD rTD;

difference in correlations of FCs for ASD and TD, r 0.42,

P 0.10) or with the difference in their absolute strengths

(|rASD| |rTD|; r 0.24, P 0.36). Furthermore, there was

no signicant difference (t14 1.3, P 0.23) between the

mean distance of nine FCs exhibiting under-connectivity(64.651.1 mm) and the mean distance of seven FCs exhibiting over-connectivity (92.833.9 mm). Here the distance of each FC was computed as a one-line distance between the central coordinates of the two connected brain regions.

Prediction of diagnostic instrument scores using the 16 FCs. Using the 16 FCs identied in the classier, we predicted the measured domain scores of the two standard diagnostic instruments, the Autism Diagnostic Observation Schedule (ADOS)47 and the Autism Diagnostic Interview-Revised (ADI-R)48. The number of available subjects was 58 for ADOS and 27 for ADI-R. Each instrument contained four domains, as summarized in Supplementary Table 3. In each domain of each instrument, a linear regression was individually employed to determine the weights of 16 FCs so that their weighted linear summation was used as a predictor for the corresponding measured score. Among the total of eight domains of ADOS and ADI-R, we found that the communication domain of the ADOS (ADOS A) was well predicted from the 16 FCs with statistically signicant correlation (r 0.44, uncorrected P 0.001o0.05/8, a

Bonferroni-corrected threshold for multiple comparisons; see Fig. 4a and Supplementary Table 3). The bootstrapping analysis demonstrated that the probability that this correlation r 0.44

would be derived from 16 FCs randomly selected from 9,688 ( 9,730 42) FCs, which were not identied in the LOOCV

procedure, was small (P 0.048, Fig. 4b, see also Methods). These

results demonstrate that the 16 FCs identied in the classier specically contain more useful information than the remaining FCs in predicting the ADOS A score, which is the degree of decits in communicative behaviours.

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 5

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

69%

31%

(r = 0.44)

450

Left intra-hemispheric

connections

Inter-hemispheric

connections

Right intra-hemispheric

connections

Cingulo-opercular network

Ascend ram of lat f

Ant lat fMed occi-temp lat s

Parieto-occipital f

Measured score

300

Frequency

150

Calloso-marg post f

r = 0.44P < 0.001

P = 0.048

Calloso-marg ant f Pallidum Accumbens

Lobe occipital

Subcallocal s Parieto-occipital f Polar frontal s Sup parietal s Ant inf frontal s

Sup temp s

Caudate

Post term ascend branch of

sup temp s

0 0 2 4 6 8

0 1 0.5 0 0.5 1

Rhinal s Amygdala

ThalamusInt occi-temp lat s

Calcarine f

Subcallosal s Olfactory s Intmed precentral s

Diag ram of lat f Inf precentral s

Retro central transverse

ram of lat f

Predicted score Correlation coefficient

Figure 4 | Prediction of ADOS A domain score (communication) using the 16 FCs identied in the classier. (a) Scatter plot of the measured ADOS A domain versus the predicted score, which was computed as a linear weighted summation of the 16 FCs identied by the ASD/TD classier. Each dot represents individual data (n 58, see Methods,

Participants). The line indicates the linear regression of the measured score from the predicted score, and correlation coefcient and statistical signicance are shown (see Supplementary Table 3 for results of the other three domains of ADOS and all four domains of the ADI-R instrument). (b) The frequency of the different correlation coefcient values is plotted in a bootstrap analysis in which 16 FCs were randomly selected from all 9,730 FCs, with the exception of those 42 FCs selected in the LOOCV procedure. The correlation coefcient between the measured and predicted scores was computed as in a. This analysis indicates that the probability of obtaining the correlation coefcient r 0.44 was small (P 0.048), and

demonstrates that the 16 FCs identied in the classier specically contain information useful to predict the ADOS A score.

Others

Vermis

Left Right

Figure 3 | The 16 FCs (solid lines) and their terminal regions (names in boxes). The left and right halves of the gure correspond to the left and right brain hemispheres, respectively. The FCs were classied into three hemispherical categories: left intra-hemispheric, right intra-hemispheric and inter-hemispheric. The terminal regions dened by the BrainvisaSulci Atlas belong to either cingulo-opercular or other networks. Thered background indicates the cingulo-opercular network. ant, anterior; ascend, ascending; calloso-marg, calloso-marginal; diag, diagonal; f, ssure; inf, inferior; int, internal; intmed, intermediate; lat, lateral; med, median; occi-temp, occipito-temporal; post, posterior; ram, ramus; s, sulcus;sup, superior; temp, temporal; term, terminal.

Application of the ASD classier to other disorders. Here we examine the specicity to ASD of the ASD classier, and/or its generalizability to psychiatric disorders other than ASD. If the ASD classier predicts patients with a different disorder as healthy control individuals, then AUC by the ASD classier for the classication of patients with that disorder from their control should be close to 0.5. In this case, we may as well say that the patients possess so little ASD-ness and that the disorder is not related to ASD from the viewpoint of the imaging biological dimension. In contrast, if the ASD classier perfectly discriminates patients with a different disorder from its control, the classication AUC should be close to 1. In this case, we may as well say that the patients possess a large degree of ASD-ness in them and that this disorder is closely related to ASD according to the biological dimension.

To test this, we applied the ASD classier to two additional Japanese cohorts of SCZ and MDD (each containing a healthy control population) and one European cohort of ADHD (containing a TD population) (see Methods). We computed the WLS of the identied FCs in the ASD classier, that is, the ASD-ness of each individual within the SCZ, MDD and ADHD data sets, and their corresponding healthy or TD control populations. We then compared the WLS distributions between each disorder group and its corresponding healthy or TD control (Fig. 5). As expected and already demonstrated in Fig. 1, the separation of WLS distributions was the largest between ASD and TD (Fig. 5a), meaning that the developed ASD classier has a

good ability to discriminate ASD from TD individuals, and the ASD-ness is able to successfully separate the two populations. The separation between SCZ individuals and their healthy controls was poorer than that of ASD but statistically signicant (Fig. 5b; AUC 0.65, KolmogorovSmirnov test, P 0.012 corrected for

multiple comparisons). In contrast to SCZ, the WLS distributions of ADHD and MDD, and their corresponding TD and healthy controls were not distinguishable (Fig. 5c,d; AUC 0.57,

KolmogorovSmirnov test, P 0.65 for ADHD; AUC 0.48,

KolmogorovSmirnov test, P 0.83 for MDD). In other words,

the ASD-ness of individuals with ADHD or MDD is not different from that of their controls. Note that MDD was more completely indistinguishable from its control compared with ADHD according to the ASD-ness. It can be said that the ASD classier was specic to ASD regarding ADHD and MDD, but was modestly generalized to SCZ (compare AUC 0.93 for the ASD

discovery cohort and AUC 0.65 for the SCZ data set). These

results demonstrate that the WLS of the ASD classier, in other words the ASD-ness, quantied the spectrum of the four disorders as follows; SCZ was close to ASD, ADHD was distant from ASD, and MDD was farthest from ASD.

DiscussionIn the present study, we established a reliable neuroimaging-based classier for ASD by investigating the whole-brain patterns of FCs using the rs-fcMRI data of 74 adults with ASD and 107 TD individuals collected at multiple sites in Japan. This classier incorporated as small as 16 FCs (only 0.2% of the entire FCs) distributed across the brain but not contained in the left hemisphere, and allowed a diagnosis prediction accuracy of 85% for each individual with balanced sensitivity and specicity of 80% and 89%, respectively. Most importantly, the high performance of the classier was generalized across the Pacic to the independent, ethnically more diverse, validation cohort in the USA (75% accuracy) with only 10% decrease in accuracy compared with the Japanese discovery cohort. Although

6 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254 ARTICLE

a ASD

AUC = 0.93P = 6.71020

AUC = 0.65 P = 0.012

AUC = 0.57 P = 0.65

AUC = 0.48 P = 0.83

b SCZ

c ADHD

d MDD

Density

2 1 0 1 2 3 4 5

WLS

Figure 5 | Application of the ASD classier to other psychiatric disorders. The density distributions of the weighted linear sum (WLS) obtained by applying the ASD classier to (a) ASD, (b) SCZ, (c) ADHD and (d) MDD data sets. In each panel, the patient distribution and the TD/healthy control distribution are plotted separately, with coloured and grey areas, respectively. For reference, the WLS distribution of the ASD patients (red area) in a is duplicated across the panels (bd). For each patientcontrol pair in ad, the signicance of the BenjaminiHochberg-corrected KolmogorovSmirnov test and AUC values are shown. In this gure, for the visualization purposes, the WLS of each data set is standardized to match median and s.d. of TD controls across the panels. Note that this WLS standardization is not performed in any quantitative analysis.

successful construction of a FC-based ASD classier has been reported previously49, our work presents the rst achievement of successful classication for ASD across discovery as well as validation cohorts. Our approach may lay a foundation for developing classiers for other psychiatric disorders. This is because, to the best of our knowledge, there exists no neuroimaging-based classier for any psychiatric disorder of which the generalization capability is demonstrated for an independent validation cohort. We have further applied the ASD classier for classication of SCZ, MDD and ADHD patients from their healthy controls. The results indicate that, while the ASD classier exhibited hardly any generalizability as applicable to ADHD and MDD, it can be to a modest degree generalizable to SCZ.

In the present study, both sophisticated machine-learning algorithms and a variety of data from the Japanese three sites were essential for successful generalization of the classier across multiple sites. First, the unique combination of the two machine-learning algorithms, L1-SCCA and SLR, worked to achieve optimal extraction of a small number of FCs that were relevant only to the core ASD characteristics. This optimal extraction avoids over-tting and eliminates the effects of NVs such as age, sex and site-dependent characteristics in the data composition and imaging protocols. In fact, when we applied to our data sets a state-of-the-art machine-learning algorithm of nested cross-validation of the elastic net50, this algorithm selected

10 times more FCs and performed 14% worse on the USA cohort (percent correct of 61%; see Supplementary Note 3 for more detail). This algorithm did not explicitly exclude the interfering effects of NVs. This fact indicates that the generalization capability of our ASD classier across imaging sites most probably stemmed from reduced inuences of FCs related to NVs owing to the unique combination of L1-SCCA and

SLR. By utilizing synthetic data sets and by comparing with elastic net, we demonstrated that the unique combination of L1-SCCA and SLR can be useful to reduce the inuence of FCs related to NVs (see Supplementary Note 4). Second, we examined the conditions for which data sets can derive reliable ASD classiers. More concretely, we selected one or two of the three sites within the Japanese discovery cohort for training data sets (Supplementary Table 4) and constructed classiers. In addition, we trained a classier with the ABIDE data set and examined its generalization capability for the Japanese cohort (Supplementary Note 5). The resulting performances were generally poorer than that of the current ASD classier. Especially, the performance of the ASD classier, which was developed using the ABIDE data set, was very poor. It is suggested that having a sufcient number of participants in total as well as in each site is a necessary condition to construct a reliable ASD classier (see also Methods). We conclude that both a sophisticated machine learning algorithm and a large training data set are essential for developing a reliable and generalizable classier.

It is worth noting that the performance of the classier may be upper-limited by the aetiological and phenotypic heterogeneity of ASD, which is likely to be accompanied by differential biological underpinnings. This limitation may be ameliorated and the overall diagnostic precision may be improved by identifying subgroups that are biologically more uniform within a given population and by extracting a set of features that characterize each subgroup.

What do the results of the applications of the ASD classier to the other psychiatric disorders suggest? Figure 5 shows the density distributions of the WLS that resulted from the application of the ASD classier to data sets for other disorders, SCZ, MDD and ADHD. The results raise the intriguing possibility that the degree of generalizability and specicity of the ASD classier to these other disorders reect their spectral structure on the scale of whole-brain intrinsic functional networks. From this perspective, Fig. 5 suggests that ASD shares more intrinsic-functional networks with SCZ than with ADHD or MDD. This is consistent with the results of the previous clinical works as described in the opening paragraphs30,32. These results raise the possibility that a neuropsychiatric disorder can be redened and represented as a location in a multi-dimensional space dened by FC-based biological dimensions, each of which takes the form of WLS consisting of a small number of FCs34,35. In this case, the ASD-ness could make a dimension along which ASD and SCZ are located and might be orthogonal to another dimension along which ADHD and MDD are located. Interestingly, to aim at building a biomarker for ASD, we started with a categorical approach by which a supervised machine-learning algorithm was utilized and a diagnostic label was adopted as its teaching signal. However, the results of applying the classier built for that original purpose to other neuropsychiatric diseases have raised the exciting possibility that the classier allowed us to go beyond the category regime and to embark on the exploration for new biological dimensions34,35.

How would the current research results contribute to the future diagnoses and therapy for neuropsychiatric disorders? Recently, an increasing number of researchers have had the perspective that a biomarker can be used to stratify a broad illness phenotype into a nite number of treatment-relevant subgroups, thereby

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 7

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

bypassing nosological arguments over diagnostic boundaries5. This perspective could lead to developments of multiple neuroimaging-based biomarkers for multiple psychiatric disorders. The generalization capability across imaging sites is a bare minimal requirement for a classier towards its clinical applications. If reliable classiers for ASD, SCZ, MDD, ADHD and even other disorders are developed in the near future, these classiers may be utilized as a new diagnostic tool as a set of biomarkers that can quantify intrinsic FCs of individuals who need in-depth clinical examinations for various disorders. Such quantication is possible because multiple biomarkers can use the same rs-fcMRI data from an individual. In this vein, the current study could provide a foundation for such a future direction in neuropsychiatric diagnoses.

We formed a consortium with other researchers in 2013 as part of the Japanese Strategic Research Program for Promotion of Brain Science (SRPBS)51 with the aim of applying big data, machine-learning algorithms and sophisticated fMRI neurofeedback methods52,53 to study diagnosis and therapy for multiple psychiatric disorders. We have developed fMRI real-time neurofeedback methods that can change FC between brain regions with healthy and ASD individuals. First, in healthy individuals, 4 days of fMRI real-time neurofeedback of FC between two designated areas changed rs-fcMRI connectivity, and the changes remained for more than 2 months53. Second, for ASD individuals within the SRPBS consortium, a further developed neurofeedback method of FC has been applied54,55. The identied 16 FCs in the present research may contribute to improvements in the fMRI real-time neurofeedback of FC to ASD individuals as a possible therapeutic target. In one recent study with ASD individuals, target FCs were selected by the ASD classier developed in the present study54, and the WLS of the ASD classier was estimated in a real-time fMRI neurofeedback paradigm for an ASD individual. Then, a sign-inverted WLS value was presented to the ASD individual as a neurofeedback target to increase in a reinforcement learning paradigm54. Increases in the neurofeedback score lead to reductions in the WLS value. Researches within the SRPBS consortium have suggested that the ASD classier we have developed in this study could be a useful tool, with which connectivity neurofeedback methods would make further progress to attain the goal of ASD individuals obtaining normal rs-fcMRI dynamics51.

In summary, we have developed a generalizable rs-fcMRI-based classier for ASD28 for the rst time. Despite the fact that this classier is based on a small number of identied FCs, it greatly distinguishes ASD from TD with demonstrated generalization in the independent validation cohorts, and accounts for socio-communicative aspects of ASD. The results of applications of this ASD classier to other psychiatric disorders have left open the interesting possibility of exploring new neuroimaging-based dimensions for multiple-disorder spectrum.

Methods

Participants. All participants in the present study provided written informed consent as approved by the ethics committees of the recruiting institutions as follows: the Ethics Committee of the Graduate School of Medicine and Faculty of Medicine at the University of Tokyo, the Ethics Committee of the Faculty of Medicine of Showa University, the Institutional Review Board of Advanced Telecommunications Research Institute International, the Committee on Medical Ethics of Kyoto University and the Ethics Committee of Hiroshima University.

A total of 74 adults with ASD and 107 age, sex, handedness and IQ-matched TD individuals participated in the present study. The participants were recruited at three different sites in Japan (sites AC). Their demographic information is summarized in Supplementary Table 1.

At site A, participants with ASD were recruited through the Department of Child Psychiatry and Neuropsychiatry at the University of Tokyo Hospital and via an advertisement on the website of the University of Tokyo Hospital. All ASD participants (n 35) were diagnosed with pervasive developmental disorder (PDD)

based on the DSM-IV-TR criteria56. DSM-IV-TR diagnoses of autistic disorder, Aspergers disorder, or pervasive developmental disorder not otherwise specied (PDD-NOS) (n 24, n 3 and n 8, respectively) were supported by ADOS47

(n 35) and ADI-R48 (n 27). The Japanese version of mini-international

neuropsychiatric interview M.I.N.I. was used to evaluate psychiatric comorbidity57. No participant satised the diagnostic criteria for substance use disorder, bipolar disorder or SCZ. The IQ scores of participants with ASD were obtained using the Wechsler adult intelligence scale-revised (WAIS-R) or third edition (WAIS-III). The full-scale IQs (FIQs) of all of the individuals with ASD were measured and found to be 485. TD individuals were recruited from the local community and via other on-going studies at site A. M.I.N.I. was used to conrm that none of the TD met the diagnostic criteria for any psychiatric disorder. The IQs of the TD individuals were estimated using the Japanese version of the national adult reading test (JART)58. All participants were right-handed according to the Edinburgh Handedness Inventory59. Participants completed the Japanese version of the autism-spectrum quotient (AQ-J)60. At the time of scanning, 11 ASD individuals were medication free, whereas the remaining 24 ASD individuals had been administered the following psychotropic drugs: anxiolytics (n 18),

antidepressants (n 20), antipsychotics (n 15), antiepileptics (n 6) and

sleep-inducing drugs (n 17), before the scanning. Some participants had been

administered multiple drugs (n 20).

At site B (Showa University Karasuyama Hospital), the diagnoses were made by a team of experienced psychiatrists and clinical psychologists based on detailed interviews of individuals regarding their development and behaviour, from infancy through adolescence and family history. At least one caregiver who knew the individual in his/her childhood was usually present in the interview. At the end of the clinical interview, the psychiatrist diagnosed the individuals according to the DSM-IV-TR diagnostic criteria for PDD based on the consensus of the psychiatrists and the clinical psychologists. These assessments resulted in all of the participants in the ASD group (n 39) receiving clinical diagnoses of autistic

disorder (n 19), Aspergers disorder (n 16) and PDD-NOS (n 4). The

diagnoses for 23 individuals were supported by ADOS47. As performed at site A, the IQ scores of participants with ASD were obtained using WAIS-R or WAIS-III. FIQ for all individuals with ASD was 480. FIQ for all TD individuals was estimated using JART. All participants, including ASD and TD, completed the Japanese version of the AQ test60. M.I.N.I. was used to conrm that none of the TD individuals met the diagnostic criteria for any psychiatric disorder. At the time of scanning, 27 ASD individuals were medication free, whereas the remaining 12 ASD individuals were administered the following psychotropic drugs: anxiolytics (n 7), antidepressant (n 9), antipsychotics (n 3), antiepileptics (n 3) and

sleep-inducing drugs (n 7). Some participants were administered multiple

drugs (n 8).

At site C (Advanced Telecommunications Research Institute International),33 TDs participated in the present study after providing written informed consent. None of the participants had a previous history of neurological disorders. All participants were right-handed as conrmed by the Edinburgh inventory. No ASD individuals were recruited at site C as it was not a medical institution.

Training data set used for construction of the ASD classier. MRI data for the training of the ASD/TD classier were acquired at three different sites in Japan. An MRI system was used at the site where participants were recruited. Each imaging site adopted its own imaging protocol (Supplementary Table 5), differing in both imaging parameters and instructions provided to the participants during the scan. These discrepancies were taken into account in subsequent classication analysis (see the section below L1-regularized sparse canonical correlation analysis

and Supplementary Note 6). At all sites, participants were subjected to high-resolution T1-weighted structural imaging as well as resting-state functional imaging, which were preprocessed with SPM8 (Wellcome Trust Centre for Neuroimaging, University College London, UK) software running on MATLAB (R2012b, Mathworks, USA) as follows. First, the raw functional images were corrected for slice-timing and realigned to the mean image of that sequence to compensate for head motion. Next, the structural image was co-registered to the mean functional image and segmented into three tissue classes in the Montreal Neurological Institute (MNI) space. Using associated parameters, the functional images were then normalized and resampled in a 2 2 2 mm3 grid. Finally,

they were smoothed by Gaussian of full-width at half-maximum 6 mm. Because subsequent analysis is based on the pattern of temporal correlations among the brain regions and the evaluation has shown itself to be highly sensitive to abrupt head motion during scanning, the pre-processed sequence of functional images was examined as follows. First, we evaluated the mean relative displacement61 in each of the six motion parameters (that is, translation along and rotation with respect to the x, y and z axes) by calculating the mean of the absolute frame-to-frame relative changes in each individual parameter through a given time series (namely, the mean of Dpi

pi1 pi

j jacross the time series, where p is one of the six

motion parameters and i species the time point). In both the Japanese and the USA data sets, there was no statistically signicant group difference in this measure for any of the six motion parameters (Supplementary Table 6). Next, for each participant, we calculated the frame displacement (FD) at each time point by summing Dpi

for all six parameters. Using this FD, we employed the scrubbing

procedure62 to identify and exclude any frame exhibiting excessive head motions.

8 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254 ARTICLE

Specically, a frame was agged and removed, along with the previous and two subsequent frames, from the correlation analysis, if the associated FD exceeded0.5 mm (ref. 62). On average (s.d.), 93.411.2% and 95.77.5% of the original frames passed this scrubbing procedure in the TD and ASD populations, respectively, and these fractions did not differ signicantly between the two populations (two-sample t-test, P 0.13).

Generalization to USA data. The performance of the classier was further tested using an independent data set available from the US ABIDE Project40. For individuals with ASD, the criteria for data inclusion were that they must (1) be older than 18 years of age, (2) be right-handed, (3) have a FIQ exceeding 80,(4) have no comorbidity and (5) have been diagnosed as autistic by either ADI-R or ADOS. Individuals with an explicit medication record at the time of the scan were excluded. A total of 54 autistic individuals satised these criteria. In addition,52 age-, sex- and FIQ-matched TD individuals satisfying criteria (1)(4) were selected from the same pool of participants. Then, their MRI data were obtained and the functional images were visually inspected to ascertain that the eld of view covered the entire brain. We found that functional images for 18 participants suffered from severe truncation at either the top (parietal lobe) or bottom (inferior temporal lobe to cerebellum) portion of the brain, and they were removed from the further analysis accordingly. A total of 44 individuals with ASD and 44 demographically matched TD individuals comprised the nal list. Their properties are summarized in Supplementary Table 7. The imaging protocols adopted at each site are summarized in Supplementary Table 8. We preprocessed the MRI data in the same way as for the Japanese data and calculated the interregional FC for each subject. In addition, to test the performance of the classier on individuals with more diverse proles, we formed a Supplementary Dataset by removing the conditions for FIQ, comorbidity and medication status. The detail of the dataset and the result of the analysis are provided in Supplementary Note 2.

Further evaluation of the generalization capability. To further examine the classiers generalization capability, we formed an extra data set at site B (Showa University Karasuyama Hospital) independently of the training data set. This data set incorporated 27 individuals with ASD and 27 demographically matched TDs who were recruited under the criteria of this site (see the Participants section). The data acquisition was conducted using a new 3 T MR scanner of a different manufacturer, replaced after the formation of the training data set. For more detail including the imaging protocol, see Supplementary Table 9.

Application of the ASD classier to other disorder. To further understand the generalization property of the ASD classier, we evaluated its classication specicity to ASD. Specically, we tested its classication performance using the data sets that incorporated individuals with SCZ, MDD and ADHD as follows.

We formed a data set that consisted of 66 patients with SCZ (34 females, age38.29.1 year) and 107 age-matched healthy control participants (40 females, age34.68.2 year). Their handedness was determined by the Edinburgh Handedness Inventory59. The means.d. of the SCZ was 75.040.9 and that of the healthy control was 81.235.1. The patients were recruited at in- and out-patient facilities in the Kansai region in Japan. They were diagnosed with the patient edition of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID). None of the patients had comorbid psychiatric disorders. The controls were recruited from the local community at Kyoto University. None of them had any history of psychiatric illness, as indicated by screening results using the nonpatient edition of the SCID. These screening results were conrmed by the fact that none of their rst-degree relatives had any history of psychotic disorders. The MR data of both patients and healthy controls were acquired at the Kyoto University Hospital (see Supplementary Table 9 for more details). All participants were physically healthy when they were scanned. The details of this data set are stated elsewhere.

Next, we formed a data set that consisted of 105 patients with unipolar MDD (51 females; 42.811.5 year) and 145 age-matched healthy control participants (90 females; 39.512.7 year). Their handedness was determined by the Edinburgh Handedness Inventory59. The means.d. of the MDD were 84.830.1 and that of the healthy control populations was 81.736.0. The patients were recruited from a local clinic and the healthy controls in the community of the Hiroshima University. The MR data of both patients and healthy controls were acquired at the Hiroshima University Hospital and other local imaging facilities (see Supplementary Table 9 for the summary of the data). All the patients were screened with the DSM-IV criteria for a unipolar MDD diagnosis using M.I.N.I. No patient had current or past SCZ episodes. Healthy participants were interviewed with M.I.N.I. and none of them showed a history of psychiatric disorders according to DSM-IV criteria. The details of this data set will be described elsewhere.

Next, we formed a data set that incorporated individuals with ADHD and TDs acquired by the NeuroIMAGE project in the Netherlands (http://www.neuroimage.nl/

Web End =http://www. http://www.neuroimage.nl/

Web End =neuroimage.nl/ ). They were a part of the ADHD-200 Sample and we obtained their permission-free MR data and the associated demographic and phenotypic information from the ADHD-200 Sample website (http://fcon_1000.projects.nitrc.org/indi/adhd200/

Web End =http://fcon_1000.projects.nitrc. http://fcon_1000.projects.nitrc.org/indi/adhd200/

Web End =org/indi/adhd200/ ) under an unrestricted usage agreement for non-commercial research purposes. Because the current study focuses on adult ASD, only individuals with rounded ages 418 years were incorporated into the present

analysis. The nal data set consisted of 13 individuals with ADHD (2 females;19.01.1 year) and age-matched 13 TDs (7 females; 19.21.2 year). The subtype identication of the ADHD populations was three hyperactive-impulsive types, one inattentive type and nine combined (hyperactive-impulsive and inattentive) types. Among the ADHD individuals, 10 were right-handed, 3 were left-handed and no information was available for 1 individual. Among the TD individuals, 11 were right-handed and 2 were left-handed. For more information on their demographic and phenotypic properties, see the ADHD-200 Sample website.

To evaluate the extent to which each additional disorder tends to share traits with ASD, we applied the ASD classier to these three data sets in the same manner as the US ABIDE data. The AUC of the classication was computed to evaluate the degree of separation of SCZ, MDD and ADHD from their corresponding control population. Moreover, the KolmogorovSmirnov test was used to compare the WLS distributions of each disorder with their corresponding healthy controls and TD individuals. The WLS distributions of the patient and control populations were tted separately with a mixture of Gaussians distributions and illustrated in Fig. 5, along with the respective AUC and KolmogorovSmirnov P value. The P value of each test was corrected for multiple comparisons by the BenjaminiHochberg procedure. For visualization purposes, the WLS in each data set was standardized such that the median and s.d. of healthy controls and TD individuals were matched across panels. It should be noted that this standardization was not used for any statistical analyses.

Interregional correlation analysis. For each participant, a pair-wise, interregional FC was evaluated among 140 regions of interest (ROIs) covering the entire brain. The spatial extent of each region was dened anatomically by the digital atlas of the Brainvisa Sulci Atlas (BSA)36. Because this atlas did not include the cerebellum, the subregions of the cerebellum were appended to it by incorporating their boundary denitions in the anatomical automatic labelling (AAL) package63. Although the original AAL atlas dened 26 subregions in the cerebellum, they were reorganized into the following three subregions in this study: the left and right cerebellum, and the vermis. This modication was necessary because the scanned volume did not cover the entire cerebellar regions for some individuals, and this incomplete coverage lead to missing elements in the correlation matrix. To surmount this, the mean time courses in the cerebellar regions were evaluated in three broader subregions by masking, if any, unavailable voxels in each individual. This BSA-AAL composite atlas was resampled in the 2 2 2 mm3 grid MNI space.

The representative time course in each region was extracted by averaging the time courses of the voxels therein. A band-pass lter (transmission range, 0.0080.1 Hz) was applied to these sets of time courses prior to the following regression procedure. The ltered time courses were linearly regressed by the temporal uctuations of the white matter, the cerebrospinal uid, and the entire brain as well as six head motion parameters. Here, the uctuation in each tissue class was determined from the average time course of the voxels within a mask created by the segmentation procedure of the T1 image. The mask for the white matter was eroded by one voxel to consider a partial volume effect. These extracted times course were bandpass ltered (transmission range, 0.0080.1 Hz) before the linear regression, as was done for regional time courses. Then, for each individual, a matrix of 9,730 FCs between 140 ROIs was calculated by exhaustively evaluating pair-wise temporal Pearson correlations of blood oxygenation level dependent signals time courses while discarding agged frames, if any, in the previous procedure (scrubbing). We note that unltered motion-related regressors, such as six head motion parameters in the present case, could reintroduce high-frequency uctuations into the time course data61. The scrubbing procedure was employed to remove any frames exhibiting abrupt head motions that could be the source of high-frequency uctuation in the ltered time course64. In addition, we conrmed that there was no statistically signicant difference in any of the motion parameters between the ASD and TD populations (Supplementary Table 6). Thus, the chance that the classication was inuenced by head motion remained minimal.

Selecting FCs as the ASD classier. Two major challenges exist in constructing a classier for ASD. The rst challenge is the problem of over-tting, because of the small sample size. As previously mentioned, the dimension of the input to the classier is M 9,730. However, the amount of data is only N 181. Because N is

much smaller than the dimension of data M, the parameters of the classier can be easily over-tted to the training data. Because of this over-tting, the constructed classier will likely exhibit extremely poor performance with newly sampled test data, which are not used in training the classier. Therefore, we need to properly introduce regularization to identify and utilize only essential FCs to ensure good generalization of the classier. Here we adopted a cascade of the L1-regularization method, a well-known approach for managing the problem of small sample size, and a sparse estimation method with automatic relevance determination65,66,as detailed below and in the next two sections.

The second major challenge is related to NVs and is due to the fact that an ASD classier is clinically useful and scientically trustworthy only if it maintains good performance for MRI data scanned at imaging sites different from the sites where the training data were collected. This is the so-called generalization capability across imaging sites. However, in clinical applications, it has often been observed that a classier trained using data acquired from a particular site cannot be generalized to the data scanned at other sites7,8,26. We overcame the second

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 9

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

challenge by using a wide variety of training data sets obtained at the three imaging sites, and by the unique combination of two sophisticated machine-learning algorithms: L1-SCCA and SLR. Through additional analyses, we conrmed that

MRI data scanned from at least three sites, and obtained under a variety of imaging conditions, are necessary to train a classier that generalizes across multiple sites (see Supplementary Table 4 for the results obtained when data from only one or two sites were used). Furthermore, to extract FCs essential for ASD classication and to reduce undesirable effects due to different scanning conditions and demographic distributions at different sites, that is, irrelevant NVs, we adopted the L1-regularized sparse canonical correlation analysis (L1-SCCA)37 (see also

Supplementary Note 6).

Altogether, the procedure for selecting relevant FCs, training a predictive model and assessing its generalization ability was carried out as a sequential process of 9 9 nested feature-selection and leave-one-out cross-validation (see also the

schematic diagram in Supplementary Fig. 2). In each leave-one-out (LOO) cross-validation (CV) fold, all-but-one subjects were used to train a SLR38 classier, while the remaining subject was used for evaluation. SLR has the ability to train a logistic regression model, while objectively pruning FCs that are not useful for the purpose of classifying ASD. Before training SLR, it is necessary to reduce the input dimension to some extent and simultaneously remove the effects of NVs that may cause catastrophic over-tting. Therefore, prior to LOOCV, nested FS was performed using L1-SCCA. L1-SCCA identies the latent relationships between FCs and various attributes of each individual, including the diagnostic label, available demographic information and imaging conditions (see details in the next section). By selecting FCs that have a connection with a canonical variable related onlyto the Diagnosis label and not to NVs, we aimed to reduce the interferential effects of NVs.

The feature (FCs) selection procedure was similar to 9 9 nested cross-

validation, with the difference being that the test set was never used for validation or feature (FCs) selection (Supplementary Fig. 2). In this way, L1-SCCA was trained on different subsamples of the data set, to increase the stability of the selected features. The test set of the outer loop FS process was kept as a testing pool for LOOCV, whereas the nine folds of the inner loop FS were used to select features. Consequently, the LOOCV folds that belonged to the same testing pool of the outer loop FS shared the same reduced features. In the inner loop FS, the L1-SCCA hyperparameters l1 and l2 were varied independently between 0.1 and 0.9 (l1rl2) with a step of 0.1. For each instance of L1-SCCA, we found the canonical variables connected only with the label Diagnosis and kept the features associated with those canonical variables. On average, the number of l combinations that complied with this constraint was 17.65.0% of the total 45 possible combinations. The features selected at each inner fold and l combination were combined by the union operation, to include features that are important for any possible subsample (inner nine folds) of the training data set. This procedure leads to the selection of 4,529161 FCs (mean and s.d. across outer folds). Once the inner loop FS was executed, one sample was taken from the testing pool of the outer loop FS, and used as the test set of the LOOCV. The remaining samples were used to train SLR on the FCs retained during the inner loop FS. The advantage of this nested FS procedure was that it used most of the data to train the models (LOOCV), while performing time-efcient FS (9 9 nested FS). The FS procedure based on

L1-SCCA is time consuming, and doing it for each fold of the LOOCV is not computationally feasible. Instead, we kept one of the nine outer folds as a testing pool for the LOOCV and ran L1-SCCA on the remaining eight folds. This means that we could reuse the selected features for all the samples in the testing pool, during the LOOCV. That is, by performing ninefold nested FS we kept the test set of the LOOCV separate from the data set used to select features. We could efciently (from a computational time perspective) avoid information leakage and over-optimistic results20. It should be noted that, to split the data into nine folds, we used a stratied approach, so as to keep an equal amount of (diagnosis, gender and site) combinations per fold.

One of the advantages of the proposed algorithm is that it does not relyon parameter tuning. Indeed, the L1-SCCA procedure is inspired by the stability-selection67 approach, where subsampling is combined with selection algorithms. Specically, it was designed in such a way that the amount of regularization is not chosen explicitly based on the L1-penalty tuning, but rather on concatenation of the features selected by different L1-penalties and subsamples. In the original stability-selection paradigm, features are selected based on the frequency of selection across subsampling repetitions, which typically requires threshold tuning by additional cross-validation. In this study, the union of the FCs selected by L1-SCCA on different subsamples aims to avoid the tweaking of such additional parameters. Moreover, SLR relies on automatic relevance determination, a Bayesian procedure that does not require parameter tuning.

To classify the independent cohort data sets (for example, the US ABIDE data set), we trained the nal SLR classier based on the union of the features selected throughout the 9 9 nested FS, using all the Japanese subjects as the training set.

L1-regularized sparse canonical correlation analysis. In general, by employing canonical correlation analysis (CCA)68, we can identify latent relationship between paired observations. Specically, CCA can derive projection vectors so that the paired projected variables (called canonical variables) have maximum correlation. Suppose that we have N observations of the paired variables x1 2 Rp and x2 2 Rp .

Let X1 x11; x21; . . . ; xN1

T denote the N p1 matrix comprising the rst set of

variables, and let X2 x12; x22; . . . ; xN2

T denote the N p2 matrix comprising the

second set of variables. As we explained in the previous section, we use sparse CCA with L1-norm regularization, L1-SCCA37. We assume that the columns listing X1 and X2 of the training set have been centred to have zero mean and scaled to have unit variance. For one canonical variable, L1-SCCA can then be formulated as

max

v ;v vT1XT1X2v2 subject to v1

k k21 l1; v2

k k21 l2; v1

k k22 1; v2

k k22 1; 1

where hyperparameters l1 and l2 indicate the sparseness of the projection vectors v1 and v2, respectively. Since the maximum number of canonical variables isq min(p1, p2), the projection matrices are dened as V1 2 Rp q and V2 2 Rp q,

where each column contains the projection vector that is associated with a canonical variable.

To identify the latent relationships between demographic information and FC, we constructed two data matrices. A row of the rst data matrix X1 lists the properties and attributes of a subject, including the diagnosis (ASD or TD), site information indicating where the brain activities of the subject were scanned, age, sex, imaging conditions (open or closed eyes) and status of medication (antipsychotics, antidepressant and anxiolytics, separately) (see Supplementary Fig. 7). More specically, the number of columns of the demographic information data matrix X1 is 10, that is, p1 10. The rst column contains either 1 ( ASD) or

0 ( TD). The next three columns contain either [1 0 0] ( site A), [0 1 0] ( site B)

or [0 0 1] ( site C). The fth column contains age value, the sixth column

contains either 1 ( male) or 0 ( female), the seventh column contains either 1

( eye open) or 0 ( eye closed) and the last three columns contain status of the

three medications, where each column contains either 1 ( with medication) or 0

( without medication). The second data matrix X2 pools a row-vector form of the

off-diagonal lower triangular portion of a correlation matrix that represents the FC of a subject. L1-SCCA was applied to the pair of matrices X1 and X2, from which the sparse projection matrices V1 and V2 were derived. In this study, we dened the constraint that at least one canonical variable should be associated only with the diagnostic label, and we called it diagnostic canonical constraint. In addition, we call canonical variables that are associated only with the diagnostic label as diagnostic canonical variables. This was achieved by looking at the columns of V1 that had a non-zero element only in the rst row (that corresponds to the diagnostic label). Subsequently, only the columns of V2 corresponding to the diagnostic canonical variables were used to form the vector vD2, by computing the sum of the absolute value across columns (that is, union of features across diagnostic canonical variables). Moreover, we obtained the union of features (that is, vD;union2) across repetitions of the inner loop FS and across l1; l2

h i

meeting diagnostic canonical constraint, by computing the sum of all the vD2 derived in the process. Then, we used vD;union2 to identify the indices of the connectivity vector relevant to the diagnosis label. We projected the original connectivity vector into a subspace dened by the nonzero elements of vD;union2.

Here we dened a variable ik to denote the index number of the k-th nonzero element of vD;union2, where 1rkrm and m denotes the number of nonzero elements. We then considered the projection matrix E ei ; ei ; . . . ; ei

T to the

subspace, where ei 2 Rp is the standard basis vector containing 1 in the ik-th

element and 0 in the other elements. Finally, we derived the vector in the subspace zARm by projecting the original connectivity vector x2 as

z Ex2: 2 By choosing the FCs that corresponded to the canonical variables that are connected only with the diagnostic label, we could select essential FCs for classication. Simultaneously, undesirable effects caused by demographic and imaging differences at different imaging sites, that is NVs, were reduced through L1-SCCA, as explained in Supplementary Notes 6,4 and Supplementary Fig. 8.

This procedure makes the MRI data from the three imaging sites useful in constructing a robust classier that generalizes across foreign, that is, USA, imaging sites.

Prediction of the diagnostic label. To predict the diagnostic label from the extracted feature input z of equation (2) (identied FCs), we employed logistic regression as the classier. In logistic regression, a logistic function is used to dene the probability of a participant belonging to the ASD class as

Py 1 j ^z; w

1 exp w

; 3

where y represents the diagnosis class label, that is, y 1 indicates ASD and y 0

indicates the TD class, respectively. ^z zT; 1

T2 Rm1 is a feature vector with an

augmented input, where the feature vector z is extracted from the connectivity matrix of one participants resting-state MRI sample (for more detail about data standardization, see Supplementary Note 7). Using the augmented input 1 is a standard approach to introduce constant (bias) input for the classier. wARm 1 is the weight vector of the logistic function. To further decrease the dimension of the feature vector, which was already reduced by L1-SCCA according to equation (2), we used an SLR method, as described in the next paragraph. SLR automatically selects the features related to the ASD label as input for the logistic function. In SLR, the probability distribution of the parameter vector is estimated using the hierarchical Bayesian estimation approach, in which the prior distribution of each element of the parameter vector is represented as a Gaussian distribution. Because

10 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254 ARTICLE

of the automatic relevance determination property of the hierarchical Bayesian estimation method, some of the Gaussian distributions become sharply peaked at zero so that the irrelevant features are not used in the classication.

Linear regression to the clinical indices. Using the identied 16 FCs in the classier, we predicted the four domain scores of the two standard diagnostic instruments measured: the ADOS47 and the ADI-R48 (see also Supplementary Table 3). Specically, in each domain of an instrument, the score of each individual was predicted by calculating the linear weighted sum of his/her 16 correlation coefcients that corresponded to the 16 FCs in the classier. The set of weights for this summation was determined through LOOCV, in which the domain scores of all-but-one participants were linearly regressed using the respective 16 correlation coefcients as explanatory variables. Because previous studies indicated age- and sex-related differences in cognition across the lifespan of adult ASDs69,70, behavioural measures such as ADOS and ADI-R can be assumed to exhibit dependence on these factors. We therefore incorporated age and sex into the regression model as additional explanatory variables. The LOOCV was necessary to avoid any information leakage from the individual to be predicted. The agreement between the measured and predicted domain scores was evaluated by the Pearson correlation coefcient and the statistical signicance was tested against the null hypothesis that there is no relationship between measured and predicted scores. For the domain score with a signicance correlation, the reliability of the prediction was further tested by a bootstrapping analysis (10,000 repetitions) where the alternative prediction was performed using 16 randomly selected FCs from the pool of 9,688 ( 9,730 42) FCs, which were not selected by SLR in the LOOCV

procedure for the ASD/TD classication (Supplementary Note 1). At each permutation, a new regression model was computed for each of the eight domain scores and the correlation between the predicted and measured domain score was calculated. The highest of the eight correlations (corresponding to the eight domains) was then selected and pooled over the permutations. The reliability of the prediction using the original 16 FCs was evaluated by integrating the normalized cumulative distribution of the pooled correlation coefcients derived through this bootstrapping procedure.

Code availability. The classication code and the correlation matrix data used in the present study are available at a secure server of ATR Brain Information Communication Research Laboratory. Please contact the server administrator ([email protected]) for access.

References

1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders 5th edn (American Psychiatric Association, 2013).

2. Frith, C. D. & Frith, U. Cognitive psychologyinteracting mindsa biological basis. Science 286, 16921695 (1999).

3. Mandell, D. S., Ittenbach, R. F., Levy, S. E. & Pinto-Martin, J. A. Disparities in diagnoses received prior to a diagnosis of autism spectrum disorder. J. Autism Dev. Disord. 37, 17951802 (2007).

4. Nylander, L., Holmqvist, M., Gustafson, L. & Gillberg, C. Attention-decit/ hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) in adult psychiatry. A 20-year register study. Nord. J. Psychiatry 67, 344350 (2013).

5. Kapur, S., Phillips, A. G. & Insel, T. R. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol. Psychiatry 17, 11741179 (2012).

6. Castellanos, F. X., Di Martino, A., Craddock, R. C., Mehta, A. D. & Milham, M.P. Clinical applications of the functional connectome. NeuroImage 80, 527540 (2013).7. Ecker, C. et al. Describing the brain in autism in ve dimensions--magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classication approach. J. Neurosci. 30, 1061210623 (2010).

8. Ecker, C. et al. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classication approach. NeuroImage 49, 4456 (2010).

9. Uddin, L. Q. et al. Multivariate searchlight classication of structuralmagnetic resonance imaging in children andadolescents with autism. Biol. Psychiatry 70, 833841 (2011).

10. Anderson, J. S. et al. Functional connectivity magnetic resonance imaging classication of autism. Brain 134, 37423754 (2011).

11. Ingalhalikar, M., Parker, D., Bloy, L., Roberts, T. P. L. & Verma, R. Diffusion based abnormality markers of pathology: toward learned diagnostic prediction of ASD. NeuroImage 57, 918927 (2011).

12. Wang, H., Chen, C. & Fushing, H. Extracting multiscale pattern information of fMRI based functional brain connectivity with application on classication of autism spectrum disorders. PLoS ONE 7, e45502 (2012).

13. Deshpande, G., Libero, L. E., Sreenivasan, K. R., Deshpande, H. D. & Kana, R. K. Identication of neural connectivity signatures of autism using machine learning. Front. Hum. Neurosci. 7, 670 (2013).

14. Dosenbach, N. U. F. et al. Prediction of individual brain maturity using fMRI. Science 329, 13581361 (2010).

15. Tomasi, D. & Volkow, N. D. Gender differences in brain functional connectivity density. Hum. Brain Mapp. 33, 849860 (2011).

16. Klaassens, B. L. et al. Single-dose serotonergic stimulation shows widespread effects on functional brain connectivity. NeuroImage 122, 440450 (2015).17. Friedman, L., Glover, G. H. & The FBIRN Consortium. Reducing interscanner variability of activation in a multicenter fMRI study: controlling for signal-touctuation-noise-ratio (SFNR) differences. NeuroImage 33, 471481 (2006).

18. Thulborn, K. R., Chang, S. Y., Shen, G. X. & Voyvodic, J. T. High-resolution echo-planar fMRI of human visual cortex at 3.0 tesla. NMR Biomed. 10, 183190 (1997).

19. Van Dijk, K. R. A. et al. Intrinsic functional connectivity as a tool for human connectomics: theory, properties, and optimization. J. Neurophysiol. 103, 297321 (2010).

20. Whelan, R. & Garavan, H. When optimism hurts: inated predictions in psychiatric neuroimaging. Biol. Psychiatry 75, 746748 (2014).

21. Whelan, R. et al. Neuropsychosocial proles of current and future adolescent alcohol misusers. Nature 512, 185189 (2015).

22. Chen, H. et al. Multivariate classication of autism spectrum disorder using frequency-specic resting-state functional connectivitya multi-center study. Prog. Neuropsychopharmacol. Biol. Psychiatry 64, 19 (2016).

23. Iidaka, T. Resting state functional magnetic resonance imaging and neural network classied autism and control. Cortex 63, 5567 (2015).

24. Nielsen, J. A. et al. Multisite functional connectivity MRI classication of autism: ABIDE results. Front. Hum. Neurosci. 7, 599 (2013).

25. Plitt, M., Barnes, K. A. & Martin, A. Functional connectivity classication of autism identies highly predictive brain features but falls short of biomarker standards. Neuroimage Clin. 7, 359366 (2015).

26. Yoshihara, Y. et al. in The 17th Annual Meeting of the Organization on Human Brain Mapping (Quebec City, Canada, 2630 June, 2011).

27. Deen, B. & Pelphrey, K. Perspective: brain scans need a rethink. Nature 491, S20 (2012).

28. Walsh, P., Elsabbagh, M., Bolton, P. & Singh, I. In search of biomarkers for autism: scientic, social and ethical challenges. Nat. Rev. Neurosci. 12, 603612 (2011).

29. Kelly, C., Biswal, B. B., Craddock, R. C., Castellanos, F. X. & Milham, M. P. Characterizing variation in the functional connectome: promise and pitfalls. Trends Cogn. Sci. 16, 181188 (2012).

30. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identication of risk loci with shared effects on ve major psychiatric disorders:a genome-wide analysis. Lancet 381, 13711379 (2013).

31. Geschwind, D. H. & Flint, J. Genetics and genomics of psychiatric disease. Science 349, 14891494 (2015).

32. King, B. H. & Lord, C. Is schizophrenia on the autism spectrum? Brain Res. 1380, 3441 (2011).

33. Pinkham, A. E., Hopnger, J. B., Pelphrey, K. A., Piven, J. & Penn, D. L. Neural bases for impaired social cognition in schizophrenia and autism spectrum disorders. Schizophr. Res. 99, 164175 (2008).

34. Adam, D. Mental health: on the spectrum. Nature 496, 416418 (2013).35. Goodkind, M. et al. Identication of a common neurobiological substrate for mental illness. JAMA Psychiatry 72, 305311 (2015).

36. Perrot, M., Rivire, D. & Mangin, J. F. Cortical sulci recognition and spatial normalization. Med. Image Anal. 15, 529550 (2011).

37. Witten, D. M., Tibshirani, R. & Hastie, T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515534 (2009).

38. Yamashita, O., Sato, M.-A., Yoshioka, T., Tong, F. & Kamitani, Y. Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage 42, 14141429 (2008).

39. Noirhomme, Q. et al. Biased binomial assessment of cross-validated estimation of classication accuracies illustrated in diagnosis predictions. Neuroimage Clin. 4, 687694 (2014).

40. Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659667 (2014).

41. Hahamy, A., Behrmann, M. & Malach, R. The idiosyncratic brain: distortion of spontaneous connectivity patterns in autism spectrum disorder. Nat. Neurosci. 18, 302309 (2015).

42. Tu, P.-C., Hsieh, J.-C., Li, C.-T., Bai, Y.-M. & Su, T.-P. Cortico-striatal disconnection within the cingulo-opercular network in schizophrenia revealed by intrinsic functional connectivity analysis: a resting fMRI study. NeuroImage 59, 238247 (2012).

43. Just, M. A., Keller, T. A., Malave, V. L., Kana, R. K. & Varma, S. Autismas a neural systems disorder: a theory of frontal-posterior underconnectivity. Neurosci. Biobehav. Rev. 36, 12921313 (2012).

44. Hughes, J. R. Autism: The rst rm nding underconnectivity? Epilepsy

Behav. 11, 2024 (2007).45. Welchew, D. E. et al. Functional disconnectivity of the medial temporal lobe in Aspergers syndrome. Biol. Psychiatry 57, 991998 (2005).

NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 11

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms11254

Acknowledgements

This research was conducted under the Development of BMI Technologies for Clinical Application of the Strategic Research Program for Brain Sciences supported by Japan Agency for Medical Research and Development (AMED). This research was also partially supported by the grants Development of Biomarker Candidates for Social Behavior and Integrated Research on Neuropsychiatric Disorders of the Strategic Research Program for Brain Sciences and Grant-in-Aid for Scientic Research on Innovative Areas (Comprehensive Brain Science Network) by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, Japan Society for the Promotion of Science (JSPS) KAKENHI (25461752 to N.Y.), Grant-in-Aid for JSPS Fellows (to K.S.) and NIH Research Project Grant Program (R01EY015980 to T.W. and R01MH091801 to Y.S.). We thank Masa-aki Sato, Okito Yamashita, Yu Takagi, Masahiro Yamashitaand Ayumu Yamashita for their insightful comments on an early draft. We also thank Saori C. Tanaka, Naho Ichikawa and Yujiro Yoshihara for their technical assistance in this study.

46. Courchesne, E. & Pierce, K. Why the frontal cortex in autism might be talking only to itself: local over-connectivity but long-distance disconnection. Curr. Opin. Neurobiol. 15, 225230 (2005).

47. Lord, C. et al. The autism diagnostic observation schedule-generic: a standard measure of social and communication decits associated with the spectrum of autism. J. Autism Dev. Disord. 30, 205223 (2000).

48. Lord, C., Rutter, M. & Le Couteur, A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 24, 659685 (1994).

49. Uddin, L. Q. et al. Salience network-based classication and prediction of symptom severity in children with autism. JAMA Psychiatry 70, 869879 (2013).

50. Zou, H. & Hastie, T. Regularization and Variable Selection via the Elastic Net.J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301320 (2005).51. ATR Brain Information Communication Research Laboratory Group. DecNef Project. Available at http://www.cns.atr.jp/decnefpro/

Web End =http://www.cns.atr.jp/decnefpro/ (2016).

52. Shibata, K., Watanabe, T., Sasaki, Y. & Kawato, M. Perceptual learning incepted by decoded fMRI neurofeedback without stimulus presentation. Science 334, 14131415 (2011).

53. Megumi, F., Yamashita, A., Kawato, M. & Imamizu, H. Functional MRI neurofeedback training on connectivity between two regions induces long-lasting changes in intrinsic functional network. Front. Hum. Neurosci. 9, 160 (2015).

54. Hashimoto, R. in The 36th Meeting of Japan Neuroscience Society (Kyoto, Japan, 2023 June 2013).

55. Kawato, M. in The 11th World Congress of Biological Psychiatry (Kyoto, Japan, 2327 June 2013).

56. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th edn, Text Revision (DSM-IV-TR) (American Psychiatric Association, 2000).

57. Otsubo, T. et al. Reliability and validity of Japanese version of the mini-international neuropsychiatric interview. Psychiatry Clin. Neurosci. 59, 517526 (2005).

58. Matsuoka, K., Uno, M., Kasai, K., Koyama, K. & Kim, Y. Estimation of premorbid IQ in individuals with Alzheimers disease using Japanese ideographic script (Kanji) compound words: Japanese version of National Adult Reading Test. Psychiatry Clin. Neurosci. 60, 332339 (2006).

59. Oldeld, R. C. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97113 (1971).

60. Wakabayashi, A., Baron-Cohen, S., Wheelwright, S. & Tojo, Y. The autism-spectrum quotient (AQ) in Japan: a cross-cultural comparison.J. Autism Dev. Disord. 36, 263270 (2006).61. Satterthwaite, T. D. et al. An improved framework for confound regression and ltering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage 64, 240256 (2013).

62. Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L. & Petersen, S. E. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage 59, 21422154 (2012).

63. Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273289 (2002).

64. Griffanti, L. et al. ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging. NeuroImage 95, 232247 (2014).

65. Mackay, D. J. C. Probable networks and plausible predictionsa review of practical Bayesian methods for supervised neural networks. Network Comp. Neural Syst. 6, 479505 (1995).

66. Neal, R. M. Bayesian Learning for Neural Networks 118 (Springer Science & Business Media, 1996).

67. Meinshausen, N. & Bhlmann, P. Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72, 417473 (2010).

68. Hotelling, H. Relations between two sets of variates. Biometrika 28, 321 (1936).69. Lever, A. G. & Geurts, H. M. Age-related differences in cognition across the adult lifespan in autism spectrum disorder. Autism Res. doi: 10.1002/aur.1545 (2015).

70. Lehnhardt, F.-G. et al. Sex-related cognitive prole in autism spectrum disorders diagnosed late in life: implications for the female autistic phenotype.J. Autism Dev. Disord. 46, 139154 (2015).

Author contributions

N.Y., R.H., K.S., Y.S., T.W. and M.K. conceived and designed the study. N.Y., R.H., Y.K., H.K., M.K., T.Y., F.M., H.I., K.K., N.K., H.T. and Y.O. recruited participants of the study and collected their clinical and imaging data, and Y.K., H.K., M.Ku., T.Y., K.K., H.T., Y.O. and N.K. were in charge of the clinical assessments. N.Y., J.M., R.H., G.L. and K.S. carried out the analysis of the data. N.Y., J.M., R.H., G.L., K.S., J.N., Y.S., T.W. and M.K. wrote the manuscript.

Additional information

Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Web End =http://www.nature.com/ http://www.nature.com/naturecommunications

Web End =naturecommunications

Competing nancial interests: M.K., J.M., N.Y., R.H., F.M., K.S., H.I., T.W., Y.S., N.K. and K.K are inventors of a patent owned by Advanced Telecommunications Research (ATR) Institute International related to the present work [PCT/JP2014/061543 (WO2014178322)]. J.M., M.K., N.Y., R.H., K.S., T.W., Y.S., N.K. and K.K. are inventors of a patent owned by ATR Institute International related to the present work [PCT/JP2014/061544 (WO2014178323)]. G.L., J.M., M.K. and N.Y. are inventors of a patent application submitted by ATR Institute International related to the present work [JP2015-228970]. H.K. received honoraria for lectures related to the Japanese version of ADI-R from Kanekoshobo. M.Ku. was previously involved in the publication of the Japanese version of ADOS and ADI-R. M.Ku. also received honoraria for lectures related to the Japanese version of ADI-R and ADOS from Kanekoshobo. H.I. received honoraria for lectures by Shimadzu Co. and Nippon Boehringer Ingelheim Co., Ltd. H.T. received research grants from Takeda. H.T. has received honoraria for lectures by Otsuka, Meiji Seika, MSD, Dainippon-Sumitomo and GlaxoSmithKline. For the past three years, Y.O. declare the following potential conicts of interest, although they are all unrelated to the current study. Y.O. has received honoraria for lectures by Otsuka, Dainippon Sumitomo, Astellas, Pzer, Eli Lilly, Janssen, Meiji Seika Pharma, Mochida, Yoshitomi Yakuhin, Eizai and GlaxoSmithKline. For the past three years, K.K. declare the following potential conicts of interest, although they are all unrelated to the current study. K.K. received research grants from Astellas, GlaxoSmithKline, Dainippon-Sumitomo, Eisai, MSD and Yoshitomi. K.K. has received honoraria for lectures by Daiichi-Sankyo, Otsuka, Meiji Seika, MSD, Astellas, Yoshitomi, Novartis, Eli Lilly, Dainippon-Sumitomo, Janssen, GlaxoSmithKline and Pzer.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Web End =http://npg.nature.com/ http://npg.nature.com/reprintsandpermissions/

Web End =reprintsandpermissions/

How to cite this article: Yahata, N. et al. A small number of abnormal brain connections predicts adult autism spectrum disorder. Nat. Commun. 7:11254 doi: 10.1038/ncomms11254 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the articles Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Web End =http://creativecommons.org/licenses/by/4.0/

12 NATURE COMMUNICATIONS | 7:11254 | DOI: 10.1038/ncomms11254 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

Word count: 14823

Show less

Abstract

Translate

Although autism spectrum disorder (ASD) is a serious lifelong condition, its underlying neural mechanism remains unclear. Recently, neuroimaging-based classifiers for ASD and typically developed (TD) individuals were developed to identify the abnormality of functional connections (FCs). Due to over-fitting and interferential effects of varying measurement conditions and demographic distributions, no classifiers have been strictly validated for independent cohorts. Here we overcome these difficulties by developing a novel machine-learning algorithm that identifies a small number of FCs that separates ASD versus TD. The classifier achieves high accuracy for a Japanese discovery cohort and demonstrates a remarkable degree of generalization for two independent validation cohorts in the USA and Japan. The developed ASD classifier does not distinguish individuals with major depressive disorder and attention-deficit hyperactivity disorder from their controls but moderately distinguishes patients with schizophrenia from their controls. The results leave open the viable possibility of exploring neuroimaging-based dimensions quantifying the multiple-disorder spectrum.

Details

Title

A small number of abnormal brain connections predicts adult autism spectrum disorder

Author

Yahata, Noriaki; Morimoto, Jun; Hashimoto, Ryuichiro; Lisi, Giuseppe; Shibata, Kazuhisa; Kawakubo, Yuki; Kuwabara, Hitoshi; Kuroda, Miho; Yamada, Takashi; Megumi, Fukuda; Imamizu, Hiroshi; Náñez Sr, José E; Takahashi, Hidehiko; Okamoto, Yasumasa; Kasai, Kiyoto; Kato, Nobumasa; Sasaki, Yuka; Watanabe, Takeo; Kawato, Mitsuo

Pages

11254

Publication year

2016

Publication date

Apr 2016

Publisher

Nature Publishing Group

e-ISSN

20411723

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/ncomms11254

ProQuest document ID

1780818579

A small number of abnormal brain connections predicts adult autism spectrum disorder

Jump to:

Full text

Abstract

Details

Suggested sources