Introduction
Based on current data, 1 in 31 children in the United States is diagnosed with autism by age eight1. While early intervention is associated with the greatest benefits, many children experience multi-year delays to diagnosis. Despite reliable diagnosis being possible by 18 months2, the average age of diagnosis is currently five years3. For girls, delays are even greater, with the average age of diagnosis sitting at 5.6 years3. Over-reliance on a dwindling specialist workforce4,5 has contributed to delayed evaluations, as has routine use of time-intensive assessments irrespective of case complexity6,7. A recent survey of autism specialty centers across the U.S.8 found that nearly two-thirds of specialty centers (61%) have wait times longer than 4 months. Of that group, 25% have waitlists of more than half a year, and 21% report waitlists of more than a year, or waitlists so full that they can no longer take new referrals. The same survey found that in the majority of centers (83%), evaluations take more than three hours, with evaluations extending up to 8 hours in a quarter of centers.
There is a growing call to expand the pool of clinicians able to conduct evaluations, as well as a recognized need to streamline the evaluation process itself, so that more children can be diagnosed equitably and accurately early on9,10. Multiple randomized controlled trials show that timely access to targeted early interventions leads to significantly greater cognitive, linguistic, and functional gains for children with autism, compared to lack of treatment, delayed treatment, or non-targeted treatment11. Even minor delays to treatment initiation have been shown to negatively impact outcomes, for example, starting therapies at 27 months versus 18 months of age12.
In response to this need for streamlined early diagnosis, Canvas Dx was developed and validated prospectively to empower a broader pool of clinicians to act rapidly upon first developmental concerns13. The first FDA-authorized diagnostic for autism of any kind14, Canvas Dx uses AI technology based on data from thousands of diverse children at risk for and with developmental delays, including autism. Device inputs were designed to capture the behavioral, executive functioning, language, and communication features maximally predictive of autism. Consistent with best practice recommendations that evaluation for autism include both caregiver and clinician input, as well as direct observation of the child15, Canvas Dx integrates data from multiple sources (see Fig. 1) in its machine learning algorithm.
Fig. 1 [Images not available. See PDF.]
After downloading the Canvas Dx App on their smartphone, the child’s caregiver answers a brief question set about the child’s behavior and development (5 min). The caregiver also uploads two brief (1.5–5 min) videos of their child playing via the App. Videos undergo analysis and feature extraction. The child’s clinician answers a set of questions via the Canvas Dx clinician web portal (10 min). All inputs are fed through the machine learning algorithm. An output of positive, negative or indeterminate for autism is returned, along with an auto-generated detailed report mapping challenges to DSM-5 criteria relevant to autism diagnosis. The image depicts actors, not real study participants. Image copyright Cognoa Inc.
The device provides a positive or negative autism prediction in the majority of cases, as well as a detailed report for each child that helps identify developmental strengths and challenges, and maps data to DSM-5 autism criteria to better inform next steps. In cases where there is insufficient information to confidently provide a diagnostic prediction or rule-out with high accuracy, the device produces an ‘indeterminate’ output. This diagnostic abstention mechanism allows for safer uncertainty management in cases where misclassification risks are the highest16. Explainable AI and the management of uncertainty have become central to AI in healthcare16,17. Arbitrary cut-offs that result in a binary classification are subject to error at the edge cases, particularly in the field of autism, where ambiguous presentations or multiple co-occurring conditions increase misclassification risk in binary screeners18,19. Having an indeterminate range or abstention feature may support greater clinician accuracy when evaluating complex autism cases: just as a clinician is able to say “I don’t know” when uncertain, AI-based devices are likely to operate more safely and transparently when they are not forced to produce a binary prediction in all cases20.
Based on clinical trial data13, in a study environment with an underlying autism prevalence of 29%, the device achieved a Positive Predictive Value (PPV) for autism of 80.8% (95% CI, 70.3–88.8) and a Negative Predictive Value (NPV) of 98.3% (95% CI, 90.6–100.0). Given examples of interventions failing to perform with equal accuracy outside of clinical trial settings21, and the underperformance of AI models in real-world settings in particular22, the purpose of this analysis was to determine how Canvas Dx is performing in real world settings, and to learn more about its impact on age of diagnosis, as well as the characteristics of device prescribers and patient users. Analysis of AI model performance in real-world contexts is a critical step towards ensuring safe and impactful clinical adoption22.
Methods
A de-identified aggregate data analysis of the initial 254 Canvas Dx prescriptions fulfilled in clinical settings post-market authorization was conducted to determine: what proportion of children received a determinate device output (positive or negative for autism); device PPV, NPV, sensitivity, and specificity compared to clinical reference standard; and key prescriber and patient characteristics. Real world performance metrics were then compared to previously published clinical trial device performance.
Sample: All patients who were prescribed Canvas Dx and completed all inputs needed to get a diagnostic result were included in this analysis. All patients were in the intended use population of the device, children 18 to 72 months of age with caregiver or health provider concern for developmental delay.
Ethics: The de-identified real world aggregate data analysis (PR015) was determined exempt by Advarra IRB. The previously published Canvas Dx clinical study protocol referenced in this analysis, and informed consent forms were reviewed and approved by a centralized Institutional Review Board (IntegReview IRB). Protocol Number: Q170886. IntegReview IRB granted approval of the study (protocol version 1.0) on 19 July 2019. IntegReview was subsequently purchased by Advarra IRB. Informed consent was obtained from all caregivers whose children participated in the clinical study. This study was registered on ClinicalTrials.gov (NCT04151290) prior to study initiation. All clinical study methods were carried out in accordance with relevant guidelines and regulations.
Real world data analysis
Clinical reference standard procedure
As part of its obligation to conduct continuous algorithmic performance monitoring, the device manufacturer tracks Canvas Dx performance against a panel of blinded, independent, board-certified child and adolescent psychiatrists, child neurologists, developmental-behavioral pediatricians, or child psychologists with more than 5 years’ experience in diagnosing autism. Two specialists, blinded to the device results and to the diagnostic call of their peer, evaluate the device inputs and determine if autism and/or other neurodevelopmental conditions are present based on DSM-5 criteria. In cases where the two specialists disagree, a third specialist (also blinded) reviews the data, and the majority decision determines the clinical reference standard diagnosis.
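The two-reviewer-plus-tiebreaker consensus logic above is simple enough to express directly. The sketch below is illustrative only; the function and variable names are ours and are not part of the manufacturer's monitoring software:

```python
def reference_standard(call_a: bool, call_b: bool, tiebreak=None) -> bool:
    """Consensus reference-standard diagnosis from blinded specialist calls.

    call_a, call_b: independent calls from two blinded specialists
                    (True = autism present per DSM-5 criteria).
    tiebreak:       callable returning a third blinded specialist's call,
                    consulted only when the first two disagree.
    """
    if call_a == call_b:
        return call_a  # the two specialists agree; no third review needed
    # Disagreement: a third blinded specialist reviews the same inputs,
    # and the majority (2 of 3) decision becomes the reference standard.
    return tiebreak()
```

Because the third reviewer is consulted only on disagreement, their call always forms a two-of-three majority with exactly one of the first two reviewers.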
Statistical analysis of device performance
The determinate rate was calculated as the proportion of prescriptions for which the device predicted positive or negative for autism, as opposed to abstaining. Because the device is not a binary classifier, abstention cases were analyzed separately from determinate cases. For determinate cases, PPV, NPV, sensitivity and specificity were calculated with the clinical reference standard consensus diagnosis for each case used as the true label. The corresponding 95% confidence intervals were generated for each metric. Fisher’s Exact Test was used to determine whether there was a statistically significant difference in device performance between biological sex or age range for each of these metrics. As abstention cases represent neither a correct nor incorrect classification, sensitivity and specificity are not reported in the indeterminate sample. Instead, we calculated the percentage of indeterminate cases that received a positive and negative reference standard autism diagnosis, as well as the percent that were indicated as being at risk for other neurodevelopmental conditions. These analyses were conducted on the indeterminate group as a whole, as well as on subsets of the indeterminate group stratified into low, moderate, and high autism risk. These risk groupings were derived by examining the distribution of positive and negative reference standard diagnoses across the range of device scores within the indeterminate zone. Score ranges that resulted in the lowest and highest observed prevalence of autism were assigned to the low- and high-risk groups respectively, and the middle range group was selected to maximize the separation in autism prevalence across the three categories.
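Fisher's Exact Test on a 2×2 table can be reproduced with the Python standard library alone. The sketch below implements the two-sided point-probability method used by common statistics packages; we assume, but cannot confirm, that this is the exact variant used in the analysis:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x successes in cell (1,1),
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
```

As a sanity check against the Results: back-calculating counts from the age-group PPV percentages reported later (roughly 79/82 true positives under 48 months vs. 31/37 over 48 months) yields a p value below 0.05, consistent with the significant PPV difference reported there.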
Analysis of decision thresholds
To examine the impact of decision thresholds on performance, the PPV, NPV, sensitivity, and specificity of the device were calculated for a range of decision thresholds resulting in determinate rates between 20% and 100%. The range of decision thresholds was selected by adjusting both the positive and negative threshold boundaries from the true device thresholds to achieve specific determinate rates. The determinate rates at which each performance metric becomes significantly different from the real world device performance were calculated.
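The threshold-sweep procedure can be illustrated with synthetic data. The scores and distribution parameters below are invented for demonstration only; the real device scores and thresholds are proprietary:

```python
import random

random.seed(0)

# Synthetic risk scores: higher scores suggest autism. The Gaussian
# parameters are illustrative, not the device's actual distributions.
scores_pos = [random.gauss(0.70, 0.15) for _ in range(139)]  # reference-positive
scores_neg = [random.gauss(0.35, 0.15) for _ in range(115)]  # reference-negative

def evaluate(neg_thr, pos_thr):
    """Classify scores against an abstention band (neg_thr, pos_thr) and
    return (determinate_rate, ppv, npv). Scores strictly inside the band
    are indeterminate."""
    tp = sum(s >= pos_thr for s in scores_pos)
    fn = sum(s <= neg_thr for s in scores_pos)
    fp = sum(s >= pos_thr for s in scores_neg)
    tn = sum(s <= neg_thr for s in scores_neg)
    det = (tp + fn + fp + tn) / (len(scores_pos) + len(scores_neg))
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return det, ppv, npv

# Narrowing the abstention band raises the determinate rate, typically
# trading away predictive value at the margins.
wide = evaluate(0.30, 0.75)    # wide band -> fewer determinate calls
narrow = evaluate(0.50, 0.55)  # narrow band -> more determinate calls
```

Because the narrow band is a subset of the wide one, every case that is determinate under the wide band remains determinate under the narrow band, which is what makes the determinate rate monotone in the band width.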
Comparison to clinical trial data
Calculated real world performance metrics were then compared to clinical trial data to ensure that there was no degradation in device performance between clinical and real world settings, using Fisher’s Exact Test. Full details of the methodology used to derive the clinical trial performance metrics are described in previously published work23.
Results
Real world data analysis
Prescriber characteristics
At the time of data analysis, 100 unique prescribers had Canvas Dx prescriptions fulfilled. Prescribers were located in 20 different states, across 40 practices. The highest number of prescriptions were generated in California (68), Virginia (43), and Florida (42). Breakdown of prescriber qualifications is included in Fig. 2.
Fig. 2 [Images not available. See PDF.]
Prescriber qualifications.
Patient characteristics
Based on clinical reference standard determination, the underlying autism prevalence in the sample was 54.7% (139/254). Over a quarter of the sample, 29.13% (74/254), were female. The median age of children evaluated with Canvas Dx was 37.2 months (range: 17.1–71.8 months). The median age of children who received a positive output was 33.7 months (range: 17.1–69.7 months).
Table 1 presents the demographic and clinical characteristics of the full study population, the population with a Negative ASD reference standard, and the population with a Positive ASD reference standard. Fisher’s Exact Test was used to assess whether there were statistically significant differences between the positive and negative ASD groups for each characteristic.
Table 1. Patient characteristics stratified by reference standard diagnosis.
| Characteristic | Full Population (n = 254) | Negative ASD Patients (n = 115) | Positive ASD Patients (n = 139) | Fisher’s Exact Test p value |
|---|---|---|---|---|
| Female | 29.1% | 30.4% | 28.1% | 0.68 |
| Under 48 months old | 66.1% | 56.5% | 74.1% | 0.004* |
| Had noted risk of at least one neurodevelopmental condition other than autism | 86.2% | 83.5% | 88.5% | 0.28 |

*Significant difference.
Device performance
More than half of users (62.99%) received a determinate result (95% CI 57.05–68.93%). For determinate cases, compared to the reference standard, Canvas Dx had an NPV of 97.56% (95% CI 92.84–100.0%) and a PPV of 92.44% (95% CI 87.69–97.19%). Sensitivity and specificity were 99.1% (95% CI 97.34–100.0%) and 81.63% (95% CI 70.79–92.47%), respectively. Autism prevalence rates in the indeterminate group are displayed in Table 3. Data regarding the prescribing clinician’s final diagnoses were available for 41.1% of the 95 indeterminate cases. In the majority of these cases (76.9%), the prescribing clinician was in agreement with the reference standard diagnosis (21 positive cases and 9 negative cases). For the 23.1% of cases with disagreement between the prescribing clinician and the reference standard, the majority received a clinician-positive diagnosis and a negative reference standard (6 cases), while the rest received a clinician-negative diagnosis and a positive reference standard (3 cases).
Table 2 presents a contingency table comparing the reference standard diagnosis (Positive or Negative for ASD) to the device result (Positive, Indeterminate, or Negative). Counts reflect the number of cases falling into each combination of reference standard and device outcome.
Table 2. Contingency table.
| | Positive reference standard | Negative reference standard | Total |
|---|---|---|---|
| Positive device result | 110 | 9 | 119 |
| Indeterminate device result | 28 | 66 | 95 |
| Negative device result | 1 | 40 | 41 |
| Total | 139 | 115 | 254 |
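The performance metrics reported above follow directly from the counts in Table 2. A quick plain-ratio check (the published confidence intervals are not reproduced here):

```python
# Counts from Table 2 (device result vs. blinded reference standard).
tp, fp = 110, 9   # positive device result row
fn, tn = 1, 40    # negative device result row
total = 254       # includes the 95 indeterminate cases

ppv = tp / (tp + fp)                            # 110/119 -> 92.44%
npv = tn / (tn + fn)                            # 40/41   -> 97.56%
sensitivity = tp / (tp + fn)                    # 110/111 -> 99.1%
specificity = tn / (tn + fp)                    # 40/49   -> 81.63%
determinate_rate = (tp + fp + fn + tn) / total  # 160/254 -> 62.99%
```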
Table 3 presents the percentage of individuals within the indeterminate device result group who received an autism diagnosis or had at least one documented risk factor for a neurodevelopmental condition other than autism. The data are stratified by autism risk level assigned within the indeterminate group: low, moderate, and high.
Table 3. Indeterminate autism risk group analysis.
| | All Indeterminates (n = 94) | Low Autism Risk Indeterminates (n = 32) | Moderate Autism Risk Indeterminates (n = 46) | High Autism Risk Indeterminates (n = 16) |
|---|---|---|---|---|
| Received autism diagnosis | 29.8% | 12.5% | 32.6% | 56.3% |
| Had noted risk of at least one neurodevelopmental condition other than autism | 88.3% | 90.6% | 84.8% | 94.1% |
Device performance by biological sex
For determinate cases, there were no statistically significant differences in device performance between males and females at the 0.05 p value level. The rate at which the device produced a determinate versus indeterminate result also did not differ significantly between sexes at the 0.05 p value level (see Table 4).
Table 4. Device performance by biological sex.
| Metric | Male | Female | Fisher’s Exact Test p value |
|---|---|---|---|
| PPV | 95.2% (n = 84) | 85.7% (n = 35) | 0.12 (n = 119) |
| NPV | 96.4% (n = 28) | 100.0% (n = 13) | 1.0 (n = 41) |
| Specificity | 87.1% (n = 31) | 72.2% (n = 18) | 0.26 (n = 49) |
| Sensitivity | 98.8% (n = 81) | 100.0% (n = 30) | 1.0 (n = 111) |
| Determinate rate | 62.2% (n = 180) | 64.9% (n = 74) | 0.78 (n = 254) |
Device performance by age
There were no statistically significant differences in NPV, sensitivity, specificity or determinate rate between the over 48 months of age and the under 48 months of age groups. The device had a statistically significant difference in PPV performance between age groups, with cases under 48 months of age achieving superior PPV (see Table 5).
Table 5. Device performance by age group.
| Metric | Age under 48 months | Age over 48 months | Fisher’s Exact Test p value |
|---|---|---|---|
| PPV | 96.3% (n = 82) | 83.8% (n = 37) | 0.025* (n = 119) |
| NPV | 96.6% (n = 29) | 100.0% (n = 12) | 1.0 (n = 41) |
| Specificity | 90.3% (n = 31) | 66.7% (n = 18) | 0.058 (n = 49) |
| Sensitivity | 98.8% (n = 80) | 100.0% (n = 31) | 1.0 (n = 111) |
| Determinate rate | 66.1% (n = 168) | 57.0% (n = 86) | 0.17 (n = 254) |
*Significant difference.
Impacts of threshold adjustments on device performance
Fig. 3 [Images not available. See PDF.]
Fig. 3 illustrates device performance across a range of determinate rates, including the best device performance line representing the theoretical determinate rate at which all accuracy metrics are maximized.
Figure 3 Impact of adjusting abstention thresholds: this figure demonstrates the change in PPV, NPV, sensitivity, and specificity as the abstention thresholds are adjusted to allow for a range of determinate rates. The Best Device Performance line represents the theoretical determinate rate at which all accuracy metrics are maximized. The Selected Determinate Rate line represents the current real world device performance with the abstention thresholds used in this study. The Significant Determinate Increase line represents the point at which the determinate rate becomes statistically improved over the current real world device determinate rate. All other lines represent the point at which an accuracy metric decreases to a statistically significant degree from real world performance.
Real world device performance comparison to clinical trial results
The demographic composition of our real world and clinical trial samples are included in Table 6.
Table 6. Clinical trial population characteristics vs. real world population characteristics.
| Characteristic | Clinical trial patients (n = 425) | Real world patients (n = 254) | Fisher’s Exact Test p value (n = 679) |
|---|---|---|---|
| Female | 36.2% | 29.1% | 0.06 |
| Under 48 months of age | 68.0% | 66.1% | 0.67 |
| Autism prevalence | 28.5% | 54.7% | < 0.001* |
* Significant Difference.
Across all cases, PPV improved to a statistically significant degree in real world performance. This improvement was driven by statistically significant improvements to PPV in the female and under 48 months of age demographics. Real world PPV performance for the male and over 48 months of age demographics was equivalent to clinical trial performance. Real world NPV performance was equivalent to clinical trial performance across all demographics. The real world determinate rate was significantly improved when compared to the clinical trial determinate rate across all demographics (see Table 7). The sample of real world patients reflects the composition of the clinical trial sample for age and sex, though the real world patient sample had a significantly higher autism prevalence. This increased prevalence may drive some of the significant improvements to PPV, and the slight, non-significant decreases in NPV.
Table 7. Clinical trial device performance vs. real world performance.
| Metric | Demographic | Clinical trial performance | Real world performance | Fisher’s Exact Test p value |
|---|---|---|---|---|
| PPV | All demographics | 80.8% (n = 78) | 92.4% (n = 119) | 0.024* (n = 197) |
| | Female | 60.0% (n = 20) | 85.7% (n = 35) | 0.048* (n = 55) |
| | Male | 87.9% (n = 58) | 95.2% (n = 85) | 0.12 (n = 142) |
| | Under 48 months of age | 81.5% (n = 65) | 96.3% (n = 82) | 0.005* (n = 147) |
| | Over 48 months of age | 76.9% (n = 13) | 83.8% (n = 37) | 0.68 (n = 50) |
| NPV | All demographics | 98.2% (n = 57) | 97.6% (n = 41) | 1.0 (n = 98) |
| | Female | 96.0% (n = 25) | 100.0% (n = 13) | 1.0 (n = 38) |
| | Male | 100.0% (n = 32) | 96.4% (n = 28) | 0.47 (n = 60) |
| | Under 48 months of age | 97.8% (n = 45) | 96.6% (n = 29) | 1.0 (n = 74) |
| | Over 48 months of age | 100.0% (n = 12) | 100.0% (n = 12) | 1.0 (n = 24) |
| Determinate rate | All demographics | 31.8% (n = 425) | 63.0% (n = 254) | < 0.001* (n = 679) |
| | Female | 29.2% (n = 154) | 64.9% (n = 74) | < 0.001* (n = 228) |
| | Male | 33.2% (n = 271) | 62.2% (n = 180) | < 0.001* (n = 451) |
| | Under 48 months of age | 38.1% (n = 289) | 66.1% (n = 168) | < 0.001* (n = 457) |
| | Over 48 months of age | 18.4% (n = 136) | 57.0% (n = 86) | < 0.001* (n = 222) |
*Significant difference.
Discussion
Principal results
In this analysis of real-world Canvas Dx use, the device provided highly accurate positive and negative outputs for autism that aligned with the specialist reference standard in the majority of cases. In a patient population with an autism prevalence of 54.7%, Canvas Dx had a high NPV (97.56%) and PPV (92.44%), providing a determinate output for 62.99% of children. Children in this analysis were provided a positive output more than 2 years (26.3 months) earlier than the current average age of autism diagnosis in the United States3. This finding highlights the substantial waitlist reductions that could be made by streamlining evaluations and recruiting a broader range of clinicians to participate in the autism evaluation process. Currently, the U.S. has only 758 developmental-behavioral pediatricians for 19 million children with developmental or learning challenges4 and 11 child and adolescent psychiatrists for every 100,000 children5. By empowering more clinicians to participate in autism evaluations, Canvas Dx can help to support definitive early action for a greater subset of children. Earlier answers, in turn, may enable initiation of targeted interventions during the critical early years of high brain neuroplasticity, when they have the greatest impact.
While device performance was consistent across biological sex for all metrics and across age groups for most metrics, PPV performance differed between older and younger age groups. Comparison of these real world results to clinical trial results23 suggests that this difference in PPV performance is due to substantially improved device performance in the younger age group, rather than degraded performance in the older age group. While girls comprised only 29.1% of the sample analyzed here, they represented 30.0% of children who received a determinate result, indicating proportional representation in determinate results across sexes. This is a finding of critical importance given the existing inequities in autism diagnoses for girls in the U.S.3,24.
Economic and societal impacts
Robust data across numerous published studies support both the short and long term health and economic benefits of diagnosing children with autism earlier, so that treatments can begin in the critical neurodevelopmental window where they have the greatest impact25. A U.S. analysis of the potential medical and residential cost savings that could be realized with earlier initiation of evidence-based therapies for children with autism projects annual cost savings in excess of $23.8 billion, with savings of ~$8.5 billion and $2.6 billion in Federal Medicaid and State Medicaid spending, respectively26. Canadian lifetime cost-effectiveness modeling per person with autism, based on eliminating the current 32 month wait time for intensive behavioral intervention (IBI) initiation, found substantial government ($53,000 per person) and societal ($267,000 per person) savings27.
Cost savings are realized not only in the post-diagnostic period, but also through reduction of unnecessary or untargeted treatments and poorly managed symptoms in the period between first concern and eventual diagnosis. A large US claims analysis28 of ~9,000 children with autism, for example, found that the mean all-cause medical cost per child was roughly twice as high for those with a longer time from first concern to diagnosis compared with those with a shorter delay ($5,268 vs. $2,525 per child in the younger age cohort and $5,570 vs. $2,265 per child in the older age cohort). Children who had a longer delay to diagnosis also experienced a greater number of both all-cause and autism-related health care visits compared with children who had a shorter delay. For example, the mean and median number of office or home visits were between 1.5 and 2 times greater among children who had a longer time from concern to diagnosis28.
Limitations
Only data captured as part of routine device use were available for the real world analysis; therefore, we were unable to comment on subjective patient and provider experiences, satisfaction measures, or longitudinal diagnostic stability. Similarly, information on patient race/ethnicity and socio-economic status is not collected as part of routine clinical device use, so we could not conduct covariate analysis on these features. Pivotal trial results, however, did point to equitable device performance across race/ethnicity and socio-economic status23. More information on device performance across these covariates is currently being collected as part of a primary care integration study29.
In 37% of cases, the device abstained from making an autism prediction or rule-out. As Fig. 3 demonstrates, adjusting determinate thresholds impacts both abstention and accuracy. Restricting determinate outputs to the 63.0% of cases with sufficient certainty prevents the degradation of device performance that is seen when adjusting abstention thresholds to allow for larger determinate rates. Increasing the determinate rate to 72.0% results in a statistically significant improvement in determinate rate over current real world performance (Fisher’s Exact Test p value 0.047) without any statistically significant decrease in accuracy metrics. The determinate rate can be further increased to 81.4% while maintaining statistically equivalent accuracy metrics. Beyond this point, PPV drops significantly (Fisher’s Exact Test p value 0.039), and specificity decreases to a clinically significant degree though it maintains statistical equivalence. At this point, the number of indeterminates decreases from 95 to 49 cases, while the number of false positives increases from 9 to 22 cases and the number of false negatives increases from 1 to 4 cases. The number of true positives increases from 110 to 120 cases, and true negatives increase from 40 to 59 cases. The determinate rate can then be increased up to 94.7% before both PPV and sensitivity drop to a statistically significant degree (Fisher’s Exact Test p value 0.042). While both NPV and specificity remain statistically equivalent to current real world performance, both metrics experience clinically significant decreases. Specificity and NPV performance are statistically maintained up to a 100% determinate rate. The real world device PPV remains statistically superior or equivalent to clinical trial performance at a 100% determinate rate.
All four metrics can achieve 100% performance, but this can only be realized by lowering the determinate rate to 20.9%. Though the number of false positives and false negatives decreases to 0, the number of indeterminate cases rises from 95 to 198. True positives decrease from 110 to 40 cases, and true negatives decrease from 40 cases to 15. Restricting the determinate rate to cases with an even higher certainty would further improve device performance, but with the trade-off of providing fewer children with a determinate result. With a determinate rate of 52.97%, the number of children provided with a determinate result would be significantly decreased (Fisher’s Exact Test p value 0.047). The selected thresholds for this device were therefore chosen to maximize the number of children receiving a determinate result while preserving all accuracy metrics.
While allowing for a 37% abstention rate is arguably a limitation of the device, it aligns with calls from clinicians and statisticians alike to consider machine learning abstention in complex edge cases16,17. Abstention in such cases may represent a preferred method for addressing high uncertainty because it both minimizes misclassification and highlights challenging cases that may need further investigation18–20,30. This is particularly critical for conditions such as autism, where consequences of misclassification include a potential failure to receive treatment during the window of peak brain neuroplasticity. As demonstrated in Fig. 3, the selected Canvas Dx abstention thresholds were chosen to preserve device performance while providing determinate results to as many cases as can be classified with high certainty, though device performance would remain clinically useful at much lower abstention rates. For indeterminate cases, clinicians are still given access to the full Canvas Dx detailed report that includes DSM-5 patient-specific mapping. In this real world analysis, we observed that in the majority of indeterminate cases where the prescriber rendered a diagnostic call or rule-out, it aligned with the blinded reference standard call. While this analysis demonstrates high device accuracy in real world settings, and an earlier average age of autism diagnosis with related potential cost savings, its full impact will likely not be felt until payors clarify how reimbursement will be achieved through comprehensive medical policy coverage. The AAP leadership’s recent prioritization of advocacy efforts to ensure primary care providers throughout the country can have their autism diagnoses recognized9 suggests a potential acceleration of clinical adoption may occur in the near future.
Conclusions
This analysis of 254 Canvas Dx uses highlighted device accuracy, feasibility and utility across a variety of real-world contexts. Reducing the proportion of children requiring specialty referral and time-intensive evaluations is a critical step towards the goal of tackling diagnostic delays and getting children into the right services sooner. Future longitudinal research quantifying the extent of pre- and post-diagnostic cost savings associated with early streamlined diagnosis is recommended.
Author contributions
All authors contributed to the conception, design and drafting of the manuscript. Additionally, DPW contributed to data analysis and revised the manuscript critically for important intellectual content. JR contributed substantially to the acquisition and interpretation of data. CS contributed substantially to writing the first manuscript draft and critically interpreting the data. KH led the data analysis and drafted, reviewed, and revised the statistical methods and results. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
Data availability
Data are not publicly available because they contain sensitive patient information. Individual, de-identified, participant data that underlie the results reported in this article and study protocol may be made available to qualified researchers upon reasonable request. Proposals should be directed to [email protected] to gain access. Data requestors will be required to sign a data-sharing agreement prior to access. The full study protocol for the clinical trial is available on ClinicalTrials.gov.
Competing interests
KH and CS are employees of Cognoa. CS also holds Cognoa stock options. JR is an employee of Autism Path 2 Care and is affiliated with the Society of Developmental and Behavioral Pediatrics. DPW is the co-founder of Cognoa, is on the board of directors, and holds Cognoa stock.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Shaw, K. A. et al. Prevalence and early identification of autism spectrum disorder among children aged 4 and 8 years — Autism and Developmental Disabilities Monitoring Network, 16 sites, United States, 2022. MMWR Surveill. Summ. 74, 1–22 (2025). https://doi.org/10.15585/mmwr.ss7402a1
2. Pierce, K. et al. Evaluation of the diagnostic stability of the early autism spectrum disorder phenotype in the general population starting at 12 months. JAMA Pediatr. 173, 578–587 (2019).
3. Autism Speaks. Autism by the Numbers: 2023 Inaugural Annual Report (2023). https://www.autismspeaks.org/sites/default/files/ABN_Annual_Report_2023.pdf
4. Leshner, C. US is facing a shortage of developmental specialists (2023). https://www.wcnc.com/article/news/health/shortage-health-developmental-pediatrician-united-states-charlotte/275-6bd709e8-735b-4ac2-9a7e-4790b4b0cf05
5. American Academy of Child & Adolescent Psychiatry. AACAP Releases Workforce Maps Illustrating Severe Shortage of Child and Adolescent Psychiatrists (2018).
6. Kaufman, N. K. Rethinking gold standards and best practices in the assessment of autism. Appl. Neuropsychol. Child 1–12 (2020). https://doi.org/10.1080/21622965.2020.1809414
7. Gwynette, M. F. et al. Overemphasis of the Autism Diagnostic Observation Schedule (ADOS) evaluation subverts a clinician’s ability to provide access to autism services. J. Am. Acad. Child Adolesc. Psychiatry 58, 1222–1223 (2019).
8. Kraft, C. Wait times and processes for autism diagnostic evaluations: a first report survey of autism centers in the US (2023).
9. American Academy of Pediatrics. Top 10 Leadership Resolutions (2023).
10. Barbaresi, W. et al. Clinician diagnostic certainty and the role of the Autism Diagnostic Observation Schedule in autism spectrum disorder diagnosis in young children. JAMA Pediatr. 176, 1233–1241 (2022).
11. Gabbay-Dizdar, N. et al. Early diagnosis of autism in the community is associated with marked improvement in social symptoms within 1–2 years. Autism 26, 1353–1363 (2022).
12. Guthrie, W. et al. The earlier the better: an RCT of treatment timing effects for toddlers on the autism spectrum. Autism (2023). https://doi.org/10.1177/13623613231159153
13. Wall, D. P., Liu-Mayo, S., Salomon, C., Shannon, J. & Taraman, S. Optimizing a de novo artificial intelligence-based medical device under a predetermined change control plan: improved ability to detect or rule out pediatric autism. Intell.-Based Med. 8, 100102 (2023).
14. U.S. Food & Drug Administration. FDA Authorizes Marketing of Diagnostic Aid for Autism Spectrum Disorder (2021). https://www.fda.gov/news-events/press-announcements/fda-authorizes-marketing-diagnostic-aid-autism-spectrum-disorder
15. Brian, J. A., Zwaigenbaum, L. & Ip, A. Standards of diagnostic assessment for autism spectrum disorder. Paediatr. Child Health 24, 444–451 (2019).
16. Kompa, B., Snoek, J. & Beam, A. L. Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit. Med. 4, 1–6 (2021).
17. Cortes, C., DeSalvo, G., Gentile, C., Mohri, M. & Yang, S. Online learning with abstention. In Proc. Int. Conf. Mach. Learn. 1059–1067 (2018).
18. Charman, T. & Gotham, K. Measurement issues: screening and diagnostic instruments for autism spectrum disorders — lessons from research and practice. Child Adolesc. Ment. Health 18, 52–63 (2013).
19. Roberts, M. Y. et al. Beyond pass-fail: examining the potential utility of two thresholds in the autism screening process. Autism Res. 12, 112–122 (2019).
20. Gandouz, M., Holzmann, H. & Heider, D. Machine learning with asymmetric abstention for biomedical decision-making. BMC Med. Inform. Decis. Mak. 21, 1–11 (2021).
21. Franklin, J. & Schneeweiss, S. When and how can real world data analyses substitute for randomized controlled trials? Clin. Pharmacol. Ther. 102, 1452 (2017).
22. Longhurst, C., Singh, K., Chopra, A., Atreja, A. & Brownstein, J. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI (2024).
23. Megerian, J. T. et al. Evaluation of an artificial intelligence-based medical device for diagnosis of autism spectrum disorder. NPJ Digit. Med. (2022). https://doi.org/10.1038/s41746-022-00598-6
24. Aylward, B. S., Gal-Szabo, D. E. & Taraman, S. Racial, ethnic, and sociodemographic disparities in diagnosis of children with autism spectrum disorder. J. Dev. Behav. Pediatr. (2021).
25. Towle, P. O., Patrick, P. A., Ridgard, T., Pham, S. & Marrus, J. Is earlier better? The relationship between age when starting early intervention and outcomes for children with autism spectrum disorder: a selective review. Autism Res. Treat. (2020).
26. Frazier, T. W. et al. Evidence-based use of scalable biomarkers to increase diagnostic efficiency and decrease the lifetime costs of autism. Autism Res. 14, 1271–1283 (2021).
27. Piccininni, C., Bisnaire, L. & Penner, M. Cost-effectiveness of wait time reduction for intensive behavioral intervention services in Ontario, Canada. JAMA Pediatr. 171, 23–30 (2017).
28. Vu, M. et al. Increased delay from initial concern to diagnosis of autism spectrum disorder and associated health care resource utilization and cost among children aged younger than 6 years in the United States. J. Manag. Care Spec. Pharm. 29, 378–390 (2023).
29. Sohl, K. et al. Feasibility and impact of integrating an artificial intelligence-based diagnosis aid for autism into the Extension for Community Health Outcomes Autism Primary Care model: protocol for a prospective observational study. JMIR Res. Protoc. 11, e37576 (2022).
30. Landsheer, J. A. The clinical relevance of methods for handling inconclusive medical test results: quantification of uncertainty in medical decision-making and screening. Diagnostics 8, 32 (2018).
© The Author(s) 2025. This work is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Abstract
Rapidly rising demand for pediatric autism evaluations has outpaced specialist capacity and created a crisis of delayed diagnoses and treatment. Streamlining the diagnostic process could reduce wait times and optimize use of limited specialist resources. Following strong clinical trial results, Canvas Dx, an AI-based diagnostic, was FDA authorized to support accurate diagnosis or rule-out of autism in children 18–72 months of age with caregiver or healthcare provider concern for developmental delay. To gain insight into real-world device performance, a de-identified aggregate analysis of the initial 254 Canvas Dx prescriptions fulfilled after market authorization was conducted to determine: the accuracy of autism predictions compared with clinical reference standard diagnoses and prior clinical trial data, key real-world prescriber and patient characteristics, the proportion of determinate device outputs (positive or negative for autism), and the impact of decision threshold settings on device performance. In this sample of 254 children with a 54.7% autism prevalence rate (29.1% female; average age 39.99 months), Canvas Dx had an NPV of 97.6% (CI 92.8%–100.0%) and a PPV of 92.4% (CI 87.7%–97.2%). A majority of cases (63.0%) received a determinate result. Sensitivity and specificity of determinate results were 99.1% (CI 97.3%–100.0%) and 81.6% (CI 70.8%–92.5%), respectively. The median age of children who received a positive-for-autism output was 37.2 months, more than two years earlier than the current median age of autism diagnosis. No performance differences were noted based on patients’ sex. Compared with clinical trial results, real-world performance was equivalent on all key metrics, with the exception of the determinate rate and the PPV, which were significantly improved in real-world use. Analysis of real-world Canvas Dx data highlights its feasibility and utility in supporting accurate, equitable, and early diagnosis or rule-out of autism.
With medical coverage and broader clinical adoption, innovative solutions such as Canvas Dx can play an important role in helping to address the growing specialist waitlist crisis, ensuring that more children gain access to targeted therapies during the critical window of neurodevelopment, when those therapies have the greatest life-changing impact.
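For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, PPV, and NPV are derived from a 2×2 confusion matrix of determinate results. The counts used are hypothetical, chosen only to illustrate the calculation; they are not the study's actual data.

```python
# Illustrative only: deriving diagnostic accuracy metrics from a 2x2
# confusion matrix. The counts below are hypothetical, NOT study data.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return the four standard diagnostic accuracy metrics."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all children with autism
        "specificity": tn / (tn + fp),  # true negatives among all children without autism
        "ppv": tp / (tp + fp),          # P(autism | positive device output)
        "npv": tn / (tn + fn),          # P(no autism | negative device output)
    }

# Hypothetical determinate-result counts for illustration
m = diagnostic_metrics(tp=110, fp=9, fn=1, tn=40)
print({k: round(v, 3) for k, v in m.items()})
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of autism in the evaluated sample, which is why the abstract reports the sample's 54.7% prevalence alongside them.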
Details
1 Cognoa Inc., 2185 Park Blvd, Palo Alto, CA 94306, USA
2 Society of Developmental and Behavioral Pediatrics, Virginia, USA (ROR: https://ror.org/04j3rah08) (GRID: grid.475935.9) (ISNI: 0000 0001 2116 1936)
3 Cognoa Inc., 2185 Park Blvd, Palo Alto, CA 94306, USA; Department of Biomedical Data Science, Department of Pediatrics, Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA (ROR: https://ror.org/00f54p054) (GRID: grid.168010.e) (ISNI: 0000 0004 1936 8956)