Sophisticated artificial intelligence (AI)-based algorithms not only enable discrete layer segmentation of retinal optical coherence tomography (OCT) scans, but can also identify novel retinal features. A prominent recent example is automatic clinic referral via deep learning based on retinal layer analysis.1 The current OSCAR-IB quality control (QC) criteria take algorithmic quality components into account (the A-criterion), but the criteria are not adapted to AI-based algorithms.2 Given the recent success and rapid developments in this field, it is timely to build on the OSCAR-IB QC criteria to address the challenges of AI and big data specifically.
To this end, it is critical to acknowledge that accuracy is paramount to the interpretation of retinal OCT in neurological disease. Judgments are highly dependent on quantitative data of individual retinal layers. Key components are thickness, degree of change, and alteration of the topography. The retinal layer thickness changes seen in neurological disorders are much more subtle3-7 than the pathologies seen in ophthalmologic diseases, which are now successfully detected by AI-based methods.1,8,9 For neurodegenerative diseases, relevant annual retinal layer atrophy rates are just above the axial image resolution of contemporary spectral-domain and swept-source OCT techniques.5 For this reason, image QC is essential. Over the past decade, big OCT data have accumulated in neurodegenerative and neuroinflammatory diseases. These data are attractive for the development of AI strategies. The expectation is that AI will improve the accuracy of OCT-based quantification and diagnostic sensitivity and specificity, and will help discover novel surrogates for monitoring disease progression as well as outcome metrics for clinical trials. Fully automated AI-based strategies are transferable from highly specialized services to primary care. The test throughput is also scalable to include, for example, high street opticians. Both will aid with the logistics of patient care through local centers.
In 2012, we proposed the first consensus OCT QC criteria, OSCAR-IB.2 The name served as a mnemonic for seven distinct QC criteria: (i) Obvious errors, (ii) Signal strength, (iii) Centration of scan, (iv) Algorithm failure, (v) Retinal pathology, (vi) Illumination, and (vii) Beam placement. This was followed by international validation10 and endorsement in reporting guidelines.11 The OSCAR-IB QC criteria were developed in a multiple sclerosis (MS) network, and have since been broadly accepted. This success is at least in part due to the demand for a clear, transparent, and practical way to “get simple things right”. A similar approach is warranted for the role of AI in relation to OCT data in neurodegenerative diseases.
Review of evidence for need of QC
AI-based strategies are at risk of propagating systematic errors, imbalance, and bias due to subtle differences in data acquisition or postprocessing. There is justified concern about the lack of QC standards for big data.12,13 The rise of big data is in part driven by the hope of improving P4 medicine: predictive, preventive, personalized, and participatory care.14 For example, integrative prediction models encompassing as many as 63 variables have been proposed to enable personalized predictions of individuals’ outcomes and guide treatment decisions in myeloproliferative neoplasms.15 The implications that AI-driven approaches will have for individuals can easily be influenced by bespoke sources of bias fed into the model as discussed below.
Importantly, minimal variation in image acquisition can cause substantial errors in the quantitative data.16,17 Strategies based on AI are excellent at recognizing changes between images, but do not necessarily know how the human OCT operators have acquired an image. This can mislead the AI-based strategy with a downstream effect of possible misdiagnosis, mismanagement, and harm. The risk of such a situation occurring increases with the rapidly rising numbers of OCT scans to be evaluated. It may introduce systematic errors if imbalances exist between populations and centers, for example, due to service capacity issues or automation of our health-care systems. The possible medico-legal ramifications are also evident.
Review of failures and successes of retinal OCT in neurological diseases
To date, OCT data in MS and related disorders are the most consistent because most reports adhered to the OSCAR-IB QC criteria and followed the APOSTEL reporting guidelines.2,11 Results are more heterogeneous for other neurological diseases because of lack of standardization. There is evidence for an early publication bias in Alzheimer disease (AD) reports.6 Subsequent data were not supportive of the earlier enthusiasm. Few of the reports on AD followed a rigorous QC approach. This similarly applies to reports of OCT in Parkinson disease (PD),18,19 amyotrophic lateral sclerosis (ALS),20-23 stroke,24 epilepsy,4,25 and schizophrenia.26
Review of successes of AI in neurological diseases
There are critical successes for the use of AI in neurological disease. Examples include urgent triaging of individuals from brain imaging to neurosurgery27; earlier diagnosis of AD28; identifying suitable candidates for epilepsy surgery29; and regulation of adaptive deep brain stimulation in movement disorders.30 Imaging-based trial outcome measures in neurology include almost all neurodegenerative, neurovascular, and neuroinflammatory conditions alongside tumors.31 Imaging data have become multimodal. This adds to the complexity and to the time needed by human readers for reporting. Likewise, histological data can now be used for machine and deep learning.32
The review committee
The ophthalmological community has driven advances in AI-based analysis of retinal OCT.1 The committee for this review has been expanded (Table 1). We included representatives from neurological sub-specialties who have used retinal OCT data for diagnostic and prognostic purposes, as well as treatment trial outcome measures. We have also engaged with experts in the fields of AI, bio-engineering, and bio-statistics, and with two not-for-profit organizations (
Table 1 Expertise of the literature review committees.
Expertise | Members |
Patient voice | Nils Wiegerink (patient), Russel Wheeler (patient advocate), Christiaan Waters (President of patient organization), Avril Daily (Retina International, ERN-EYE), Christina Fasser (Retina International, ERN-EYE), Orla Galvin (Retina International, ERN-EYE), and Deborah Oshakuade (Retina International, ERN-EYE) |
AI | Erik Bekkers, Siegfried Wagner, and Pearse Keane |
Public relations & Media | Avril Daily |
ALS | Philipp Albrecht and Orhan Aktas |
Alzheimer disease | Thomas Wisnewski |
Epilepsy | Josemir W. Sander |
Parkinson disease | Alexander Brandt, Philipp Albrecht, and Orhan Aktas |
Stroke | Shadi Yaghi and Arvind Chandratheva |
Multiple Sclerosis | Alexander Brandt, Peter Calabresi, Laura Balcer, Elliot & Tree Frohman, Friedeman Paul, Ari Green, Pablo Villoslada, Axel Petzold, Philipp Albrecht, Orhan Aktas, E. Ann Yeh, Bernardo Sanchez-Dalmau, Jen Graves, Shiv Saidha, Robert Bermel, IMSVISUAL, and ERN-EYE |
Rare Diseases | Alexander Brandt, Philipp Albrecht, Orhan Aktas, Axel Petzold, Friedeman Paul, Frederike Oertel, E. Ann Yeh, Avril Daily (Retina International, ERN-EYE), Christina Fasser (Retina International, ERN-EYE), Orla Galvin (Retina International, ERN-EYE), Deborah Oshakuade (Retina International, ERN-EYE), Bernardo Sanchez-Dalmau, and ERN-EYE |
Ophthalmology | Bernardo Sanchez-Dalmau, Pearse Keane, Siegfried Wagner and ERN-EYE |
Neuro-ophthalmology | Fiona Costello, Ari Green, Axel Petzold, Laura Balcer, Bernardo Sanchez-Dalmau, Jen Graves, and ERN-EYE |
OCT | Alexander Brandt, Frederike Oertel, Hannah Zimmerman, Philipp Albrecht, Orhan Aktas, Peter Calabresi, Axel Petzold, Jen Graves, Rachel Nolan-Kennedy, Laura Balcer, Shiv Saidha, Bernardo Sanchez-Dalmau, Pablo Villoslada, and Robert Bermel |
OCTA | Benjamin Knier, Shiv Saidha, Axel Petzold and IMSVISUAL |
Clinical trials OCT QC | Alexander Brandt, Friedeman Paul, Sven Schippling, Axel Petzold, Robert Bermel, Laura Balcer and IMSVISUAL |
Statistics and epidemiology | David Crabb, Gary Cutter, Laura Balcer, Jen Graves, Rachel Nolan-Kennedy, Kathryn Fitzgerald, and Zhaoxia Yu |
The importance of patient involvement as a key stakeholder has been recognized33 and contributed to the development of conceptual models.34 On a day-to-day practical level, experience has demonstrated that individuals tolerate retinal OCT well. It is noninvasive, noncontact, quick, and provides instant feedback. The possibility to display images directly to the individuals and discuss changes has given them more confidence and insight in their care.35 This good partnership has helped in working together to build trust, supporting treatment decisions and making OCT scans available for research. There is a need to maintain this mutual trust at a time when the immense amount of data accumulated now permits AI-inspired projects on big data. None of this is possible without patient participation, their consent, and feedback. A key concern of patients and their advocates is that their data will be misused. Individuals have a higher level of confidence in not-for-profit stakeholders than in government or private companies.36
Data protection and privacy
Due to the requirement of very large training datasets for optimal performance, most current clinical AI systems have been developed using routinely collected data which have been anonymized. Anonymization of medical images presents specific challenges, however, particularly for images of individually unique structures such as the neurosensory retina.37 Even when carefully anonymized, there is at least a theoretical risk of re-identification for such images, either now or with some future technology.38 Therefore, we recommend a multistep approach to addressing data protection and privacy. Firstly, retinal OCT scans should be anonymized according to current national and international standards.39 This includes removal of any imaging meta-data such as patient names, dates of birth, or medical record numbers, obscuration of hospital visit dates, plus careful consideration of any associated clinical meta-data (e.g., merging of categories/classes if they contain only a limited number of examples).40 Secondly, a range of additional safeguards should be put in place. Technical safeguards include the requirement to store data in trusted research environments with access controls and audit logs; contractual safeguards include prohibitions against linkage or attempted re-identification of data. Importantly, every attempt should be made to minimize the data shared to that required for the clinical or research purpose; this is a fundamental principle of many data protection regulations, including the European General Data Protection Regulation (GDPR). Finally, and perhaps most importantly, it is vital to undertake patient and public engagement and involvement at the earliest possible stage. This includes making patients aware that their data are being used for research, publishing study protocols, and giving patients the opportunity to opt out.
By adopting a cautious and engaged approach such as this, we believe it is possible to reduce any data protection risks while maximizing the potential for future patient benefit. In the future, a range of technical solutions, including federated learning and homomorphic encryption, should help further mitigate these risks.41
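The first anonymization step described above (removal of direct identifiers, obscuration of visit dates, and merging of sparse categories) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`patient_name`, `date_of_birth`, `mrn`, `visit_date`, `diagnosis`, `study_id`), the per-individual date offset, and the minimum category size of five are hypothetical choices and are not taken from any standard. A real pipeline has to follow the applicable national and international requirements.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical field names; real OCT exports differ by vendor and site.
DIRECT_IDENTIFIERS = {"patient_name", "date_of_birth", "mrn"}


def anonymize(records, day_offsets, min_category_size=5):
    """Return de-identified copies of scan metadata records.

    records: list of dicts with a 'study_id', a 'visit_date' (datetime.date)
             and a 'diagnosis' label, plus any direct identifiers.
    day_offsets: dict mapping study_id -> int days used to shift visit dates,
                 so that intervals within one individual are preserved.
    min_category_size: diagnosis labels with fewer records are merged
                       into 'other' (an assumed threshold).
    """
    counts = Counter(r["diagnosis"] for r in records)
    cleaned = []
    for r in records:
        # Drop direct identifiers, keep everything else.
        out = {k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
        # Obscure the visit date with a per-individual shift.
        shift = timedelta(days=day_offsets[r["study_id"]])
        out["visit_date"] = r["visit_date"] + shift
        # Merge sparse categories that could single out an individual.
        if counts[r["diagnosis"]] < min_category_size:
            out["diagnosis"] = "other"
        cleaned.append(out)
    return cleaned
```

Metadata handling of this kind does not address the re-identification risk carried by the retinal image itself, which is why the additional technical and contractual safeguards remain necessary.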
Search strategy and selection criteria
We reviewed three databases, PubMed, Web of Science, and Google Scholar, between 01 January 1963 and 23 April 2020 without language restriction. We chose the English version of a manuscript if the same group had published similar data in Dutch, French, German, Italian, or Spanish. The search terms used were “optical coherence tomography” or “OCT” combined with “artificial intelligence”, “machine learning”, “deep learning”, “multiple sclerosis”, “optic neuritis”, “dementia”, “Alzheimer”, “Parkinson”, “motor neuron disease”, “amyotrophic lateral sclerosis”, “stroke”, “cerebrovascular accident”, “schizophrenia”, “patient voice”. We also reviewed articles included in three systematic reviews previously conducted.3,5,8
Methods
Firstly, we reviewed the original OSCAR-IB criteria to clarify which of the QC failures require an individual to be re-assessed or to be excluded if, for example, post hoc homogenization approaches fail. Having to recall a patient for a failed test is not desirable, is problematic, and is expensive. Secondly, we reviewed approaches to rectifying QC failures by image postprocessing. Thirdly, we examined the outcome of our AI-based methods for irregularities, identical to the approach taken in the original OSCAR-IB report.2 The terminology explicitly related to AI is summarized in Table 2.
Table 2 Terminology and basic concepts.
Artificial Intelligence (AI) | Computer or machine-based intelligence which enables “learning” and “problem solving” |
Machine learning (ML) | One subset of AI. Typically algorithms improve automatically through experience after training on a dataset. ML can be supervised or unsupervised |
Deep learning | One subset of ML essentially based on artificial neuronal networks. Very efficient and the basis of most contemporary AI-based studies on image recognition |
Supervised | Supervised ML works on a labeled training dataset (e.g., OSCAR-IB OCT scans) and reproduces the desired outcome |
Unsupervised | Unsupervised ML tries to discover previously undetected patterns in a dataset |
Over-fitting | Over-fitting can be a problem with ML, a source of over-enthusiastic reporting and reason for lack of reproducibility |
Firstly, there is a clear and justified fear of the misuse of big data.37 Secondly, the patient–physician relationship must be supported to provide an optimal experience. Thirdly, demonstration of the capability of the AI strategy enhances the ability to produce high-quality and relevant effectiveness research. Fourthly, it promotes accountability. Fifthly, it provides grounding for the production of reproducible studies. Together, the definition of QC for AI can be summarized by five pillars which were named individually or in combination in the literature reviewed (Figure 1). The mnemonic, RASCO, stands for Reproducibility (R), Accountability for decisions made (A), being Supportive of the patient–physician relationship (S), Capability ranging from machine learning (ML)-supported OCT quality control assessment to time and resource-efficient decision-making (C), and Openness with and trust in public opinion (O), which is pertinent given the personal data protection issues discussed above.42
FIGURE 1. The goal of quality control in Artificial Intelligence (AI) rests on five pillars: RASCO. (1) Openness with and trust in the public opinion, (2) to be Supportive for the patient–physician relationship, (3) Capability ranging from machine learning (ML)-supported OCT quality control assessment to time and resource-efficient decision-making, (4) Accountability for decisions made, and (5) Reproducibility (RASCO).
The utility of AI in medical applications is more dependent on data quality than quantity. The new research field of big data has contributed considerably to the advancement of medical science by analysis of large datasets. Until recently, it had not been easy to accumulate enough data to create a large data repository and analyses were too complicated or lacked statistical and computer power. A critical area of weakness of big data can be the granularity and quality of the source data entered. In essence, the quality of outputs or results of AI-based assessments should not be expected to exceed the underlying quality of the data being analyzed (input data). This underpins the importance of maintaining the highest standards of quality, even in the AI space. As we are at the dawn of AI for OCT research, one of our aims is to facilitate the generation of the high-quality data needed for future research in the field.
Prospective OCT image QC
Each OCT scan should prospectively be labeled as QC fail or not. There are several reasons why a scan may fail QC. Each failed scan should be annotated with a complete list of reasons. An efficient way is to use the capital letters of the OSCAR-IB criteria.10 Eliminating scans from the sickest individuals, who may have difficulties with the test, is a potential source of bias; such exclusions need to be explicitly noted. Retinal and systemic co-morbidities require careful clinical evaluation with more in-depth ophthalmic phenotyping than hitherto done in most neurological studies.43
QC failure may result in two situations: (1) where an error can be corrected at postprocessing; and (2) where the error requires recalling and repeating the test.
Human-led OCT image QC is a time-consuming task, so it is desirable for this to be performed using AI strategies. We suggest making use of the above-described annotation of failed scans for ML of OSCAR-IB criteria (Figure 2). This will enable the training of future AI algorithms to separate good from insufficient quality OCT scans. The next application step within the pool of scans designated as being of inadequate quality will be to identify those scans which may be subjected to post-acquisition correction approaches, thereby making them high quality, and enabling their safe and accurate utilization. This is a crucial step as it allows for AI training in auto-correction. Scans which failed OSCAR-IB and are not correctable must be excluded from any further AI steps.
FIGURE 2. The capability of AI to contribute to interpreting OCT images depends on the optimization of each step contributing to the decision tree. The first step relates to the quality of the raw data. Validated QC criteria for OCT images have been summarized as OSCAR-IB.2 The ground truth of whether or not an OCT passes QC is based on human assessment. The seven OSCAR-IB criteria for QC rejection by a human assessor can directly be used to train AI. Annotation of corrupted OCT scans permits two outcomes: (1) image postprocessing and repair of artifacts or (2) complete rejection and (if feasible) recall of patient and OCT rescan. Only a dataset that passed OCT image QC should be used for further AI interpretation.
Taken together this leaves a staged approach to QC in AI: (1) Automated AI OCT QC rating using validated OCT QC criteria2,10; (2) where possible AI QC correction during image postprocessing, or if not possible patient recall for repeat acquisition; and (3) final step of AI-based image analysis. This will typically make use of pattern recognition and be the key step forward for the primary research questions. A limitation to keep in mind is that presently it is not possible to exclude ophthalmological co-morbidities without a clinical assessment.
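The staged approach can be illustrated with a small Python sketch of the annotation record and the triage decision. The seven criterion letters come from the OSCAR-IB mnemonic; everything else is hypothetical. In particular, the class and function names are invented for illustration, and the set of criteria treated as correctable at postprocessing is a caller-supplied assumption, because this review does not fix which failures are correctable.

```python
from dataclasses import dataclass, field
from enum import Enum


class Criterion(Enum):
    """The seven OSCAR-IB criteria, keyed by their capital letter."""
    O = "Obvious errors"
    S = "Signal strength"
    C = "Centration of scan"
    A = "Algorithm failure"
    R = "Retinal pathology"
    I = "Illumination"
    B = "Beam placement"


@dataclass
class ScanQC:
    scan_id: str
    failed: set = field(default_factory=set)  # Criterion members that failed
    note: str = ""  # e.g. why a severely affected individual could not be scanned


def triage(scan, correctable):
    """Stage a scan as 'analyze', 'postprocess', or 'recall'.

    correctable: set of Criterion members that a site considers fixable at
    postprocessing. Which criteria belong here is an assumption of this
    sketch and must be decided and validated locally.
    """
    if not scan.failed:
        return "analyze"          # passed QC, eligible for AI image analysis
    if scan.failed <= correctable:
        return "postprocess"      # every failure is of a correctable kind
    return "recall"               # at least one failure needs a repeat scan
```

For example, with a hypothetical `correctable = {Criterion.A}`, a scan annotated only with `A` is routed to postprocessing, while a scan annotated with both `A` and `S` is routed to recall. A scan returned from postprocessing would need to be rated again before it enters the analysis stage.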
AI artifact vulnerability
We have not identified reports on the vulnerability of algorithms to misclassification due to the use of different OCT devices or software versions. Even seemingly small updates have the potential to cause significant differences which, if left unnoticed, can bias results.44
Ground truth
The definition of ground truth is disease-specific. It should be stated explicitly how the ground truth was defined. At a minimum, for AD and other neurodegenerative dementias, epilepsy, MS, optic neuritis (ON), neuromyelitis optica spectrum disorder (NMOSD), and PD, adherence to consensus investigation protocols and diagnostic criteria will be required. As diagnostic criteria in most neurological diseases are regularly updated, this needs to be taken into account.
Statistics
The descriptive statistics reviewed were mostly based on binary classifiers, such as whether a disease is present (“yes/no”). These models should include a comment on proportional bias.45 This is needed to interrogate how much the AI-based prediction agrees with the ground truth. The definition for an acceptable ground truth needs to include the level of evidence on which it was based. For binary and multiclassifier models, the degree of inter-rater agreement should be stated to permit judging how stable the ground truth is.
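One way to report the degree of inter-rater agreement is Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The literature reviewed here does not prescribe a particular statistic, so the choice of kappa and the function name below are assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Two raters who agree on 8 of 10 scans, each rejecting 3 of the 10, reach a kappa of about 0.52 rather than 0.80, because much of the raw agreement is expected by chance when most scans pass.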
Graphs can be presented in a way that allows judgment of the degree of over-fitting and underestimation relevant in comparing differences between AI and ground truth. Many studies used Bland-Altman plots46 or analyzed the performance of AI and ground truth based on the area under the receiver operating characteristic (ROC) curve (AUC). This gives comparative estimates of sensitivity, specificity, and the positive predictive value (PPV) as a measure of overall accuracy. This is particularly relevant for relatively rare diseases. It was recommended to complement area under the ROC curve (AUROC) values with the area under the precision-recall curve (AUPRC; in the AI literature, precision is the PPV and recall is the sensitivity).8 This was found to be of relevance for unbalanced datasets (substantially more subjects in one of the groups compared).
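The reason precision-based measures matter for unbalanced datasets follows directly from Bayes' theorem: at fixed sensitivity and specificity, the PPV falls as the condition becomes rarer. The short Python sketch below makes this concrete. The sensitivity, specificity, and prevalence values are arbitrary illustrative numbers, not results from any study.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions at these settings")
    return true_pos / (true_pos + false_pos)


# Same classifier (90% sensitivity, 90% specificity), two prevalences.
balanced = ppv(0.9, 0.9, 0.5)   # about 0.90
rare = ppv(0.9, 0.9, 0.01)      # about 0.083
```

Sensitivity and specificity, and therefore the ROC curve, are unchanged between the two settings, whereas precision drops more than tenfold. This is the information that a precision-recall curve adds.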
Cut-off level calculation
Reporting of calculation of cut-off values included the use of independent cohorts, a graphical ROC-based approach, the Youden Index, k-fold cross-validation, or hold-out validation approaches to obtain accurate estimations of AI-based cut-off performance.
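As an illustration of one of the approaches named above, the Youden Index J = sensitivity + specificity - 1 can be maximized over candidate cut-offs. The following Python sketch assumes that higher scores indicate disease and that labels are coded 1 for disease and 0 for control; for a measure where lower values indicate disease, such as layer thickness, the scores would need to be negated first. The function name is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff.
    labels: 1 = disease, 0 = control.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(s >= cut and y == 1 for s, y in zip(scores, labels))
        tn = sum(s < cut and y == 0 for s, y in zip(scores, labels))
        j = tp / pos + tn / neg - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_cut, best_j = cut, j
    return best_cut, best_j
```

A cut-off chosen this way is optimistic on the data used to select it, which is why its performance should be estimated on an independent cohort or by cross-validation.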
Power calculations
We did not yet find consistent reports of the inclusion of power calculations in studies, which are relevant for randomized controlled trials using AI-based outcome measures.47 It is recommended that sample size estimates be performed before developing an algorithm and repeated after study completion. The gain in power, meaning a more robust statistical result, is just as informative for future research as the potential cost savings by optimizing numbers. Lastly, the standardized effect size, likely to come from AI, was recommended to be aligned with distribution, and anchor health economics to inform clinical trials on what will be a realistic difference.47,48
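A minimal sketch of a sample size estimate for a two-group comparison of means is given below. It uses the textbook normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 / d^2 per group, where d is the standardized effect size. The defaults (two-sided alpha of 0.05, power of 0.80) are conventional assumptions rather than recommendations from the literature reviewed, and a t-distribution-based calculation gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    effect_size: standardized difference d (mean difference / pooled SD).
    Uses the normal approximation, so results are slightly optimistic
    for small samples.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)
```

Under these assumptions, a standardized effect size of 0.5 requires 63 individuals per group and an effect size of 1.0 requires 16, which illustrates how strongly a more precise outcome measure can reduce trial size.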
Cohort description
On review, cohort descriptions mostly conformed to contemporary standards on demographic characteristics. Cohort descriptions are relevant for AI, and will also greatly limit/determine the usability of the system. This reinforces the need to build on successful initiatives such as the established Consolidated Standards of Reporting Trials of Electronic and Mobile Health Applications and online TeleHealth (CONSORT-EHEALTH)49 and the CONSORT-AI guidelines.50 Documentation of developmental changes is also relevant throughout pediatric care and at the transition to adult care. A novel source of potential biases relates to the disease diagnostic criteria used. For many of the conditions of interest to retinal OCT, subsequent diagnostic criteria were published over the past decades. While generally aimed at improving practicability, sensitivity, and specificity, this bears the risk that cohorts which are supposed to have the same disease can be quite different in their composition. For example, subsequent diagnostic criteria for MS, AD, and PD have profoundly reshaped the patient base for clinical research over time. Contemporary cohorts tend to be milder than historical cohorts.51 Clinical trial populations are different from observational studies. The co-morbidity burden is relevant. Relevant items for the pooling of big data are: reporting of the exact diagnostic criteria, a detailed listing of all inclusion and exclusion criteria, recruitment, referrals, and capability of individuals to comply with the examinations. Minimization of the risk of systematic bias will ensure that validation of AI in other cohorts will be comparable.
Validation
For all AI algorithm development efforts, data used for this purpose should be clearly described for discovery and replication analyses. To avoid obtaining a distorted/biased view on performance, data that are used for validation (e.g., to assess performance) should not have been used during algorithm development. There is a real risk of over-fitting the AI models. It was recommended to perform a validation of the algorithm with the aid of a comparable out-of-sample population. Each AI classification scheme should be rated as to whether or not an external validation was performed. This can be supported by publishing details on the building blocks of the AI. Relevant are precise and meaningful definitions on a functional and performance level. This entails a detailed description of the AI architecture, hyper-parameters, as well as details on how the available data were used to train such systems, preferably via open access code repositories. One challenge identified at both the regulatory and the clinical level was that neural networks can continue to learn from data and improve their performance. For this reason, it was suggested to define in advance which type of learning is allowed without requiring renewed validation or approval, and without introducing clinical risk.
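Keeping validation data out of algorithm development can be sketched as follows. The example splits at the level of the individual rather than the scan, so that both eyes and all follow-up scans of one person fall on the same side of the split. This patient-level grouping is an assumption added for illustration, as are the function name, the 20% hold-out fraction, and the fixed random seed.

```python
import random


def split_by_patient(scans, holdout_fraction=0.2, seed=0):
    """Split scans into development and hold-out sets without patient overlap.

    scans: list of (patient_id, scan_id) tuples.
    Returns (development, holdout) lists of the same tuples.
    """
    patients = sorted({pid for pid, _ in scans})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patients)
    n_holdout = max(1, round(len(patients) * holdout_fraction))
    holdout_ids = set(patients[:n_holdout])
    development = [s for s in scans if s[0] not in holdout_ids]
    holdout = [s for s in scans if s[0] in holdout_ids]
    return development, holdout
```

An internal hold-out set of this kind guards against leakage within one cohort. It does not replace external validation in an independent, comparable population.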
Human versus machine and human with machine
It was reported that AI might improve over human performance in terms of accuracy and speed.52 For this “machine versus human” approach, reporting included data on sensitivity, specificity, and positive and negative predictive values, including the 95% confidence intervals (CI) and the numbers on which the calculations were based. These data help to answer the question of whether AI can outperform humans not only as seen with chess and “Go” games,53,54 but also for classification of retinal appearances.55
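The reporting items listed above (point estimates, 95% CI, and the underlying counts) can be derived from a 2 x 2 confusion table. The sketch below uses the Wilson score interval, which is one common choice for a binomial proportion; the literature reviewed does not specify an interval method, so this choice and the function names are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diagnostic_summary(tp, fp, fn, tn):
    """Estimates with 95% CIs and the counts (k of n) they are based on.

    Raises ValueError if any denominator is zero, e.g. when the
    classifier made no positive calls.
    """
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {
        name: {"estimate": k / n, "ci95": wilson_ci(k, n), "k": k, "n": n}
        for name, (k, n) in pairs.items()
    }
```

Running the same summary on the human readers and on the algorithm, using the same scans, gives the side-by-side figures that a “machine versus human” comparison needs.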
There is a second, equally important question to be answered. How can AI be used to enhance human performance?56 Therefore, it was recommended to test whether there is a synergistic effect when the AI and human approaches are combined. This is typically referred to as human–AI symbiont/symbiotic.57
Clinical practice
The relevance of potential clinical downstream effects has been recognized.58-60 The main opportunities are to reduce the burden on physicians and to help with service capacity issues. It was recommended to indicate whether an algorithm is useful for clinical practice. This requires testing the algorithm in clinical routine. Algorithms added information at different levels: on an individual level, on a cohort level, or for screening purposes. There can be important consequences for daily clinical care and health systems. Reported concerns related to misdiagnosis and practicability. This has implications for disease classification.61
Guidelines
At the time of the review, the following guidelines were relevant: APOSTEL, TRIPOD-AI, CONSORT-AI,50 SPIRIT-AI, and STROBE.62 They are regularly updated, and the latest information can be found on the EQUATOR Network website at
Open access and data sharing were found to be essential for accountability and reproducibility. The classified sample dataset is just as valuable as the developed algorithm. Datasets can potentially be used by other groups, to facilitate even greater improvements. Accordingly, data availability may accelerate development in the field. Algorithms can also be transferable and codes can be shared.
Black box
On review, there is a need to understand on what basis an algorithm came to a particular conclusion.
Review of the black box of OCT in neurology
There are a few careful predictions one can make regarding the “black box” for neurodegeneration based on anatomy and progression pattern.
Firstly, anatomically each area in the retina is connected by axons with a corresponding area in the brain because of the hard-wired retino-cortical projections.3,5 The location of damage to the brain will determine the location of expected OCT changes in a defined area of the retina, a “Region of Interest” (ROI). It has been shown that an ROI-based approach to quantification of inner retinal layer atrophy is superior to the occasionally performed sector analysis5-7 and to the generally adopted global averaged approach, because averaging can mask small areas of atrophy.63,64
Secondly, the progression pattern is determined by location and size of a lesion damaging the retino-cortical projections.64-66 The speed of progression is highest and the area of inner retinal atrophy most extensive with direct retrograde axonal degeneration as seen with optic nerve damage. More distal brain damage will still cause localized atrophy in the retina by a mechanism called retrograde trans-synaptic axonal degeneration.65,67 On sequential OCT imaging, the time course of atrophy is shorter with small brain lesions compared to larger brain lesions.64 It can be anticipated that a smoldering, slowly enlarging brain lesion will continue to drive the expansion of OCT detectable retinal atrophy.68
Thirdly, inflammatory activity in demyelinating disease has been related to transient increase of the inner nuclear layer (INL) volume.69-73 Part of this INL thickening is related to the development of microcystic macular oedema (MMO).69,70,74 Vitreous traction had been implicated, but is not required for the development of MMO.75 In most (>80%) cases, MMO is a transient phenomenon.74 In the remainder, it remains static over the years74,76 and is considered by some to represent a retrograde maculopathy77 due to axonotmesis in the anterior visual pathways as known from experimental models.78
Fourthly, there are qualitative observations on the OCT images, which have not yet been translated into automated forms of quantification. One example is the presence of hyper-reflective spots.74 There are two types of these hyper-reflective spots on OCT, and one is static and particularly visible at the upper and lower border of the INL. With the advent of OCT-Angiography (OCTA) and adaptive optics, it has become clear that they represent reflectivity changes from the inner retinal vasculature.79 There is at least another type of hyper-reflective spot noticed on serial OCT images, which migrates vertically through the retina.
Fifthly, the vitreous has specific OCT signal characteristics which can be reliably quantified from the raw image data.80,81 The technique is useful in neurological disease affecting younger adults where the vitreous body still adheres to the retina such as the majority of people with MS.82 The evaluation of the raw OCT data, rather than analysis of an already postprocessed screen image, is required due to signal changes.
Sixth, advanced image shape analyses now permit quantitative data on qualitative characteristics of the optic disc. The technique has proved valuable in idiopathic intracranial hypertension83,84 and possibly also idiopathic moyamoya angiopathy.85 Similarly, the presence of peripapillary hyper-reflective ovoid mass-like structures (PHOMS) is a novel OCT finding,86 which, akin to MMO, remained undetectable on conventional funduscopic examination. Likewise, shape analysis of the fovea has become possible.87,88
Seventh, functional assessment of individual retinal layers by OCT is possible using, for example, dark adaptation.79 One can anticipate that with the availability of OCTA, the retinal equivalent of a blood-oxygen-level-dependent (BOLD) signal for the brain will emerge.89 Increased, localized retinal metabolic activity will demand increased oxygen supply and cause elevated perfusion of the microvasculature.79 Pioneering data on OCTA in MS imply that there is a need for AI-supported QC to exclude artifacts.90-92 This will be relevant for reliable quantitative OCTA data on the retinal microvasculature which may help to differentiate between disease entities such as MS and NMOSD.93
Eighth, analysis of inter-eye differences of individual retinal layers is an attractive and highly sensitive method to screen for optic neuritis and MS.43,94-101 Expanding on these findings, there is a field for AI-based analyses of patterns of retinal asymmetry in MS.43
Lastly, reflectivity changes of individual layers can be interrogated to estimate tissue properties indirectly.102,103
Based on the above combination of numerous quantitative and qualitative changes in retinal (neural and non-neural tissue) architecture in neurological disease, there are promising avenues for a supervised ML approach to the analysis and interpretation of OCT data. Equally, for researchers who prefer to follow an unsupervised ML approach, the committee recommends checking whether findings may be explainable, at least in part, by the above summary of anatomically, biologically, and pathologically plausible observations.
Summary
In summary, we reviewed several levels of AI-based OCT research in neurology. The main points arising from this review are summarized in Table 3 and are based on five pillars (RASCO). The conclusions drawn from the multiple levels of evidence reviewed, together with the summary table, may be helpful on a practical level for future research in the field.
Table 3 Summary of key points from the literature review on OCT and AI research in neurology. The categories are based on the mnemonic “RASCO”. This table may be found helpful in guiding future use of the reported data for AI-based studies.
Question | Answer |
REPRODUCIBILITY | |
OSCAR-IB OCT quality control compliant? | Yes / No |
APOSTEL OCT reporting guideline compliant? | Yes / No |
TRIPOD-AI compliant? | Yes / No |
CONSORT-AI compliant? | Yes / No |
SPIRIT-AI compliant? | Yes / No |
STROBE compliant? | Yes / No |
ACCOUNTABILITY | |
Training, test & validation sets explained? | Yes / No |
Potential for bias¹ in big data addressed? | Yes / No |
Ground truth explicitly stated? | Yes / No |
Statement on proportional bias given? | Yes / No |
Precision-recall curves provided? | Yes / No |
Power calculations included? | Yes / No |
SUPPORTS PATIENT DOCTOR RELATIONSHIP | |
Patient voice included? | Yes / No |
Conflicts of interest, including political, explained? | Yes / No |
Shows how AI is used to enhance human performance? | Yes / No |
Tested in clinical practice? | Yes / No |
CAPABILITY OF ALGORITHM | |
Unsupervised AI?² | Yes / No |
Has QC capabilities?³ | Yes / No |
Provides a glimpse into the black box?⁴ | Yes / No |
Vulnerabilities of AI explained?⁵ | Yes / No |
External Validation? | Yes / No |
OPENNESS | |
Data availability statement? | Yes / No |
Data deposited in repository? | Yes / No |
AI deposited in open access code repository? | Yes / No |
¹ Sources of bias can be analytical, clinical, statistical, imbalance in populations, or centres where the original research was conducted.
² See Table 2
³ See Figure 2
⁴ See Figure 3
⁵ Vulnerabilities to artifacts, use of different devices, hard- or software updates of the OCT device.
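For groups who wish to track the Table 3 questions programmatically, the checklist can be held as a simple data structure. The following is a minimal sketch under stated assumptions: the names `RASCO`, `tally`, and `unanswered` are hypothetical, and counting "Yes" answers is only bookkeeping, since the review does not define a numerical score.

```python
from typing import Dict, List, Tuple

# Questions transcribed from Table 3, grouped by RASCO pillar (footnote marks omitted).
RASCO: Dict[str, List[str]] = {
    "Reproducibility": [
        "OSCAR-IB OCT quality control compliant?",
        "APOSTEL OCT reporting guideline compliant?",
        "TRIPOD-AI compliant?",
        "CONSORT-AI compliant?",
        "SPIRIT-AI compliant?",
        "STROBE compliant?",
    ],
    "Accountability": [
        "Training, test & validation sets explained?",
        "Potential for bias in big data addressed?",
        "Ground truth explicitly stated?",
        "Statement on proportional bias given?",
        "Precision-recall curves provided?",
        "Power calculations included?",
    ],
    "Supports patient doctor relationship": [
        "Patient voice included?",
        "Conflicts of interest, including political, explained?",
        "Shows how AI is used to enhance human performance?",
        "Tested in clinical practice?",
    ],
    "Capability of algorithm": [
        "Unsupervised AI?",
        "Has QC capabilities?",
        "Provides a glimpse into the black box?",
        "Vulnerabilities of AI explained?",
        "External Validation?",
    ],
    "Openness": [
        "Data availability statement?",
        "Data deposited in repository?",
        "AI deposited in open access code repository?",
    ],
}


def tally(answers: Dict[str, bool]) -> Dict[str, Tuple[int, int]]:
    """Return (number of 'Yes' answers, number of questions) per pillar.

    Questions missing from `answers` are counted as 'No' (an assumption of this sketch).
    """
    return {
        pillar: (sum(answers.get(q, False) for q in questions), len(questions))
        for pillar, questions in RASCO.items()
    }


def unanswered(answers: Dict[str, bool]) -> List[str]:
    """List the Table 3 questions that have not been answered yet."""
    return [q for qs in RASCO.values() for q in qs if q not in answers]
```

Some questions, such as "Unsupervised AI?", are descriptive rather than evaluative, so the per-pillar counts are best read alongside the individual answers and not as a quality score.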
Acknowledgments
We (AP and PAK) acknowledge a proportion of our financial support from the National Institute for Health Research (NIHR) Biomedical Research Centre based at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology. JWS is based at the NIHR University College London Hospitals Biomedical Research Centre, which receives a proportion of funding from the UK Department of Health’s Biomedical Research Centres’ funding scheme. He receives support from the UK Epilepsy Society, the Dr. Marvin Weil Epilepsy Research Fund and the Christelijke Vereniging voor de Verpleging van Lijders aan Epilepsie, Netherlands. JNB is supported by the Dutch MS Research Foundation, grant nr. 18-1027. SW is supported by the Medical Research Council through a Clinical Research Training Fellowship.
We are grateful to all consortium members, who have contributed on many levels to the conception and development of this manuscript over the course of the past years.
Members of the IMSVISUAL and ERN-EYE Consortium include: Orhan Aktas, Jack Antel, Nasrin Asgari, Isabelle Audo, Jagannadha Avasarala, Daly Avril, Francesca R. Bagnato, Brenda Banwell, Amit Bar-Or, Raed Behbehani, Arnaldo Belzunce Manterola, Jeffrey Bennett, Leslie Benson, Jacqueline Bernard, Dominique Bremond-Gignac, Josefine Britze, Jodie Burton, Jonathan Calkwood, William Carroll, Arvind Chandratheva, Jeffrey Cohen, Giancarlo Comi, Christian Cordano, Silvana Costa, Fiona Costello, Ardith Courtney, Anes Cruz-Herranz, Gary Cutter, David Crabb, Lindsey Delott, Jerome De Seze, Ricarda Diem, Helene Dollfuss, Nabil K. El Ayoubi, Christina Fasser, Carsten Finke, Dominik Fischer, Kathryn Fitzgerald, Pedro Fonseca, Jette L. Frederiksen, Elliot Frohman, Teresa Frohman, Kazuo Fujihara, Iñigo Gabilondo Cuellar, Steven Galetta, Elena Garcia-Martin, Gavin Giovannoni, Brigita Glebauskiene, Inés González Suárez, Gorm Pihl Jensen, Steffen Hamann, Hans-Peter Hartung, Joachim Havla, Bernhard Hemmer, Su-Chun Huang, Jaime Imitola, Vytautas Jasinskas, Hong Jiang, Rahele Kafieh, Ludwig Kappos, Randy Kardon, David Keegan, Eric Kildebeck, Ungsoo Samuel Kim, Sasha Klistorner, Benjamin Knier, Scott Kolbe, Thomas Korn, Lauren Krupp, Wolf Lagrèze, Letizia Leocani, Netta Levin, Petra Liskova, Jana Lizrova Preiningerova, Birgit Lorenz, Eugene May, David Miller, Janine Mikolajczak, Saddek Mohand Saïd, Xavier Montalban, Mark Morrow, Ellen Mowry, Joaquim Murta, Carlos Navas, Rachel Nolan, Katarzyna Nowomiejska, Frederike Cosima Oertel, Jiwon Oh, Celia Oreja-Guevara, Christophe Orssaud, Benjamin Osborne, Olivier Outteryck, Catarina Paiva, Jacky Palace, Athina Papadopoulou, Nikos Patsopoulos, Jana Lizrova Preiningerova, Nikolas Pontikos, Markus Preising, Jerry Prince, Daniel Reich, Robert Rejdak, Marius Ringelstein, Luis Rodriguez de Antonio, Jose-Alain Sahel, Bernardo Sanchez-Dalmau, Jaume Sastre-Garriga, Sven Schippling, Joel Schuman, Ken Shindler, Robert Shin, Neil Shuey, Kerstin Soelberg, 
Svenja Specovius, Agnese Suppiej, Alan Thompson, Ahmed Toosy, Rubén Torres, Valérie Touitou, Susanne Trauzettel-Klosinski, Anneke van der Walt, Patrick Vermersch, Angela Vidal-Jordana, Amy T. Waldman, Christian Waters, Russell Wheeler, Owen White, Helmut Wilhelm, Kimberly M. Winges, Nils Wiegerinck, Lenja Wiehe, Thomas Wisnewski, Sui Wong, Jens Würfel, Shadi Yaghi, Yuyi You, Zhaoxia Yu, Patrick Yu-Wai-Man, Reda Žemaitienė, and Hanna Zimmermann. More details about these collaborators are provided in Text S1.
Author Contributions
Axel Petzold: Study concept and design, project supervision, monthly project meetings, review of literature, drafting of manuscript, and revisions of manuscript. Phillip Albrecht: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Laura Balcer: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Erik Bekkers: Review of literature, intellectual content, and revisions of manuscript. Alexander U. Brandt: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Peter A. Calabresi: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Orla Galvin Deborah: Patient perspective, revision of content, and final version of the manuscript. Jennifer S. Graves: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Ari Green: Critical revision of the manuscript for important intellectual content. Pearse A. Keane: Critical revision of the manuscript for important intellectual content. Jenny A. Nij Bijvank: Critical revision of the manuscript for important intellectual content. Friedemann Paul: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Shiv Saidha: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Josemir W. Sander: Critical revision of the manuscript for important intellectual content. Pablo Villoslada: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript. Siegfried K. Wagner: Critical revision of the manuscript for important intellectual content. E. Ann Yeh: Monthly project meetings, review of literature, intellectual content, and revisions of manuscript.
Conflict of Interest
A. Petzold is part of the steering committee of the ANGI network which is sponsored by ZEISS, steering committee of the OCTiMS study which is sponsored by Novartis, and reports speaker fees from Heidelberg Engineering. P. Albrecht reports consulting fees, research grants, and nonfinancial support from Allergan, Biogen, Celgene, Ipsen, Merck Serono, Merz Pharmaceuticals, Novartis, and Roche, consulting fees, and nonfinancial support from Bayer Healthcare, and Sanofi-Aventis/Genzyme, outside the submitted work. L. Balcer reports personal fees from Biogen; she is editor in chief of the Journal of Neuro-Ophthalmology. E. Bekkers has nothing to disclose. A. Brandt is cofounder and shareholder of startups Motognosis and Nocturne. He is named as inventor on several patent applications describing MS serum biomarkers, perceptive visual computing, and retinal image analysis. R. Bremel has served as a consultant for Biogen, EMD Serono, Genzyme/Sanofi, Genentech/Roche, Novartis, and Viela Bio. He receives ongoing research support directed to his institution from Biogen, Genentech, and Novartis. P.A. Calabresi has received consulting fees for serving on scientific advisory boards for Biogen and Disarm Therapeutics, and is PI on grants to Johns Hopkins from Biogen, Gentech, and Annexon. O. Galvin has nothing to disclose. J.S. Graves has grant/contract research support from the National MS Society, Biogen, and Octave Biosciences. She serves on a steering committee for a trial supported by Novartis. She has received honoraria for a nonpromotional, educational activity for Sanofi-Genzyme. She has received speaker fees from Alexion and BMS and served on an advisory board for Genentech. A. Green reports grants and other support from Inception Biosciences; grants from the National Multiple Sclerosis Society and from the US National Institutes of Health; additional support from MedImmune, Mylan, Sandoz, Dr Reddy, Amneal, Momenta, Synthon, and JAMA Neurology, outside the submitted work; and that the Multiple Sclerosis Center, Department of Neurology, University of California San Francisco has received grant support from Novartis for participating in the OCTIMS study. P.A. Keane is supported by a Clinician Scientist award (CS-2014-14-023) from the National Institute for Health Research. J. Nij Bijvank has nothing to disclose. J.W. Sander has been consulted by and received research grants and fees for lectures from Eisai, UCB, Zogenix, and GW Pharmaceuticals, outside the submitted work. F. Paul receives funding from Deutsche Forschungsgemeinschaft, Bundesministerium für Bildung und Forschung, and Guthy Jackson Charitable Foundation. FC has received consulting fees from Clene, EMD Serono, and PRIME, and is participating as a site investigator in the Novartis-funded OCTIMS study. S. Saidha has received consulting fees from Medical Logix for the development of CME programs in neurology and has served on scientific advisory boards for Biogen-Idec, Genzyme, Genentech Corporation, EMD Serono, and Celgene. He was the site investigator of a trial sponsored by MedDay Pharmaceuticals, and is the PI of investigator-initiated studies funded by Genentech Corporation and Biogen Idec, and received support from the Race to Erase MS foundation. He has received equity compensation for consulting from JuneBrain LLC, a retinal imaging device developer. P. Villoslada has received an honorarium from Heidelberg Engineering in 2014, has received unrestricted research grants from Novartis (including for the OCTIMS study), Biogen, Genzyme, and Roche, and has participated in advisory boards for Novartis, Roche, Genzyme, and Biogen. PVi holds stocks in the following spin-off companies: Bionure Inc, Spire Bioventures, Mintlabs, and Health Engineering. S. Wagner has nothing to disclose. E. Ann Yeh has received research funds from NMSS, CIHI, CIHR, NIH, OIRM, MS Society of Canada, Mario Battaglia Foundation, SickKids Foundation, CBMH Innovation Fund, CMSC, Stem Cell Network, Department of Defense, Rare Diseases Foundation, and Biogen. She has received unrestricted educational funds from Teva and Guthy-Jackson Foundation. She has served on a scientific advisory panel for Hoffmann-La Roche and Biogen and has received speaker’s honoraria from Novartis, CMSC, MS at the Limits, and Canadian Rheumatological Association.
Funding information
None.
© 2021. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Artificial intelligence (AI)-based diagnostic algorithms have achieved ambitious aims through automated image pattern recognition. For neurological disorders, this includes neurodegeneration and inflammation. A scalable imaging technology for big data in neurology is optical coherence tomography (OCT). We highlight that OCT changes observed in the retina, as a window to the brain, are small, requiring rigorous quality control pipelines. There are existing tools for this purpose. Firstly, there are human-led validated consensus quality control criteria (OSCAR-IB) for OCT. Secondly, these criteria are embedded into OCT reporting guidelines (APOSTEL). The use of the described annotation of failed OCT scans advances machine learning. This is illustrated through the present review of the advantages and disadvantages of AI-based applications to OCT data. The neurological conditions reviewed here for the use of big data include Alzheimer disease, stroke, multiple sclerosis (MS), Parkinson disease, and epilepsy. It is noted that while big data is relevant for AI, ownership is complex. For this reason, we also reached out to involve representatives from patient organizations and the public domain in addition to clinical and research centers. The evidence reviewed can be grouped in a five-point expansion of the OSCAR-IB criteria to embrace AI (OSCAR-AI). The review concludes with specific recommendations on how this can be achieved practically and in compliance with existing guidelines.
Details
1 Moorfields Eye Hospital, City Road, The National Hospital for Neurology and Neurosurgery, Queen Square, UCL Queen Square Institute of Neurology, London, UK; Neuro-ophthalmology Expert Center, Amsterdam UMC, The Netherlands
2 Department of Neurology, Medical Faculty, Heinrich-Heine University, Düsseldorf, Germany
3 Departments of Neurology, Population Health and Ophthalmology, NYU Grossman School of Medicine, New York, USA
4 AMLAB, Amsterdam, The Netherlands
5 Department of Neurology, University of California, Irvine, California, USA
6 Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
7 Retina International, Dublin, Ireland
8 Department of Neurosciences, UC, San Diego, California, USA
9 Department of Neurology, University of California San Francisco, San Francisco, California, USA
10 Moorfields Eye Hospital, City Road, The National Hospital for Neurology and Neurosurgery, Queen Square, UCL Queen Square Institute of Neurology, London, UK
11 Neuro-ophthalmology Expert Center, Amsterdam UMC, The Netherlands
12 NIHR UCL Hospitals Biomedical Research Centre, UCL Queen Square Institute of Neurology, London, UK; Chalfont Centre for Epilepsy, Chalfont St Peter, UK; Stichting Epilepsie Instellingen Nederland (SEIN), Heemstede, The Netherlands
13 Experimental and Clinical Research Center, Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Germany
14 Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS) and Hospital Clinic, University of Barcelona, Barcelona, Spain
15 Division of Neurology, Department of Pediatrics, Hospital for Sick Children, Division of Neurosciences and Mental Health SickKids Research Institute, University of Toronto, Canada