Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

Abstract

When deep learning is applied to optical coherence tomography (OCT) data, classification networks are commonly trained on 2D images extracted from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. An inappropriate data split can therefore result in overlap between the training and testing sets, an aspect that a large portion of the literature overlooks. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for three classification tasks using three extensively used open-access OCT datasets: Kermany’s and Srinivasan’s ophthalmology datasets and the AIIMS breast tissue dataset. Results show that, for models tested on improperly split datasets, classification performance is inflated by 0.07 to 0.43 in terms of Matthews Correlation Coefficient (accuracy: 5% to 30%), highlighting the considerable effect of dataset handling on model evaluation. This study intends to raise awareness of the importance of dataset splitting, given the increased research interest in implementing deep learning on OCT data.
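The leakage described above can be avoided by assigning whole volumes, rather than individual 2D images, to either the training or the testing set. The following is a minimal sketch of that idea, not the procedure used in the study: the function name `split_by_volume`, the `volume_of` mapping, the toy image identifiers, and the 20% test fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_volume(image_ids, volume_of, test_fraction=0.2, seed=0):
    """Split 2D images into train/test so that no volume contributes to both.

    image_ids: list of image identifiers (hypothetical naming).
    volume_of: dict mapping each image identifier to the volume it came from.
    """
    # Group the images by their source volume.
    by_volume = defaultdict(list)
    for image_id in image_ids:
        by_volume[volume_of[image_id]].append(image_id)

    # Shuffle the volumes (not the images) reproducibly.
    volumes = sorted(by_volume)
    random.Random(seed).shuffle(volumes)

    # Reserve a fraction of the volumes, at least one, for testing.
    n_test = max(1, round(test_fraction * len(volumes)))
    test_volumes = set(volumes[:n_test])

    train = [i for v in volumes if v not in test_volumes for i in by_volume[v]]
    test = [i for v in volumes if v in test_volumes for i in by_volume[v]]
    return train, test


# Hypothetical toy data: 5 volumes with 4 consecutive images each.
volume_of = {f"vol{v}_slice{s}": f"vol{v}" for v in range(5) for s in range(4)}
train_ids, test_ids = split_by_volume(list(volume_of), volume_of)

# No volume appears on both sides of the split.
train_volumes = {volume_of[i] for i in train_ids}
held_out_volumes = {volume_of[i] for i in test_ids}
print(train_volumes & held_out_volumes)
```

A random split over individual images would instead place near-identical neighbouring images from the same volume in both sets, which is the kind of overlap the abstract warns about. If several volumes come from the same subject, the same grouping could be applied with subject identifiers in place of volume identifiers.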

Details

Title

Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

Author

Tampu, Iulian Emil¹; Eklund, Anders²; Haj-Hosseini, Neda¹

¹ Linköping University, Department of Biomedical Engineering, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922); Linköping University, Center for Medical Image Science and Visualization, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922)
² Linköping University, Department of Biomedical Engineering, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922); Linköping University, Center for Medical Image Science and Visualization, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922); Linköping University, Division of Statistics & Machine Learning, Department of Computer and Information Science, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922)

Publication year

2022

Publication date

2022

Publisher

Nature Publishing Group

e-ISSN

2052-4463

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41597-022-01618-6

ProQuest document ID

2716809434

© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
