
Abstract

A number of challenges hinder artificial intelligence (AI) models from effective clinical translation. Foremost among these challenges is the lack of generalizability, which is defined as the ability of a model to perform well on datasets that have different characteristics from the training data. We recently investigated the development of an AI pipeline on digital images of the cervix, utilizing a multi-heterogeneous dataset of 9,462 women (17,013 images) and a multi-stage model selection and optimization approach, to generate a diagnostic classifier able to classify images of the cervix into “normal”, “indeterminate” and “precancer/cancer” (denoted as “precancer+”) categories. In this work, we investigate the performance of this multiclass classifier on external data not utilized in training and internal validation, to assess the generalizability of the classifier when moving to new settings. We assessed both the classification performance and repeatability of our classifier model across the two axes of heterogeneity present in our dataset: image capture device and geography, utilizing both out-of-the-box inference and retraining with external data. Our results demonstrate that device-level heterogeneity affects our model performance more than geography-level heterogeneity. Classification performance of our model is strong on images from a new geography without retraining, while incremental retraining with inclusion of images from a new device progressively improves classification performance on that device up to a point of saturation. Repeatability of our model is relatively unaffected by data heterogeneity and remains strong throughout. Our work supports the need for optimized retraining approaches that address data heterogeneity (e.g., when moving to a new device) to facilitate effective use of AI models in new settings.
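The abstract's central experiment — evaluating a classifier out-of-the-box on data from a new device, then incrementally retraining with growing fractions of that device's data until performance saturates — can be illustrated with a deliberately simplified sketch. The toy nearest-class-mean model, the 1-D features, and the fixed device shift below are illustrative assumptions, not the authors' pipeline; the real study uses a deep multiclass image classifier on cervical images.

```python
import random

random.seed(0)

def make_data(n_per_class, shift):
    # Three toy classes standing in for "normal", "indeterminate",
    # "precancer+", as 1-D features; `shift` mimics a device-level offset.
    data = []
    for label, mu in enumerate([0.0, 2.0, 4.0]):
        data += [(random.gauss(mu + shift, 0.5), label) for _ in range(n_per_class)]
    return data

def train(data):
    # "Training" here is just storing per-class feature means.
    means = {}
    for lbl in {l for _, l in data}:
        xs = [x for x, l in data if l == lbl]
        means[lbl] = sum(xs) / len(xs)
    return means

def accuracy(means, data):
    # Predict the class whose stored mean is nearest to the feature.
    correct = sum(1 for x, l in data
                  if min(means, key=lambda c: abs(x - means[c])) == l)
    return correct / len(data)

device_a = make_data(200, shift=0.0)   # training device
device_b = make_data(200, shift=1.0)   # new device with a systematic shift
random.shuffle(device_b)               # mix classes before taking fractions

model = train(device_a)
print(f"out-of-the-box on device B: {accuracy(model, device_b):.2f}")

# Incremental retraining: fold in growing fractions of device-B data and
# re-evaluate on device B; accuracy rises and eventually saturates.
for frac in (0.1, 0.25, 0.5):
    k = int(len(device_b) * frac)
    retrained = train(device_a + device_b[:k])
    print(f"retrained with {frac:.0%} of device B: {accuracy(retrained, device_b):.2f}")
```

Even in this caricature, the qualitative pattern the abstract reports emerges: a systematic device shift degrades out-of-the-box performance, and each added slice of new-device data recovers some of it, with diminishing returns as the fraction grows.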

Details

Title
Assessing generalizability of an AI-based visual test for cervical cancer screening
Author
Ahmed, Syed Rakin; Egemen, Didem; Befano, Brian; Rodriguez, Ana Cecilia; Jeronimo, Jose; Desai, Kanan; Teran, Carolina; Alfaro, Karla; Fokom-Domgue, Joel; Charoenkwan, Kittipat; Mungo, Chemtai; Luckett, Rebecca; Saidu, Rakiya; Raiol, Taina; Ribeiro, Ana; Gage, Julia C; de Sanjose, Silvia; Kalpathy-Cramer, Jayashree; Schiffman, Mark
First page
e0000364
Section
Research Article
Publication year
2024
Publication date
Oct 2024
Publisher
Public Library of Science
e-ISSN
2767-3170
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3251383630
Copyright
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication: https://creativecommons.org/publicdomain/zero/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.