Introduction
The field of histopathology is undergoing digitalization. Glass slides can be scanned into high-resolution whole slide images (WSIs), which pathologists can assess on a computer screen instead of using a traditional microscope [1–4]. With the digitalization of pathology, new opportunities emerge to aid pathologists in their daily practice, most notably artificial intelligence (AI) and, more specifically, deep-learning (DL) algorithms [5,6]. Such algorithms have already proven successful in detection and classification tasks, reaching accuracy levels similar to those of (subspecialized) pathologists [7,8]. In recent years, many AI models have been developed, mostly addressing major subareas in pathology, e.g. breast and prostate cancer [9,10]. However, AI algorithms also offer an interesting opportunity to improve diagnostics for rare diseases, which many pathologists encounter only infrequently and for which (sub)specialized expertise might be lacking. One such example is the precursor lesion to high-grade serous ovarian carcinoma (HGSC), known as serous tubal intraepithelial carcinoma (STIC).
STIC is a noninvasive lesion of the fallopian tube, showing morphologic alterations similar to HGSC [11]. The clinical impact of a diagnosis of STIC is high: it holds important prognostic information for individual patients and has therapeutic consequences in studies on alternative risk-reducing strategies, where a diagnosis of isolated STIC could lead to an oophorectomy [12,13]. However, STIC can be a challenging diagnosis for pathologists, and interobserver agreement is moderate at best when based on H&E-stained slides [14–16]. A diagnosis of STIC is therefore often supported by additional immunohistochemical stains for p53 and Ki-67. These may also help differentiate STIC from serous tubal intraepithelial lesion (STIL), a lesion that falls short of the criteria for STIC but potentially also holds a link to HGSC [14,17,18]. Besides classification, detection of STIC is likewise challenging. The incidence of isolated STIC (without concomitant HGSC) is low, ranging from 0.1% in populations at normal risk for ovarian carcinoma up to 3% in risk-reducing salpingo-oophorectomy specimens from women at increased risk of HGSC, such as BRCA1/2 pathogenic variant carriers [19,20]. This means that the average pathologist has relatively little exposure to these lesions. Moreover, isolated cases of STIC and STIL are often very small and easily overlooked [21]. A special grossing protocol, called sectioning and extensively examining the fimbriated end (SEE-FIM), was developed to maximize the likelihood of detecting STIC, but it has markedly increased the number of slides that a pathologist has to screen [22].
To assist pathologists in detecting potential STIC/STIL, we previously developed a DL algorithm that proved capable of accurately detecting regions of potential STIC/STIL in a fully automated fashion [23]. The aim of this study was to evaluate the impact of this DL model on pathologists' performance in STIC detection, focusing on the effect on accuracy and slide review time of (gynecologic) pathologists and pathology residents.
Materials and methods
Case enrollment
One hundred specimens were selected from a previously established study cohort, compiled at the time the DL model was developed [23]. These specimens were previously used for validation and were not used for model training. The set contains 21 cases of STIC, 9 cases of STIL, and 70 controls. From each specimen, a single slide was selected. The cases originate from three data sources: Radboud University Medical Center (Radboudumc), Nijmegen, the Netherlands (n = 7); the Dutch Nationwide Network of Histopathology and Cytopathology database (PALGA) (n = 14); and Johns Hopkins University, Baltimore, MD, USA (n = 9). Controls were also acquired from three data sources: Radboudumc, Canisius Wilhelmina Hospital Nijmegen (CWZ), and Rijnstate Hospital Arnhem, all in the Netherlands. This project was reviewed and approved by the research ethics committee at the Radboudumc (number: 2019-5879). Data transfer agreements were established with all involved parties, and ethical approval for the research was obtained in accordance with local regulations (PALGA, ref. LZV_55-A1; Johns Hopkins University, USA, ref. TTO RUMC A22-0963; IRB: 00203161).
Reference standard
A reference standard had previously been set for the established cohort, containing 249 slides diagnosed with STIC or STIL and 247 control slides, including the 21 cases from Radboudumc and PALGA used in this study. Still images of regions of interest from these slides, identified by a pathology resident under supervision of a gynecologic pathologist, were shown to an international panel of 15 experienced gynecologic pathologists, whereby each image was reviewed by 5 pathologists from this panel. These pathologists, blinded to the original diagnoses, classified the images into seven categories, such as 'normal', 'STIC', and 'STIL'. As various diagnostic algorithms for STIC exist, no specific instructions regarding morphological criteria or immunohistochemistry (IHC) were given to this panel [14,17,18]. Whenever IHC for p53 and Ki-67 was available, it was provided to the panel to assist in classifying the lesion; IHC was not further used in training the AI model. The expert panel reached a median kappa value of 0.53 when asked to subclassify the lesions. When asked only to distinguish aberrant epithelium (STIC/STIL) from normal epithelium, they reached a median kappa value of 0.86, corresponding to a strong level of agreement [23,24]. A definitive label of STIC/STIL was assigned when at least three pathologists agreed on the diagnosis. Based on these labels, annotations were made at a cellular level by the pathology resident and checked by the gynecologic pathologist, who was not a member of the expert panel. The nine cases from Johns Hopkins University, which in the previous study were used in the external test set for the algorithm, as well as the 70 controls, were reviewed at Radboudumc by a subspecialized gynecologic pathologist (MS) and a pathology resident (JMAB). Pathologists involved in setting the reference standard were excluded from participating in this reader study. A more detailed description of setting the reference standard, including kappa values of the panel and visual examples of the annotation process, is available in the article describing the model's development and validation [23].
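For illustration, the consensus rule described above (a definitive label when at least three of the five reviewing pathologists agree) can be expressed in a few lines. Below is a minimal sketch in R, assuming a hypothetical data frame votes with one row per panel review; all column names are placeholders, not the authors' actual pipeline.

```r
# Minimal sketch: majority-vote labelling, one row per panel review.
# 'votes' is a hypothetical data frame with columns:
#   image_id  - identifier of the reviewed region of interest
#   diagnosis - category assigned by one panel member (e.g. 'STIC', 'STIL', 'normal')
library(dplyr)

consensus <- votes %>%
  count(image_id, diagnosis, name = "n_votes") %>%   # votes per category per image
  group_by(image_id) %>%
  slice_max(n_votes, n = 1, with_ties = FALSE) %>%   # most frequent category
  mutate(definitive = n_votes >= 3) %>%              # >= 3 of 5 reviewers agree
  ungroup()
```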
Readers
A total of 26 pathologists and pathology residents, hereafter referred to as 'readers', participated. None had been involved in setting the reference standard. The readers came from 11 countries and self-reported working as pathology resident (7), (general) pathologist in a nonacademic hospital (2), gynecologic pathologist in a nonacademic hospital (1), (general) pathologist in an academic hospital (3), or gynecologic pathologist in an academic hospital (13). Their years of work experience (including residency) ranged from 3 to 38, with a median of 10 years. Experience with digital pathology ranged from none (3), to research settings only (14), to working in a laboratory with a partially (4) or fully (5) digitalized clinical workflow. An overview of the readers' background information is provided in supplementary material, Section S1.
DL algorithm
We previously developed a DL algorithm capable of automatically detecting regions of aberrant epithelium (STIC/STIL) in digitalized H&E-stained WSIs of fallopian tubes [23]. In brief, the model consists of a U-Net with a ResNet50 backbone and was trained using 118 cases of STIC/STIL and 51 controls. The model was evaluated on two independent test sets, containing 327 and 186 slides, respectively, and displayed robust performance: receiver operating characteristic curve analysis yielded an area under the curve of 0.98 (95% CI: 0.96–0.99) on test set one and 0.95 (95% CI: 0.90–0.99) on test set two.
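As an illustration, slide-level evaluation of this kind can be reproduced with standard ROC tooling. The following is a minimal sketch in R (matching the statistical software used elsewhere in this study), assuming a hypothetical data frame preds with one row per slide; the column names are placeholders, not the authors' actual pipeline.

```r
# Minimal sketch: slide-level ROC analysis, as reported for the test sets.
# 'preds' is a hypothetical data frame with:
#   label - reference standard (1 = STIC/STIL, 0 = control)
#   score - the model's highest aberrant-epithelium probability on the slide
library(pROC)

roc_obj <- roc(response = preds$label, predictor = preds$score)
auc(roc_obj)                         # area under the ROC curve
ci.auc(roc_obj, conf.level = 0.95)   # 95% CI (DeLong method by default)
```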
Study design
This study was set up as a fully crossed multireader, multicase design, as illustrated in Figure 1 [25]. All 26 readers took part in two sessions, reviewing the same 100 slides twice, with a minimum washout period of 4 weeks between sessions. The set of images was divided into blocks of 10 or 15 images, each containing a similar distribution of cases versus controls, to mitigate bias from potential differences in performance at the beginning versus the end of a session. Readers were randomly split into two groups, beginning with either order 1 or order 2. In both orders, the sequence of slides was identical; the sole difference was whether the blocks of 10–15 images were reviewed with or without AI assistance. Readers could work on the study at their own discretion and did not have to complete a session within a single sitting. To let readers familiarize themselves with reviewing digital WSIs in the viewer, two test slides, which were not part of the data analysis, were provided at the beginning of each session. In the end, all 26 readers had reviewed all 100 slides in two modalities: once with AI assistance (using a visual representation of the DL model output; assisted read) and once without (unassisted read).
[Figure 1 omitted (see PDF): overview of the fully crossed multireader, multicase study design.]
Research platform
The study was performed using the online CastorEDC study platform. Readers received separate invitation links for the different sessions. Readers could access the WSIs of H&E-stained slides by clicking a link in the study platform, which opened the slide in the Pathomation WSI viewer in a separate tab. If AI assistance was available, it was visualized as a black bounding box surrounding the area(s) the algorithm had detected as aberrant (Figure 2). Slides could contain multiple bounding boxes, and the number of bounding boxes was displayed at the top of the digitalized image. If the algorithm had not detected any regions of potential aberrant epithelium, this number was displayed as 0 (zero). After reviewing the WSI, readers were asked to indicate whether the slide contained aberrant epithelium (STIC/STIL), based on assessment of the H&E slide. In addition, readers indicated how certain they felt about their assessment on a 1–10 scale, with 1 indicating a low and 10 a high level of certainty. When readers reported that a WSI contained a (potential) STIC/STIL, they received a follow-up question asking whether they would request additional IHC, as pathologists often want to confirm the final diagnosis by IHC. Readers could specify whether the request for IHC was for diagnostic confirmation of STIC or for assistance in classifying the identified aberrant epithelium.
[Figure 2 omitted (see PDF): AI output visualized in the WSI viewer as black bounding boxes around detected regions.]
Image review timing
Readers were instructed to assess slides at a self-controlled pace. The review time for each slide was calculated as the time difference between the moment the WSI was opened in the viewer and the moment the first question was answered, obtained through the application programming interface (API) of the CastorEDC research platform. Readers were instructed not to take breaks while reviewing a slide.
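As a sketch of this calculation: given exported timestamps for when a slide was opened and when the first question was answered, the review time is a simple difference. The data frame and field names below are hypothetical, not the actual CastorEDC API fields.

```r
# Minimal sketch: review time per slide from two exported timestamps.
# 'timing' is a hypothetical data frame; 'opened_at' and 'answered_at'
# are placeholder field names.
timing$opened_at   <- as.POSIXct(timing$opened_at,   format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
timing$answered_at <- as.POSIXct(timing$answered_at, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
timing$review_time_s <- as.numeric(difftime(timing$answered_at, timing$opened_at,
                                            units = "secs"))
```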
Statistical analysis
Statistical analyses for accuracy, slide review time, and certainty were performed using mixed-effects models. All models were generated using the lme4 package in R [26,27]. In these models, readers and images were treated as random effects, with images nested within readers. The assistance modality (with or without AI assistance) and the session order (having started with order 1 or 2) were treated as fixed effects, with an interaction term. To evaluate the readers' accuracy, a binomial generalized linear mixed model was generated using the glmer function; the odds ratio was obtained by exponentiating the estimated effect. To evaluate the readers' slide review time and level of certainty, linear mixed models were generated using the lmer function. Fleiss' kappa values for the evaluation of interobserver agreement were calculated using IBM SPSS Statistics version 27. The sensitivity and specificity of each reader were calculated for both the assisted and unassisted reads, for descriptive purposes.
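To make the model specification concrete, a minimal sketch in R of the three models described above is given below, using lme4's formula syntax. The data frame reads and its column names are hypothetical, and the coefficient name in the odds-ratio line depends on the factor coding.

```r
# Minimal sketch of the mixed-effects models described above.
# 'reads' is a hypothetical data frame with one row per reader-slide assessment:
#   correct   - 0/1, assessment agrees with the reference standard
#   time_s    - slide review time in seconds
#   certainty - self-reported certainty (1-10)
#   modality  - unassisted/assisted (AI assistance)
#   order     - which order the reader started with (order 1 or 2)
#   reader, image - identifiers for the random effects
library(lme4)

# Accuracy: binomial GLMM; images nested within readers, modality x order fixed.
m_acc <- glmer(correct ~ modality * order + (1 | reader/image),
               data = reads, family = binomial)
exp(fixef(m_acc)["modalityassisted"])  # odds ratio (name depends on factor coding)

# Slide review time and certainty: linear mixed models, same structure.
m_time <- lmer(time_s    ~ modality * order + (1 | reader/image), data = reads)
m_cert <- lmer(certainty ~ modality * order + (1 | reader/image), data = reads)
```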
Results
Performance and accuracy
An increase in sensitivity and specificity was observed when using AI assistance: the median sensitivity for assisted reads was 93%, compared to 82% for the unassisted reads (Figure 3A). The interquartile range also decreased when AI assistance was used. This is reflected in the increase in interobserver agreement, with Fleiss' kappa increasing from 0.506 (95% CI: 0.495–0.517) for unassisted reads, corresponding to moderate reproducibility, to 0.725 (95% CI: 0.714–0.736) for assisted reads, corresponding to good reproducibility [24]. Out of the 26 readers, 22 had an increased sensitivity with the assisted reads; the remaining four retained the same sensitivity (supplementary material, Section S2). Specificity increased from 92% to 97% with AI assistance. The three cases with the greatest increase in accurate detection are displayed in Figure 4. Readers were significantly more likely to accurately detect STIC/STIL in the assisted reads than in the unassisted reads, with an odds ratio of 6.49 (95% CI: 3.39–12.40) (p < 0.01).
[Figure 3 omitted (see PDF): (A) reader sensitivity and specificity, (B) slide review times, (C) certainty scores, and (D) IHC requests, for assisted versus unassisted reads.]
[Figure 4 omitted (see PDF): the three cases with the greatest increase in accurate detection under AI assistance.]
Slide review time
The slide review time decreased significantly, with an estimated effect of 44 s less time spent on assisted reads compared to unassisted reads (95% CI: 37–51) (p < 0.01). This corresponds to an average time reduction of 32% for the assisted reads. The time reduction was observed both for cases (reduction of 49 s; 95% CI: 37–61; p < 0.01) and for controls (reduction of 41 s; 95% CI: 33–49; p < 0.01). All 26 readers were faster with the assisted reads. Figure 3B visualizes the slide review times for assisted and unassisted reads; the median review time decreased from 156 s for the unassisted reads to 114 s for the assisted reads. The readers' individual average slide review times are listed in supplementary material, Section S3.
Certainty score
A significant increase in readers' subjective certainty score was found, with an estimated effect of a 0.25 increase for the assisted reads compared to the unassisted reads (95% CI: 0.13–0.36; p < 0.01). This effect was primarily observed among the controls (increase of 0.35; 95% CI: 0.22–0.46; p < 0.01) and was not significant for the cases (increase of 0.03; 95% CI: −0.22 to 0.28; p = 0.8). Out of the 26 readers, 21 had an increased sense of certainty with AI assistance. Figure 3C visualizes the individual readers' certainty scores for assisted and unassisted reads; the median certainty score increased from 7.94 for unassisted reads to 8.60 for the assisted reads. The readers' individual average certainty scores are listed in supplementary material, Section S4.
IHC requests
In the unassisted reads, there were 775 instances in which readers considered STIC/STIL to be present; among these, there were 683 requests for additional IHC. In the assisted reads, readers observed 791 potential STIC/STIL, resulting in 669 requests for IHC. An overview of the responses is depicted in Figure 3D.
Subgroup analysis
Subgroup analyses were based on work environment (residents; nongynecologic pathologists; gynecologic pathologists), years of work experience [≤10 years (the median) versus >10 years], and level of experience with digital pathology [readers with experience limited to research settings versus readers working in a laboratory with an (at least) partially digitalized clinical workflow]. The accuracy analysis using the generalized linear mixed model was unsuccessful, because the model was too complex for the limited size of several subgroups, leading to overfitting. The mixed-model analyses for slide review time and level of certainty were successful; however, given the limited sample size within some subgroups, these findings should be interpreted with caution. These results, along with descriptive analyses of sensitivity and specificity, are presented in supplementary material, Section S5.
Readers' experience
A brief questionnaire was administered at the end of each session, asking readers about their experiences using AI assistance. The results show that, in general, most readers had a positive attitude toward the use of AI assistance and would be willing to integrate DL assistance for STIC/STIL detection into their clinical workflow (Figure 5).
[Figure 5 omitted (see PDF): readers' questionnaire responses on their experience with AI assistance.]
Discussion
In this study, we assessed the effect of AI assistance on pathologists' and pathology residents' performance when screening for STIC/STIL in H&E-stained slides of fallopian tube specimens. We found a significant increase in the accuracy of detecting STIC, combined with a significant decrease in slide review time and an increase in readers' perceived level of certainty.
A total of 22 out of the 26 readers achieved a higher sensitivity with AI assistance. Furthermore, the overall spread of the individual sensitivity and specificity scores decreased, as demonstrated by the reduced interquartile ranges when AI assistance was used. This increase in interobserver agreement is also reflected in the Fleiss' kappa, which increased from 0.506 for the unassisted reads (moderate reproducibility) to 0.725 for the assisted reads (good reproducibility) [24]. The increased accuracy under AI assistance appeared to persist in the subgroups, but these subgroups were too small to support the mixed-effects model. The potential for AI models to achieve high accuracy in specific pathology-related tasks has been reported previously, and multiple examples exist of DL models achieving accuracy on par with pathologists [7,9,10,28]. However, the stand-alone performance of AI holds little value for assessing the practical diagnostic value of such a model, as such algorithms are not intended to replace pathologists [2,29]. It is the constructive interaction between pathologist and DL assistance that has the potential to further strengthen diagnostic accuracy, supporting pathologists and enabling them to perform at the highest possible level. Studies such as the current one, in which a panel of pathologists using DL assistance is compared to a baseline, are a critical next step in translating research models to clinical use [7].
The increased sensitivity suggests that the AI highlighted regions that readers may have overlooked in the unassisted setting. In many applications of AI, such increased sensitivity comes at the cost of decreased specificity: a sensitive algorithm typically also produces false positives that may lead to cases being overcalled. Interestingly, in the current study, improved sensitivity was paired with an increase in specificity. Possibly, in a number of difficult regions where the pathologist was in doubt, the quality of the AI output tipped the decision in the correct direction.
In addition, the results showed a significant reduction in slide review time: the mixed models estimated AI-assisted reads to be 44 s faster, corresponding to a time reduction of 32%. This effect was visible in both cases and controls, with an estimated time reduction of 41 s for controls. STIC is a relatively rare entity in isolation [19,20]; pathologists therefore spend the majority of their time reviewing STIC/STIL-negative slides, and for a time reduction to be clinically meaningful, it should be achieved in the negative slides. With this significant time reduction among controls, an even more rigorous assessment of fallopian tube specimens, e.g. by more widespread use of the SEE-FIM grossing protocol or examination of tissue blocks at multiple levels, might become feasible in a study setting, to analyze its potential added value [22,30,31]. Although time reduction is welcome, the potential risk of over-reliance on an algorithm has been debated [32]. It is therefore important for pathologists to acquire skills and knowledge about how AI support works. They must be able to grasp both the potential advantages and limitations of AI assistance, and have access to tools that aid in understanding or visualizing the AI model's predictions; one example is the use of heatmaps to visualize AI model output, as shown in Figure 2. Building these skills is essential for pathologists to better comprehend the role of AI in their work and to ensure the safe and responsible integration of these technologies into daily practice. It should be kept in mind that time reduction is not the primary goal in itself, but secondary to (improved) accuracy.
Finally, a modest increase in certainty was observed for the assisted reads. When analyzing cases and controls separately, this increase mostly resulted from assessment of the controls (0.35 increase; p < 0.01) rather than the cases (0.03 increase; p = 0.8). Potentially, the idea of an additional check, making sure that a lesion has not been missed, provides this additional sense of certainty. It must be noted that any study on the impact of AI on pathologists' diagnostics suffers from the fact that AI models are, as yet, hardly used in pathology practice, meaning that study participants have little experience with such diagnostic support. It is likely that once pathologists have more experience with AI and better understand its strengths and weaknesses, the benefit of using AI will only increase.
This study has several limitations. The dataset was enriched with cases of STIC/STIL, as mimicking the true incidence of isolated STIC would have made the study onerous and infeasible. In addition, although we attempted to simulate a clinical diagnostic setting, a study setting can never fully replicate the actual clinical workflow. Finally, despite the sizeable reader group, the small subgroup sizes limited reliable analysis with the linear mixed-effects models. Nonetheless, we believe this study clearly demonstrates the potential impact of this DL model on screening for STIC/STIL.
Future work to strengthen the DL model would ideally encompass broad collaboration between institutions and experts. Additional cases of STIC/STIL, used to deepen the expert-pathologist-driven reference standard and further train the DL model, could eventually lead to a classification model capable of assisting pathologists in differentiating between STIC and STIL. To further analyze the effect on readers, a prospective trial in a clinical setting would be optimal. With the ongoing digitalization of pathology, such an approach will become more feasible over time.
In conclusion, this study evaluated the impact of a DL model designed for the automated detection of STIC/STIL lesions in WSIs of fallopian tube specimens. Across a diverse group of readers from 11 countries, with varied professional backgrounds and years of work experience, we observed a significant improvement in the accuracy of STIC/STIL detection and a significant decrease in slide review time. Most readers reacted positively to working with the model and indicated that they would be interested in using it in clinical practice. We believe this model can meaningfully assist pathologists in diagnosing STIC/STIL and can optimize the diagnostic process for these lesions.
Acknowledgements
The authors thank Dr Scott Maurits, from the Department of Health Evidence, Radboud University Medical Center, Nijmegen, the Netherlands, for his support with the statistics. This study was supported by the Dutch Cancer Society (KWF) (project number 12950).
Author contributions statement
JMAB, MPS, J-MB, MHDvB, MS, JAH and JAWML conceived and designed the study. All authors collected and assembled the data; interpreted and analyzed the data; wrote and had final approval of the manuscript; and are accountable for all aspects of the work.
Data availability statement
Images are subject to various data transfer agreements and can be requested at the respective pathology institutions. Source code to train and assess the DL model, and data on the reference standard, are available from the corresponding author on reasonable request. The DL model will be made freely accessible online.
Appendix - AI-STIC Study Group (for group authorship)
Joost Bart, Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Jessica L Bentz, Department of Pathology, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA.
Tjalling Bosse, Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands.
Johan Bulten, Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands.
Mohamed Mokhtar Desouki, Department of Pathology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA; Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, USA.
Ricardo R Lastra, Department of Pathology, University of Chicago, Chicago, IL, USA.
Tricia A Numan, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
J Kenneth Schoolmeester, Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, FL, USA.
Lauren E Schwartz, University of Pennsylvania, Department of Pathology and Laboratory Medicine, Philadelphia, PA, USA.
Ie-Ming Shih, Department of Gynecology and Obstetrics and Pathology, Johns Hopkins Medical Institution, Baltimore, MD, USA.
T Rinda Soong, Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
Gulisa Turashvili, Department of Pathology and Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA.
Russell Vang, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Mila Volchek, Department of Anatomical Pathology, Royal Children's Hospital, Parkville, Victoria, Australia.
Riena P Aliredjo, Department of Pathology, Rijnstate Hospital, Arnhem, The Netherlands.
Heidi Kusters-Vandevelde, Department of Pathology, Canisius Wilhelmina Hospital, Nijmegen, The Netherlands.
References

1. Abels E, Pantanowitz L, Aeffner F, et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J Pathol 2019; 249: 286–294.
2. Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol 2019; 20: e253–e261.
3. Pallua JD, Brunner A, Zelger B, et al. The future of pathology is digital. Pathol Res Pract 2020; 216: 153040.
4. Jahn SW, Plass M, Moinfar F. Digital pathology: advantages, limitations and emerging perspectives. J Clin Med 2020; 9: 1–17.
5. Madabhushi A, Lee G. Image analysis and machine learning in digital pathology: challenges and opportunities. Med Image Anal 2016; 33: 170–175.
6. Srinidhi CL, Ciga O, Martel AL. Deep neural network models for computational histopathology: a survey. Med Image Anal 2021; 67: 101813.
7. van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med 2021; 27: 775–784.
8. Echle A, Rindtorff NT, Brinker TJ, et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br J Cancer 2021; 124: 686–696.
9. Bejnordi BE, Veta M, van Diest PJ, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 2017; 318: 2199–2210.
10. Bulten W, Pinckaers H, van Boven H, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol 2020; 21: 233–241.
11. Bogaerts JMA, van Bommel MHD, Hermens RPMG, et al. Consensus based recommendations for the diagnosis of serous tubal intraepithelial carcinoma: an international Delphi study. Histopathology 2023; 83: 67–79.
12. Steenbeek MP, van Bommel MHD, Bulten J, et al. Risk of peritoneal carcinomatosis after risk-reducing salpingo-oophorectomy: a systematic review and individual patient data meta-analysis. J Clin Oncol 2022; 40: 1879–1891.
13. Steenbeek MP, van Bommel MHD, intHout J, et al. TUBectomy with delayed oophorectomy as an alternative to risk-reducing salpingo-oophorectomy in high-risk women to assess the safety of prevention: the TUBA-WISP II study protocol. Int J Gynecol Cancer 2023; 33: 982–987.
14. Visvanathan K, Vang R, Shaw P, et al. Diagnosis of serous tubal intraepithelial carcinoma based on morphologic and immunohistochemical features. Am J Surg Pathol 2011; 35: 1766–1775.
15. Vang R, Visvanathan K, Gross A, et al. Validation of an algorithm for the diagnosis of serous tubal intraepithelial carcinoma. Int J Gynecol Pathol 2012; 31: 243–253.
16. Carlson JW, Jarboe EA, Kindelberger D, et al. Serous tubal intraepithelial carcinoma: diagnostic reproducibility and its implications. Int J Gynecol Pathol 2010; 29: 310–314.
17. Meserve EEK, Brouwer J, Crum CP. Serous tubal intraepithelial neoplasia: the concept and its application. Mod Pathol 2017; 30: 710–721.
18. Perrone ME, Reder NP, Agoff SN, et al. An alternate diagnostic algorithm for the diagnosis of intraepithelial fallopian tube lesions. Int J Gynecol Pathol 2020; 39: 261–269.
19. Samimi G, Trabert B, Geczik AM, et al. Population frequency of serous tubal intraepithelial carcinoma (STIC) in clinical practice using SEE-Fim protocol. JNCI Cancer Spectr 2018; 2: 61.
20. Bogaerts JMA, Steenbeek MP, van Bommel MHD, et al. Recommendations for diagnosing STIC: a systematic review and meta-analysis. Virchows Arch 2022; 480: 725–737.
21. Shih IM, Wang Y, Wang TL. The origin of ovarian cancer species and precancerous landscape. Am J Pathol 2021; 191: 26–39.
22. Medeiros F, Muto MG, Lee Y, et al. The tubal fimbria is a preferred site for early adenocarcinoma in women with familial ovarian cancer syndrome. Am J Surg Pathol 2006; 30: 230–236.
23. Bogaerts JMA, Bokhorst J-M, Simons M, et al. Deep learning detects premalignant lesions in the fallopian tube. npj Women's Health 2024; 2: 11.
24. Altman DG. Practical Statistics for Medical Research. Chapman & Hall: London, 1990.
25. Obuchowski NA, Bullen J. Multireader diagnostic accuracy imaging studies: fundamentals of design and analysis. Radiology 2022; 303: 26–34.
26. Bates D, Mächler M, Bolker B, et al. Fitting linear mixed-effects models using lme4. J Stat Softw 2015; 67: 1–48.
27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, 2021. Available from: https://www.R-project.org/ [Accessed September 2023].
28. Hekler A, Utikal JS, Enk AH, et al. Pathologist-level classification of histopathological melanoma images with deep neural networks. Eur J Cancer 2019; 115: 79–83.
29. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25: 44–56.
30. Rabban JT, Krasik E, Chen LM, et al. Multistep level sections to detect occult fallopian tube carcinoma in risk-reducing salpingo-oophorectomies from women with BRCA mutations: implications for defining an optimal specimen dissection protocol. Am J Surg Pathol 2009; 33: 1878–1885.
31. Mahe E, Tang S, Deb P, et al. Do deeper sections increase the frequency of detection of serous tubal intraepithelial carcinoma (STIC) in the 'sectioning and extensively examining the FIMbriated end' (SEE-FIM) protocol? Int J Gynecol Pathol 2013; 32: 353–357.
32. Cabitza F, Rasoini R, Gensini GF. Unintended consequences of machine learning in medicine. JAMA 2017; 318: 517–518.
Abstract
In recent years, it has become clear that artificial intelligence (AI) models can achieve high accuracy in specific pathology-related tasks. An example is our deep-learning model, designed to automatically detect serous tubal intraepithelial carcinoma (STIC), the precursor lesion to high-grade serous ovarian carcinoma, found in the fallopian tube. However, the stand-alone performance of a model is insufficient to determine its value in the diagnostic setting. To evaluate the impact of the use of this model on pathologists' performance, we set up a fully crossed multireader, multicase study, in which 26 participants, from 11 countries, reviewed 100 digitalized H&E-stained slides of fallopian tubes (30 cases/70 controls) with and without AI assistance, with a washout period between the sessions. We evaluated the effect of the deep-learning model on accuracy, slide review time and (subjectively perceived) diagnostic certainty, using mixed-models analysis. With AI assistance, we found a significant increase in accuracy (odds ratio 6.49; p < 0.01), a significant reduction in slide review time (32%; p < 0.01), and a significant increase in diagnostic certainty (p < 0.01).
Author affiliations
1 Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
2 Department of Obstetrics and Gynecology, Radboud University Medical Center, Nijmegen, The Netherlands
3 Diagnostic and Research Institute of Pathology, Medical University of Graz, Graz, Austria
4 Pathology Unit, Department of Woman and Child's Health and Public Health Sciences, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
5 Department of Pathology, LabPON, Hengelo, The Netherlands
6 Department of Pathology, Maria Sklodowska‐Curie National Research Institute of Oncology, Warsaw, Poland
7 Department of Pathology, Ghent University Hospital, Ghent, Belgium
8 Department of Pathology, Hospices Civils de Lyon, Lyon, France
9 Department of Pathology, University Hospital Virgen de la Arrixaca, Murcia, Spain
10 Institute for Pathology and Neuropathology, University of Tuebingen Medical Center II, Tuebingen, Germany
11 Department of Pathology and Laboratory Medicine, University of British Columbia and Vancouver General Hospital, Vancouver, Canada
12 General Pathology and Cytopathology Unit, Department of Medicine‐DMED, University of Padua, Padua, Italy
13 Department of Pathology, San Gerardo Hospital, Monza, Italy
14 Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands
15 Department of Pathology, and GROW School for Oncology and Reproduction, Maastricht University Medical Center+, Maastricht, The Netherlands
16 Institute for Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
17 Pathology Department, Ospedale dell'Angelo, Venezia‐Mestre, Italy
18 Department of Pathology, St. George's University Hospitals, London, UK
19 Department of Pathology, Yale School of Medicine and Yale School of Public Health, New Haven, CT, USA
20 Department of Pathology, University of California San Francisco, San Francisco, CA, USA
21 Department of Pathology, Erasmus University Medical Center, Rotterdam, The Netherlands
22 Department of Pathology, Cancer Research Institute Ghent (CRIG), Ghent University Hospital, Ghent, Belgium
23 Department of Pathology, Jeroen Bosch Hospital, 's‐Hertogenbosch, The Netherlands
24 Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands; Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden