Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task1,2. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations3. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high-resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using 19,491 whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer.
Histopathology image evaluation is integral to the diagnosis of cancers and cancer subtype classification. Previous studies on artificial intelligence (AI)-based histopathology image analysis have primarily relied on training task-specific models optimized for each use case1,2. For example, specialized deep neural networks have been developed for cancer cell identification4,5, histological and molecular subtype classification6-10, prognosis evaluation11-14 and treatment response prediction using gigapixel whole-slide images (WSIs)15-17. Moreover, state-of-the-art computational pathology analyses have revealed quantitative morphological signals indicative of clinically important molecular markers18,19, demonstrating the potential of AI methods in identifying cellular features imperceptible to the human eye20. Although these advances offer promising avenues for improving cancer evaluation, several limitations continue to plague quantitative pathology image analyses. To begin with, standard deep learning methods require a large amount of data to train a well-performing model for each task. As it is difficult to obtain comprehensive pathology representations that cover the heterogeneity of diverse tissue microenvironments, existing approaches mainly focus on solving each narrow diagnostic task individually1,7. In addition, most AI models for pathology imaging analyses are adapted from general computer vision models designed for classifying macroscopic objects (for example, animals, cars and buses)2. These conventional approaches do not leverage general tissue pathology patterns when training specialized diagnostic models. Furthermore, AI models trained on images from a single source tend to overfit the training data distribution and suffer from substantial performance deterioration when applied to images processed by different pathology laboratories3,21. These limitations have hindered the effective application of state-of-the-art AI models for reliable pathology evaluation.
Self-supervised learning has emerged as a promising approach for obtaining robust image feature representation useful for a wide range of prediction tasks using samples collected in diverse settings22,23. As diverse unlabelled training data are relatively straightforward to collect and the model training process is task-agnostic, self-supervised learning has achieved robust performance across different tasks and data distributions, such as image retrieval24-26 and weakly supervised WSI analysis27. Recent advancements in self-supervised learning for pathology image analyses further utilized both images and their text descriptions to augment the performance of computer vision models28,29. However, these methods have two major limitations. First, they primarily focus on individual image tiles in the WSIs, without considering the interactions of different regions of the same tissue. Second, previous studies focused on narrow diagnostic tasks and did not evaluate the generalizability of the extracted quantitative imaging features in different prediction tasks across cancer types and samples from several sources. As pathologists often face a variety of disease samples and need to assimilate contextual information from the tissue microenvironment, developing a general-purpose pathology AI system capable of accommodating a wide range of tissue types and evaluation tasks is of paramount importance.
To address these pressing clinical needs, we established the CHIEF model, a general-purpose machine learning framework that provides the foundation for various pathology diagnosis and prediction tasks (Fig. 1a). We leveraged two complementary forms of AI model pretraining: self-supervised pretraining using 15 million pathology image tiles for tile-level feature representation and weakly supervised pretraining on 60,530 WSIs across 19 anatomical sites for tissue context representation. In addition, we devised an efficient framework for tile-level feature aggregation in large-scale WSI analysis. We further validated CHIEF's capability in cancer detection, tumour origin characterization, genomic mutation identification and survival prediction using 32 independent datasets consisting of 19,491 weakly annotated WSIs. Our approach challenges conventional attention-based tile-aggregation methods, offering a holistic representation of WSI features. CHIEF enables systematic microscopic feature identification and lays the groundwork for reliable pathology evaluation.
An overview of CHIEF
We established the CHIEF model, a general-purpose machine learning framework for weakly supervised histopathological image analyses. Unlike commonly used self-supervised feature extractors27,30, CHIEF leveraged two types of pretraining procedure: unsupervised pretraining on 15 million unlabelled tile images and weakly supervised pretraining on more than 60,000 WSIs. Tile-level unsupervised pretraining established a general feature extractor30 for haematoxylin-eosin-stained histopathological images collected from heterogeneous publicly available databases, which captured diverse manifestations of microscopic cellular morphologies. Subsequent WSI-level weakly supervised pretraining constructed a general-purpose model by characterizing the similarities and differences between cancer types. We evaluated the performance of CHIEF in a wide range of pathology evaluation tasks, including cancer detection, tumour origin prediction, genomic profile identification and survival prediction (Fig. 1a). The details of model design and implementation are described in the Methods.
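To make the aggregation step concrete, the sketch below shows gated attention pooling in the style of ABMIL34 combined with a learned anatomical-site embedding. This is a minimal illustration under our own assumptions, not the released CHIEF implementation; the module names, dimensions and the simple lookup-based site embedding are placeholders.

```python
# Minimal sketch of attention-based tile aggregation with an
# anatomical-site prior. Dimensions, module names and the lookup-based
# site embedding are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    """Pools a bag of tile embeddings into a single slide embedding."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.V = nn.Linear(dim, hidden)  # tanh branch
        self.U = nn.Linear(dim, hidden)  # sigmoid gate branch
        self.w = nn.Linear(hidden, 1)    # attention logits

    def forward(self, tiles):            # tiles: (n_tiles, dim)
        a = self.w(torch.tanh(self.V(tiles)) * torch.sigmoid(self.U(tiles)))
        attn = torch.softmax(a, dim=0)   # per-tile weights summing to 1
        return (attn * tiles).sum(dim=0), attn.squeeze(-1)

class SlideClassifier(nn.Module):
    """Fuses the pooled slide embedding with an anatomical-site embedding."""
    def __init__(self, dim=768, n_sites=19, n_classes=2):
        super().__init__()
        self.pool = GatedAttentionPool(dim)
        self.site_emb = nn.Embedding(n_sites, dim)  # site prior (stand-in)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, tiles, site_id):
        slide_vec, attn = self.pool(tiles)
        fused = torch.cat([slide_vec, self.site_emb(site_id)], dim=-1)
        return self.head(fused), attn    # slide-level logits + attention

logits, attn = SlideClassifier()(torch.randn(500, 768), torch.tensor(3))
```

The per-tile attention weights returned here are also the quantities the visualization analyses below draw on.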
CHIEF augmented cancer cell detection
Detecting malignant cells from pathological images is crucial for cancer diagnoses4,5. State-of-the-art AI methods for cancer cell detection predominantly concentrate on training models for specific cancer types, without leveraging the commonalities of malignant cell morphology across cancers. The resulting models are not easily extensible to other cancer categories. To address this gap, we built a weakly supervised cancer detection platform using CHIEF and evaluated its generalizability across cancers. We conducted an extensive external validation using 15 independent datasets with a total of 13,661 WSIs. These datasets encompass both public (for example, Clinical Proteomic Tumor Analysis Consortium (CPTAC), Diagset-B31, Dataset-patient-level-test (Dataset-PT)32, the Diagnostic Reference Oncology Imaging Database (DROID)-Breast and TissueNet33 cohorts) and institutional (for example, samples from Shenzhen Maternity & Child Healthcare Hospital (SMCH) and Chongqing University Cancer Hospital (CUCH)) data sources, contain biopsy and surgical resection slides and span 11 different primary cancer sites, including the breast, uterus-endometrium, oesophagus, stomach, cervix, colon, prostate, kidney, skin, pancreas and lung. To better assess the performance of CHIEF, we compared it with three weakly supervised WSI classification methods: clustering-constrained-attention multiple instance learning (CLAM)6, attention-based deep multiple instance learning (ABMIL)34 and dual-stream multiple instance learning networks (DSMIL)35.
CHIEF consistently attained superior performance in a variety of cancer identification tasks using either biopsy or surgical resection slides (Fig. 2a). CHIEF achieved a macro-average area under the receiver operating characteristic curve (AUROC) of 0.9397 across 15 datasets representing 11 cancer types (Fig. 2a), which is approximately 10% higher than that attained by DSMIL (a macro-average AUROC of 0.8409), 12% higher than that of ABMIL (a macro-average AUROC of 0.8233) and 14% higher than that of CLAM (a macro-average AUROC of 0.8016). In all five biopsy datasets collected from independent cohorts, CHIEF attained AUROCs greater than 0.96 across several cancer types, including oesophagus (CUCH-Eso), stomach (CUCH-Sto), colon (CUCH-Colon) and prostate (Diagset-B and CUCH-Pros). On independent validation with seven surgical resection slide sets spanning five cancer types (that is, colon (Dataset-PT), breast (DROID-Breast), endometrium (SMCH-Endo and CPTAC-uterine corpus endometrial carcinoma (UCEC)), lung (CPTAC-lung squamous cell carcinoma (LUSC)) and cervix (SMCH-Cervix and TissueNet)), CHIEF attained AUROCs greater than 0.90. Both CHIEF and the baseline methods had lower performance in the CPTAC datasets. Nonetheless, CHIEF significantly outperformed all other methods in cancer cell identification in these datasets (DeLong test P value < 0.001). These results demonstrated CHIEF's generalizability across diverse cancer tissues and samples obtained from heterogeneous sources internationally.
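For readers reproducing this style of evaluation, the sketch below computes per-dataset AUROCs, their macro average across cohorts and a bootstrap 95% CI. The paper's paired comparisons use the DeLong test; the bootstrap here is a simpler, commonly used stand-in, and the inputs are synthetic placeholders.

```python
# Sketch of the evaluation protocol: per-dataset AUROC, a macro average
# across cohorts and a bootstrap 95% CI. The inputs are synthetic; the
# paper's paired comparisons use the DeLong test instead of a bootstrap.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_with_ci(y_true, y_score, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample slides
        if len(set(y_true[idx])) < 2:                    # need both classes
            continue
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)

rng = np.random.default_rng(1)
cohorts = {f"cohort_{i}": (rng.integers(0, 2, 200), rng.random(200))
           for i in range(3)}                            # placeholder data
macro = np.mean([roc_auc_score(y, s) for y, s in cohorts.values()])
print(macro, auroc_with_ci(*cohorts["cohort_0"]))
```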
We used whole-slide attention visualization to identify diagnostic signals utilized by the CHIEF models. Figure 2b, Extended Data Fig. 2 and Supplementary Fig. 1 show the original WSIs, pixel-level ground truth annotated by pathologists (Methods) and attention maps output by CHIEF. CHIEF directed most of its attention to cancerous regions, exhibiting a remarkable alignment with ground truth annotations at the pixel level despite being trained only on slide-level labels. Notably, tiles receiving high attention from CHIEF contained tissue with typical cytologic and architectural patterns of malignancy (for example, increased nuclear/cytoplasmic ratio, irregularly shaped nuclei, cellular pleomorphism and disorganized architecture), showing the model's capacity to identify key diagnostic features using a weakly supervised approach.
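A minimal version of this attention visualization is sketched below: per-tile attention scores are written back to each tile's grid position and rendered as a translucent heat map over the slide thumbnail. The tile size, downsample factor and colour map are display assumptions, not the paper's exact settings.

```python
# Illustrative attention-overlay sketch: write each tile's attention back
# to its grid position on the thumbnail. Tile size (256 px at level 0) and
# the 32x thumbnail downsample are assumptions for display only.
import numpy as np
import matplotlib.pyplot as plt

def attention_overlay(thumbnail, tile_coords, attn, tile_px=256, down=32):
    """thumbnail: HxWx3 RGB; tile_coords: (n, 2) level-0 (x, y); attn: (n,)."""
    heat = np.zeros(thumbnail.shape[:2], dtype=float)
    t = tile_px // down                                 # tile size on thumbnail
    norm = (attn - attn.min()) / (np.ptp(attn) + 1e-8)  # rescale to [0, 1]
    for (x, y), a in zip(tile_coords // down, norm):
        heat[y:y + t, x:x + t] = a
    plt.imshow(thumbnail)
    plt.imshow(heat, cmap="jet", alpha=0.4)             # translucent heat map
    plt.axis("off")
    plt.savefig("attention_overlay.png", dpi=200)
```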
CHIEF identified tumour origins
We successfully used CHIEF to predict the tissue origin of cancers and validated the results using independent test sets from CPTAC. Extended Data Fig. 1 and Supplementary Tables 5-7 show the detailed results.
CHIEF predicted genomic profiles
Genomic profiles of cancer samples indicate patients' treatment responses and are crucial for formulating treatment plans19. The comprehensive genomic profiling of patients with cancer is not routinely conducted worldwide owing to the additional cost and time involved18. Identifying quantitative morphological patterns indicative of genomic profiles from routine haematoxylin-eosin-stained slides offers an instantaneous and cost-effective alternative to genomic sequencing. We examined CHIEF's capability to systematically predict molecular profiles of cancer samples. We focused on four clinically important prediction tasks: systematic prediction of prevalent genetic mutations across cancer types; identification of mutations related to targeted therapies; isocitrate dehydrogenase (IDH) status prediction for the new WHO (World Health Organization) classification of glioma; and microsatellite instability (MSI) prediction for assessing the benefits of immune checkpoint blockade in patients with colorectal cancer (CRC).
Prevalent genetic mutations
We conducted a systematic analysis that associated prevalent genetic mutations with histopathology images (Fig. 3 and Extended Data Fig. 3). Our study involved 13,432 WSIs across 30 cancer types, covering 53 genes selected as the five most frequently mutated genes in each cancer type.
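The gene-selection step can be summarized with a short sketch: for every cancer type, compute each gene's mutation rate and keep the five most frequently mutated genes. The column names of the mutation table below are assumptions for illustration.

```python
# Sketch of the gene-selection step: keep the five genes with the highest
# mutation rate per cancer type. Column names are illustrative assumptions.
import pandas as pd

def top_mutated_genes(mutations: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """mutations: one row per (cancer_type, gene, patient_id, mutated)."""
    rates = (mutations
             .groupby(["cancer_type", "gene"])["mutated"]
             .mean()                       # fraction of patients mutated
             .rename("mutation_rate")
             .reset_index())
    return (rates.sort_values("mutation_rate", ascending=False)
                 .groupby("cancer_type")
                 .head(k))                 # top-k genes per cancer type
```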
CHIEF predicted the mutation status of nine genes with AUROCs greater than 0.8 in our systematic pan-cancer genetic mutation analyses (Fig. 3). Consistent with previous studies18,36, pathology images contain strong signals related to TP53 mutation across 19 cancer types, with high AUROCs in low-grade glioma (LGG; 0.8756; 95% confidence interval (CI) 0.8624-0.8888), adrenal carcinoma (0.8119; 95% CI 0.7488-0.8751) and UCEC (0.8115; 95% CI 0.7971-0.8259). CHIEF also identified mutations in GTF2I, which occur in 43.4% of patients with thymic epithelial tumours37, with an AUROC of 0.9111 (95% CI 0.8935-0.9287). Furthermore, CHIEF predicted BAP1 mutation in uveal melanoma (AUROC = 0.817; 95% CI 0.7668-0.8672), which is observed in approximately 45% of uveal melanoma cases38.
We tested CHIEF in an independent patient cohort from CPTAC. CHIEF consistently maintained similar AUROCs for various genes in these new patient cohorts (Extended Data Fig. 4). Compared with the state-of-the-art method for histopathology-based genomic mutation prediction (that is, the pan-cancer computational histopathology (PC-CHiP) method36; Supplementary Fig. 2), CHIEF showed significantly higher performance (Wilcoxon signed-rank test P value < 0.001), with a macro-average AUROC of 0.7043 (range 0.51-0.89). By contrast, the PC-CHiP method attained a macro-average AUROC of 0.6523 (range 0.39-0.92).
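The paired comparison reported here has the form sketched below: per-task AUROCs for the two methods on identical gene-prediction tasks, compared with a one-sided Wilcoxon signed-rank test. The arrays are synthetic placeholders, not study results.

```python
# Sketch of a paired per-gene AUROC comparison with a Wilcoxon
# signed-rank test. All numbers are synthetic placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
baseline_aurocs = rng.uniform(0.4, 0.9, 53)       # e.g. PC-CHiP, one per task
chief_aurocs = np.clip(baseline_aurocs
                       + rng.normal(0.05, 0.03, 53), 0.0, 1.0)

stat, p = wilcoxon(chief_aurocs, baseline_aurocs, alternative="greater")
print(f"macro-average {chief_aurocs.mean():.4f} vs "
      f"{baseline_aurocs.mean():.4f}; one-sided P = {p:.3g}")
```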
Mutations related to targeted therapies
We further used CHIEF to predict genes associated with FDA (Food and Drug Administration)-approved targeted therapies presented in OncoKB39 (www.oncokb.org) across 18 genes spanning 15 cancer types (Fig. 3). CHIEF predicted the mutation status of all 18 genes with AUROCs greater than 0.6 (Fig. 3). Mutations with high prediction performance included EZH2 in diffuse large B-cell lymphoma (AUROC = 0.9571; 95% CI 0.9321-0.9822), NTRK1 in stomach adenocarcinoma (AUROC = 0.8192; 95% CI 0.7767-0.8618), BRCA2 in prostate adenocarcinoma (AUROC = 0.8938; 95% CI 0.8310-0.9567), BRAF in thyroid carcinoma (AUROC = 0.8889; 95% CI 0.8715-0.9064), ERBB2 in lung squamous cell carcinoma (LUSC; AUROC = 0.8211; 95% CI 0.7597-0.8826) and FGFR3 in bladder urothelial carcinoma (AUROC = 0.8161; 95% CI 0.7921-0.8402). On independent validation, CHIEF achieved a similar level of performance in the CPTAC cohorts (Extended Data Fig. 4). Among these genes, ESR1 in breast cancer (BRCA), EGFR in lung adenocarcinoma (LUAD) and BRAF in colon adenocarcinoma and rectum adenocarcinoma (COADREAD) all exhibited AUROCs greater than 0.7 in both held-out and independent test sets.
IDH status prediction
The fifth edition of the WHO Classification of Tumors of the Central Nervous System distinguished glioblastoma (GBM) from LGG on the basis of IDH status instead of conventional histological features8,40. Thus, it is crucial to identify patients' IDH status at the time of diagnosis. To identify IDH mutation-related signals independent of histological grades, we stratified our study cohorts by histological grade and used CHIEF to predict IDH status in each stratum. We conducted IDH status prediction analyses on six datasets: The Cancer Genome Atlas (TCGA)-LGG, TCGA-GBM, Medical University of Vienna (MUV)-LGG41, MUV-GBM41, Harvard Medical School and the University of Pennsylvania (HMS)-LGG and HMS-GBM, comprising a total of 2,718 WSIs. The CHIEF model demonstrated superior performance compared with baseline methods in both the held-out and independent test sets (Wilcoxon signed-rank test P value < 0.01; Fig. 4a and Supplementary Fig. 3). To increase interpretability, we visualized the quantitative image feature vectors and examined the distribution of attention scores determined by CHIEF (Extended Data Figs. 5 and 9b). Results showed that necrotic regions received significantly higher attention when identifying gliomas with IDH-wild-type status (Mann-Whitney U-test P < 0.0001; Extended Data Fig. 9b).
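The attention analysis underlying this result amounts to a two-sample rank test, sketched below with placeholder scores: attention on tiles within annotated necrotic regions versus all other tiles.

```python
# Sketch of the attention analysis: compare attention scores on tiles in
# annotated necrotic regions against all other tiles with a one-sided
# Mann-Whitney U-test. Scores below are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
attn_necrosis = rng.beta(4, 2, 300)   # hypothetical attention, necrotic tiles
attn_other = rng.beta(2, 4, 3000)     # hypothetical attention, remaining tiles

stat, p = mannwhitneyu(attn_necrosis, attn_other, alternative="greater")
print(f"U = {stat:.0f}, one-sided P = {p:.2e}")
```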
MSI status prediction
MSI is a well-established biomarker for responses to immune checkpoint blockade in CRCs27. To enable rapid treatment personalization at the time of diagnosis, we examined the performance of CHIEF in predicting MSI status using histopathological images. CHIEF significantly outperformed the best-performing baseline method (DSMIL) in the TCGA-COADREAD dataset and two independent cohorts (PAIP202042 and CPTAC-COAD), with an AUROC improvement of approximately 12%, 15% and 26%, respectively (Fig. 4b). Attention analyses showed that regions containing solid tumours, luminal necrosis and tumour-infiltrating lymphocytes received high attention from CHIEF (Extended Data Fig. 6).
CHIEF predicted survival outcomes
Owing to differential responses to standard treatments, patients with cancer have varying disease-specific survival outcomes after their initial diagnoses43. Although many clinical and genomic biomarkers have been proposed, they do not fully predict the prognosis of every patient. To address this challenge, we extended our CHIEF framework to establish stage-stratified survival prediction models for each cancer type under study. We used a total of 9,404 WSIs in 17 datasets (from both publicly available and institutional sample sources) and focused on 7 cancer types (COADREAD, LUSC, BRCA, GBM, UCEC, LUAD and renal cell carcinoma (RCC)) with reliable prognostic information in the independent cohorts.
CHIEF successfully predicted patients' survival outcomes using the histopathology images obtained at the time of initial diagnosis. In all cancer types and all study cohorts, CHIEF distinguished patients with longer-term survival from those with shorter-term survival (log-rank test P < 0.05; Fig. 5 shows the prediction results of patients with stage I and stage II cancer). In comparison, state-of-the-art deep learning methods (for example, pathology-omics research platform for integrative survival estimation (PORPOISE)12 and DSMIL35) could not reliably differentiate patients with different survival outcomes in the same settings (log-rank test P > 0.05 in 11 out of 15 cohorts; Supplementary Fig. 4). In addition, the Kaplan-Meier curves produced by CHIEF possessed narrower CIs than those of other methods. Overall, CHIEF attained an average concordance index (c-index) of 0.74 across cancer types in the held-out test set (Supplementary Table 3), which was 12% and 7% higher than those of PORPOISE and DSMIL (0.62 and 0.67, respectively). Notably, the performance difference between CHIEF and baseline methods was even more pronounced in the independent cohorts not involved in the model development process. In these patient populations, CHIEF attained an average c-index of 0.67 (9% better than all baseline models) and distinguished patients with different survival outcomes in all datasets, whereas PORPOISE and DSMIL yielded average c-indices of 0.54 and 0.58, respectively.
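This style of survival evaluation can be sketched with the lifelines library: dichotomize model-derived risk scores at the cohort median, compare the resulting groups with a log-rank test, and compute the concordance index. The data below are synthetic placeholders.

```python
# Sketch of the survival evaluation: median split on risk scores,
# log-rank test between groups, c-index and Kaplan-Meier curves.
# All values are synthetic placeholders.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
risk = rng.random(200)                          # model-derived risk scores
time = rng.exponential(60 / (0.5 + risk))       # months; higher risk, shorter time
event = rng.random(200) < 0.7                   # True = death observed

high = risk > np.median(risk)                   # median split into two groups
res = logrank_test(time[high], time[~high], event[high], event[~high])
cindex = concordance_index(time, -risk, event)  # negate: higher risk, shorter survival
print(f"log-rank P = {res.p_value:.3g}, c-index = {cindex:.3f}")

ax = KaplanMeierFitter().fit(time[high], event[high],
                             label="higher risk").plot_survival_function()
KaplanMeierFitter().fit(time[~high], event[~high],
                        label="lower risk").plot_survival_function(ax=ax)
```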
We observed similar performance trends in patients with stage III (Supplementary Fig. 6) and stage IV cancers (Supplementary Fig. 7), with CHIEF outperforming other methods by up to 10%. As some previously published methods focused on mixed-stage results, we computed the results from mixed-stage analyses and showed that CHIEF outperformed baseline methods in these study settings (Extended Data Fig. 7 and Supplementary Fig. 5). In addition, we conducted a multivariate analysis that incorporated the model-derived risk score, patient age, sex and stage (Supplementary Tables 9 and 10). Results showed that the CHIEF-derived risk score is a significant prognostic factor independent of known indicators of survival outcomes. Furthermore, our univariate analysis showed that CHIEF-derived risk scores are statistically significantly associated with survival outcomes across all cancer types in all patient cohorts under investigation (Supplementary Tables 11 and 12). In comparison, the risk scores predicted by other pathology imaging-based methods could not differentiate patients' survival outcomes in most patient cohorts in either multivariate or univariate analyses.
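The multivariate analysis has the form sketched below: a Cox proportional-hazards model fitted with lifelines, with the model-derived risk score entered alongside age, sex and stage to test whether the score remains prognostic after adjustment. The data frame is synthetic and purely illustrative.

```python
# Sketch of the multivariate analysis: a Cox proportional-hazards model
# with the model-derived risk score adjusted for age, sex and stage.
# All values below are synthetic placeholders.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "time": rng.exponential(48, 300),    # follow-up in months
    "event": rng.random(300) < 0.6,      # death indicator
    "risk_score": rng.random(300),       # model-derived risk (placeholder)
    "age": rng.normal(62, 10, 300),
    "male": rng.integers(0, 2, 300),
    "stage": rng.integers(1, 3, 300),    # stage I/II stratum
})

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)", "p"]])  # per-covariate hazard ratios
```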
To better understand the histological features indicative of patients' survival outcomes, four attending pathologists independently reviewed the attention heat maps generated by CHIEF (Methods). In both longer-term and shorter-term survivors, high-attention areas contained malignant tissues across cancer types (Extended Data Figs. 8 and 9 and Supplementary Figs. 8 and 9). High-attention areas for longer-term survivors had more infiltrating immune cells than those for patients with higher mortality risks. In cancer samples from shorter-term survivors, high-attention regions exhibited larger nuclear/cytoplasmic ratios, more pronounced nuclear atypia, less stromal fibrosis and weaker intercellular adhesion.
Discussion
We developed CHIEF as a general-purpose, pan-cancer foundation deep learning framework for quantitative pathology evaluation. CHIEF leveraged unsupervised tile-level pretraining, weakly supervised WSI-level pretraining and 44 terabytes of histopathology imaging data from several countries for robust pathology image analysis. The CHIEF framework successfully characterized tumour origins, predicted clinically important genomic profiles and stratified patients into longer-term and shorter-term survival groups. Furthermore, our approach established a general pathology feature extractor capable of supporting a wide range of prediction tasks even with small sample sizes. Our results showed that CHIEF is highly adaptable to diverse pathology samples obtained from several centres, digitized by various scanners and collected through different clinical procedures (that is, biopsy and surgical resection). This new framework substantially enhanced model generalizability, a critical barrier to the clinical adoption of conventional computational pathology models1,3.
CHIEF effectively leveraged anatomic site information as a source of prior knowledge and considered the contextual interactions across different image regions in the WSIs, contributing to substantially better generalizability than standard approaches. We successfully used the CHIEF framework in various WSI-level prediction tasks, and our models achieved superior performance compared to state-of-the-art methods. For example, CHIEF exhibited a robust ability to recognize the origins of the primary tumours in patient cohorts not involved in the training process.
In addition, CHIEF substantially outperformed baseline methods in predicting genomic variations using pathology imaging profiles36. In particular, CHIEF predicted the mutation status of several oncogenes and tumour suppressor genes, such as TP53, GTF2I, BTG2, CIC, CDH1, IGLL5 and NRAS, with high performance (AUROCs > 0.8). As the updated WHO diagnostic guidelines incorporated molecular markers in tumour classifications, we further showed that CHIEF predicted key mutations related to major diagnostic categories and validated the results in several patient populations. CHIEF also accurately predicted the MSI status of patients with CRC, which may facilitate clinical decisions regarding the administration of immune checkpoint inhibitors18,19,27. Finally, imaging features extracted by CHIEF served as the foundation for survival outcome prediction models. These models stratified patients into high- and low-mortality-risk groups across all cancer types under study, and the results were validated in 17 cohorts.
We further interpreted CHIEF models by visualizing imaging regions that received high attention from the model. CHIEF used a weakly supervised machine learning approach, which identified the regions of interest automatically by comparing positive and negative examples, thereby eliminating the need for pixel-level or region-level annotations. This approach made it possible to leverage large-scale publicly available and institutional datasets to capture the heterogeneity of pathology manifestations across thousands of samples. For example, visualization of survival outcome prediction models indicated that samples from patients with cancer with lower mortality risks contain more infiltrating immune cells and abundant stroma with clear glandular and cribriform structures.
Last, we showed that CHIEF outperformed recently released general-purpose foundation models and patch-based pathology foundation models, with statistically significant performance differences26,44-46 (Supplementary Fig. 10 and Supplementary Tables 25 and 26). The additional weakly supervised pretraining approach leveraging large-scale WSI datasets probably contributed to its enhanced performance.
Our study has a few limitations. First, although CHIEF was trained with a large number of samples collected from several hospitals and study cohorts worldwide, the inclusion of a larger number of non-malignant slides and slides from rare diseases could further improve the performance of our general-purpose pathology feature extractor. In addition, our prognostic prediction models focused on the disease-specific and overall survival prediction of patients receiving standard care. Future research can extend our methods to study the predicted benefits and adverse effects of new cancer treatments.
In conclusion, CHIEF is a foundation model useful for a wide range of pathology evaluation tasks across several cancer types. We have demonstrated the generalizability of this foundation model across several clinical applications using samples collected from 24 hospitals and patient cohorts worldwide. CHIEF required minimal image annotations and extracted detailed quantitative features from WSIs, which enabled systematic analyses of the relationships among morphological patterns, molecular aberrations and important clinical outcomes. Accurate, robust and rapid pathology sample assessment provided by CHIEF will contribute to the development of personalized cancer management.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-07894-z.
1. Van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775-784 (2021).
2. Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer 3, 1026-1038 (2022).
3. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930-949 (2023).
4. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301-1309 (2019).
5. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199-2210 (2017).
6. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555-570 (2021).
7. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559-1567 (2018).
8. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med 4, 526-540 (2023).
9. Tsai, P.-C. et al. Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients. Nat. Commun. 14, 2102 (2023).
10. Yu, K.-H. et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J. Am. Med. Inform. Assoc. 27, 757-769 (2020).
11. Yu, K.-H. et al. Association of omics features with histopathology patterns in lung adenocarcinoma. Cell Syst. 5, 620-627 (2017).
12. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865-878 (2022).
13. Marostica, E. et al. Development of a histopathology informatics pipeline for classification and prediction of clinical outcomes in subtypes of renal cell carcinoma. Clin. Cancer Res. 27, 2868-2878 (2021).
14. Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
15. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151-1164 (2022).
16. Yu, K.-H. et al. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med. 18, 236 (2020).
17. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430-439 (2023).
18. Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789-799 (2020).
19. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686-696 (2021).
20. Ektefaie, Y. et al. Integrative multiomics-histopathology analysis for breast cancer classification. NPJ Breast Cancer 7, 147 (2021).
21. Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719-731 (2018).
22. Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346-1352 (2022).
23. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156-163 (2023).
24. Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420-1434 (2022).
25. Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).
26. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850-862 (2024).
27. Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650-1661 (2023).
28. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual-language foundation model for pathology image analysis using medical twitter. Nat. Med. 29, 2307-2316 (2023).
29. Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863-874 (2024).
30. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
31. Koziarski, M. et al. Diagset: a dataset for prostate cancer histopathological image classification. Sci. Rep. 14, 6780 (2024).
32. Yu, G. et al. Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images. Nat. Commun. 12, 6311 (2021).
33. Loménie, N. et al. Can AI predict epithelial lesion categories via automated analysis of cervical biopsies: the TissueNet challenge? J. Pathol. Inform. 13, 100149 (2022).
34. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127-2136 (PMLR, 2018).
35. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proc. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition 14313-14323 (IEEE, 2021).
36. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800-810 (2020).
37. Petrini, I. et al. A specific missense mutation in GTF2I occurs at high frequency in thymic epithelial tumors. Nat. Genet. 46, 844-849 (2014).
38. Carbone, M. et al. Biological mechanisms and clinical significance of BAP1 mutations in human cancer. Cancer Discov. 10, 1103-1120 (2020).
39. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1-16 (2017).
40. Louis, D. N. et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro-Oncology 23, 1231-1251 (2021).
41. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).
42. Kim, K. et al. PAIP 2020: microsatellite instability prediction in colorectal cancer. Med. Image Anal. 89, 102886 (2023).
43. Amin, M. B. et al. The Eighth Edition AJCC Cancer Staging Manual: continuing to build a bridge from a population-based to a more "personalized" approach to cancer staging. CA Cancer J. Clin. 67, 93-99 (2017).
44. Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
45. Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://doi.org/10.48550/arXiv.2312.11805 (2023).
46. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756-779 (2023).
47. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113-1120 (2013).
48. Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580-585 (2013).
49. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154-163 (2022).
50. Yacob, F. et al. Weakly supervised detection and classification of basal cell carcinoma using graph-transformer on whole slide images. Sci. Rep. 13, 7555 (2023).
51. Xu, F. et al. Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides. Front. Oncol. 11, 4133 (2021).
52. Weitz, P. et al. A multi-stain breast cancer histological whole-slide-image data set from routine diagnostics. Sci. Data 10, 562 (2023).
53. Wang, C.-W. et al. Histopathological whole slide image dataset for classification of treatment effectiveness to ovarian cancer. Sci. Data 9, 25 (2022).
54. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748-8763 (PMLR, 2021).
55. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62-66 (1979).
56. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (ICLR, 2015).
57. Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. In Proc. 5th International Conference on Learning Representations 1769-1784 (ICLR, 2017).
58. Stadler, C. B. et al. Proactive construction of an annotated imaging database for artificial intelligence training. J. Digit. Imaging 34, 105-115 (2021).
59. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106-110 (2021).
60. Black, A. et al. PLCO: evolution of an epidemiologic resource and opportunities for future studies. Rev. Recent Clin. Trials 10, 238-245 (2015).
61. Shao, Z. et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 34, 2136-2147 (2021).
62. Liang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat. Mach. Intell. 5, 408-420 (2023).
63. Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519-1525 (2019).
64. Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).