Although cancer research increasingly relies on molecular profiling based on omics technologies, it is becoming clear that certain aspects of tumour pathology need to be studied within their tissue context1. This development is based on the insight that cancer progression is tightly linked to intercellular cross-talk and to the interaction of neoplastic cells with the surrounding microenvironment, including the immune system2. While microscopic techniques allow biological processes to be studied in high spatial detail, they can, despite recent advances3,4, measure relatively few molecular markers simultaneously compared with omics analyses of bulk tissue. Moreover, the relationships between morphological tumour features and molecular profiling data that have no or only very limited spatial resolution are largely unexplored. It is therefore desirable to bridge the gap between microscopic imaging approaches and high-dimensional omics technologies, thereby facilitating the discovery of localized molecular features that drive the spatially heterogeneous phenotype of a tumour.
Here, we present a machine-learning-based analysis that facilitates the integration of molecular and spatiomorphological information in breast cancer (Fig. 1). The method can robustly identify cancer tissue and tumour-infiltrating lymphocytes (TiLs) from histological images and can also predict molecular features, including protein (PROT) and gene expression (RNASEQ), somatic mutations (SOM), copy number variations (CNV) and gene methylation (METH), as well as clinicopathological parameters such as hormone receptor status, tumour grading and survival. Owing to their complex nonlinear structure, machine-learning models are often considered ‘black boxes’ because they do not provide information on how specific predictions are made5. As this is problematic in medical applications, we use a method from explainable artificial intelligence that identifies the image regions and pixels contributing most to the machine-learning-based predictions. This method, based on layer-wise relevance propagation (LRP)6, is capable of identifying cell and tissue properties, such as cancer cells or TiLs, with high spatial resolution in data from different imaging modalities (brightfield, fluorescent/confocal). Moreover, training our method with image and molecular data from a large, novel in-house database (B-CIB, Berlin Cancer Image Base) and the Cancer Genome Atlas (TCGA)7 facilitates the simultaneous prediction of multiple molecular features in breast cancer. From a set of over 60,000 tested molecular features (data...