Abstract

In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but 3% of the time a cancer patient presents with a metastatic tumour and no obvious primary. Challenges also arise when distinguishing a metastatic recurrence of a previously treated cancer from the emergence of a new one. Here we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types. Our classifier achieves an accuracy of 91% on held-out tumour samples and 82% and 85%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced classifier accuracy. Our results have immediate clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of cell-free circulating tumour DNA.

Footnotes

* Since the original version, we have revised the paper substantially, replacing the original random forest classifier with a deep learning model that predicts cancer type from somatic mutation patterns. The result is a substantial improvement in classifier accuracy. We have also added a large independent validation set of metastatic tumours, and show that the performance of the classifier on metastases is equal to, or better than, its performance on primaries. This strengthens the case for applying the system to realistic clinical challenges.

Details

Title
A deep learning system can accurately classify primary and metastatic cancers based on patterns of passenger mutations
Author
Jiao, Wei; Atwal, Gurnit; Polak, Paz; Karlic, Rosa; Cuppen, Edwin; Danyi, Alexandra; De Ridder, Jeroen; Van Herpen, Carla; Lolkema, Martijn P; Steeghs, Neeltje; Getz, Gad; Morris, Quaid D; Stein, Lincoln D; PCAWG Pathology & Clinical Correlates Working Group; ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2019
Publication date
Jan 22, 2019
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2071247284
Copyright
© 2019. This article is published under http://creativecommons.org/licenses/by-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.