Abstract

Existing cancer benchmark data sets for human sequencing data use germline variants, synthetic methods, or expensive validations, none of which are satisfactory for providing a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell line, validated using a known cell lineage. The data set includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.

Megan Shand et al. present Lineage derived Somatic Truth (LinST), a validated data set of somatic mutations from a colon cancer cell line with a known lineage tree structure. They show that LinST can be used to benchmark true-positive and false-positive rates in somatic variant-calling pipelines applied to cancer genomic data.
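As a rough illustration of what benchmarking against a truth set restricted to a high-confidence region involves, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the tuple representation of variants and intervals, and the choice to report false positives per megabase are all assumptions made for this example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representations, assumed for this sketch only.
Variant = Tuple[str, int, str, str]   # (chrom, 1-based position, ref, alt)
Interval = Tuple[str, int, int]       # (chrom, start, end), 1-based inclusive


def in_regions(variant: Variant, regions: Iterable[Interval]) -> bool:
    """Return True if the variant position falls inside any interval."""
    chrom, pos, _, _ = variant
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def benchmark(calls: Set[Variant], truth: Set[Variant],
              regions: Iterable[Interval]) -> Dict[str, float]:
    """Compare a call set with a truth set inside the high-confidence region."""
    regions = list(regions)
    # Only sites inside the high-confidence region are evaluated.
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}

    tp = len(calls_hc & truth_hc)   # called and in the truth set
    fp = len(calls_hc - truth_hc)   # called but absent from the truth set
    fn = len(truth_hc - calls_hc)   # in the truth set but not called

    # Total evaluated bases, assuming the intervals do not overlap.
    region_bases = sum(end - start + 1 for _, start, end in regions)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "fp_per_mb": fp / (region_bases / 1e6) if region_bases else 0.0,
    }


# Toy data, invented for illustration; not taken from LinST.
toy_regions = [("chr1", 1, 1000)]
toy_truth = {("chr1", 100, "A", "T"), ("chr1", 200, "C", "G"),
             ("chr1", 5000, "G", "A")}
toy_calls = {("chr1", 100, "A", "T"), ("chr1", 300, "T", "C"),
             ("chr1", 6000, "A", "C")}
toy_result = benchmark(toy_calls, toy_truth, toy_regions)
```

In practice the variants and intervals would be read from VCF and interval files, and a linear scan over intervals would be replaced with an indexed lookup; the sketch keeps everything in memory to show only the counting logic.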

Details

Title
A validated lineage-derived somatic truth data set enables benchmarking in cancer genome analysis
Author
Shand, Megan 1 ; Soto, Jose 1 ; Lichtenstein, Lee 1 ; Benjamin, David 1 ; Farjoun, Yossi 1 ; Brody, Yehuda 2 ; Maruvka, Yosef 3 ; Blainey, Paul C 4 ; Banks, Eric 1

1 Broad Institute of Harvard and MIT, Cambridge, USA (GRID:grid.66859.34)
2 Broad Institute of Harvard and MIT, Cambridge, USA (GRID:grid.66859.34); Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, USA (GRID:grid.66859.34)
3 Broad Institute of Harvard and MIT, Cambridge, USA (GRID:grid.66859.34); MGH Cancer Center and Department of Pathology, Boston, USA (GRID:grid.32224.35) (ISNI:0000 0004 0386 9924)
4 Broad Institute of Harvard and MIT, Cambridge, USA (GRID:grid.66859.34); MIT Department of Biological Engineering, Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786); Koch Institute for Integrative Cancer Research at MIT, Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786)
Publication year
2020
Publication date
2020
Publisher
Nature Publishing Group
e-ISSN
23993642
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2473291731
Copyright
© The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.