ARTICLE
Received 3 May 2016 | Accepted 15 Feb 2017 | Published 21 Apr 2017
DOI: 10.1038/ncomms14944 OPEN
Phylogenetic analysis of metastatic progressionin breast cancer using somatic mutations and copy number aberrations
David Brown1,*, Dominiek Smeets2,3,*, Borbla Szkely4,*, Denis Larsimont5, A Marcell Szsz4, Pierre-Yves Adnet1, Franoise Roth1, Ghizlane Rouas1, Zsa I. Nagy4, Zsa Farag4, Anna-Mria T+
Magdolna Dank7, Gyngyvr Szentmrtoni7, Nra Udvarhelyi8, Gabriele Zoppoli9, Lajos Pusztai10, Martine Piccart11, Janina Kulka4, Diether Lambrechts2,3, Christos Sotiriou1,** & Christine Desmedt1,**
Several studies using genome-wide molecular techniques have reported various degrees of genetic heterogeneity between primary tumours and their distant metastases. However, it has been difcult to discern patterns of dissemination owing to the limited number of patients and available metastases. Here, we use phylogenetic techniques on data generated using whole-exome sequencing and copy number proling of primary and multiple-matched metastatic tumours from ten autopsied patients to infer the evolutionary history of breast cancer progression. We observed two modes of disease progression. In some patients, all distant metastases cluster on a branch separate from their primary lesion. Clonal frequency analyses of somatic mutations show that the metastases have a monoclonal origin and descend from a common metastatic precursor. Alternatively, multiple metastatic lesions are seeded from different clones present within the primary tumour. We further show that a metastasis can be horizontally cross-seeded. These ndings provide insights into breast cancer dissemination.
1 Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Universit Libre de Bruxelles, Bld de Waterloo 121, 1000 Brussels, Belgium.
2 Laboratory of Translational Genetics, Vesalius Research Center, VIB, Campus Gasthuisberg, O&N IV Herestraat 49, 3000 Leuven, Belgium. 3 Laboratory of Translational Genetics, Department of Oncology, Katholieke Universiteit Leuven, O&N IV Herestraat 49, 3000 Leuven, Belgium. 4 Second Department of Pathology, Semmelweis University,ll+oi t 93, 1091 Budapest, Hungary. 5 Department of Pathology, Institut Jules Bordet, Bld de Waterloo 121, 1000 Brussels, Belgium. 6 2nd Department of Pathology, MTA-SE Tumor Progression Research Group, Semmelweis University,ll+oi t 93, 1091 Budapest, Hungary.
7 Semmelweis University Cancer Center, Semmelweis University, Tm+
o u. 25-29, 1083 Budapest, Hungary. 8 Surgical and Molecular Tumor Pathology Centre,
National Institute of Oncology, Rth Gyrgy u. 7-9, 1122 Budapest, Hungary. 9 University of Genova and Istituto di Cura a Carattere Clinico e Scientico Azienda Ospedaliera Universitaria San MartinoInstituto Nazionale Tumori, Largo Rosanna Benzi 10, 16132 Genoa, Italy. 10 Yale University, Cedar Street 333, New Haven, Connecticut 05620, USA. 11 Department of Medical Oncology, Institut Jules Bordet, Universit Libre de Bruxelles, Bld de Waterloo 121, 1000 Brussels, Belgium. * These authors contributed equally to this work. ** These authors jointly supervised this work. Correspondence and requests for materials should be addressed to C.S. (email: mailto:[email protected]
Web End [email protected] ) or to C.D. (email: mailto:[email protected]
Web End [email protected] ).
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 1
oks4,6,
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
Cancer-related mortality is almost always due to metastatic dissemination of the primary disease. While research continues to unravel the molecular underpinnings of the
metastatic cascade, it is increasingly recognized that proling of advanced disease could help elucidate such biological phenomena as distant recurrence and the emergence of de novo resistance to therapy.
A handful of studies using genome-wide molecular techniques have begun to explore the clonal relationships between primary and matched metastatic tumours in diverse types of neoplasia including pancreatic1,2, clear-cell renal cell3, high-grade serous ovarian46 and prostate cancer7,8. Despite the small cohort sizes and, too often, a limited number of matched metastases for each patient, these pioneering efforts brought forth thought-provoking ndings such as the rst quantitative model of cancer progression from onset of the founder mutation to metastatic dissemination2, the occurrence of organ specic lineages1, monoclonal38, as well as its counterpart, polyclonal seeding7,8, horizontal cross-seeding between distant metastases6,8, and nally homing of metastatic cells to the primary tumour bed7.
While yet other studies continue to highlight the potential of genomic analyses from small cohort sizes to decipher the origins of intra-tumour heterogeneity and its contribution to metastatic dissemination9,10, in-depth knowledge is currently lacking for breast cancer. Several studies have tackled this issue1119. However, while early attempts were constrained by the development of high throughput genomic techniques, more recent endeavours were, on the other hand, limited in scope by the availability of multiple-matched metastases. Noteworthy exceptions are the work of Juric et al.15 and Murtaza et al.16, both n-of-1 fast autopsy studies, where the authors uncovered the mechanisms of resistance to a PI3K-inhibitor and lapatinib, respectively. Despite this, it remains difcult to discern any pattern of metastatic progression due to the small number of patients.
To further investigate breast cancer progression, we applied phylogenetic techniques on data generated using whole-exome sequencing, custom ultra-deep resequencing and copy number proling. The primary tumours and their associated metastases were obtained from ten autopsied patients. We observed two modes of metastatic progression. In the majority of cases, all distant metastases cluster on a branch separate from their primary lesion. Clonal frequency analyses of somatic mutations show that the metastases have a monoclonal origin and descend from a common metastatic precursor. Alternatively, the primary tumour is clustered alongside metastases with early branches leading to distant organs. Finally, we show that a distant metastasis can be horizontally cross-seeded conrming previous results observed in other types of neoplastic disorders6,8 and lending further support to the self-seeding hypothesis20.
ResultsCharacteristics of patients and samples. We reviewed the database of the institutional autopsy programme of the second Department of Pathology at Semmelweis University. From 50 deceased metastatic breast cancer patients, whose corpses underwent autopsy between 2001 and 2012, ten patients for whom 41 mg double-stranded DNA from the primary breast tumour, a non-cancerous tissue as germline reference, and at least one metastatic sample was available, were selected. Eight patients were diagnosed with early stage disease among whom, one was diagnosed with a single liver metastasis (5/87). Three patients (3/92, 5/87 and 6/91) received neoadjuvant chemotherapy before surgery while the remaining ve patients
(4/71, 7/67, 8/82, 9/68 and 10/80) were treated with breast surgery followed by adjuvant systemic therapy according to standard of care. The remaining two patients (1/69 and 2/57) were diagnosed with de novo metastatic disease and deceased before receiving any systemic or surgical treatment. The patient clinico-pathological characteristics are provided in Supplementary Data 1 while the clinical history and autopsy ndings are detailed in Supplementary Notes 110 corresponding to patient 1/69 to 10/80. The lesions proled are described in Supplementary Data 2. All samples from the de novo metastatic patients were collected post-mortem while, for the remaining patients, the primary tumours were collected at surgery and the distant metastases, in addition to one case of local recurrence, were collected at autopsy. On average, three distant metastatic lesions were proled per patient.
Indexing of somatic mutations and copy number aberrations. We used whole-exome sequencing to index somatic mutations from 51 samples (median coverage 4018 ) followed by
orthogonal validation using Sequenom MassARRAY to exclude false positive calls and targeted amplicon ultra-deep sequencing (median coverage 11,3905,646 ) to obtain accurate variant
allele frequencies (VAFs). The list of single nucleotide variants (SNVs) from each patient is provided in Supplementary Data 3. We supplemented this with high density single nucleotide polymorphism (SNP) arrays to characterize the underlying copy number aberrations (CNAs) in 64 matched samples (Supplementary Data 4). We further devised a multiple tier system to ascribe a condence level to each indexed mutation. Between 27 and 305 non-synonymous SNVs per patient were successfully validated up to tier-3 level and after applying dened quality criteria, a total of 56 samples with either CNA or sequencing data remained for downstream analysis.
Phylogenetic reconstruction of metastatic progression. Metastases are clonally related and originate from cells disseminated at various stages of the disease. Thus, they inherit varying fractions of genomic alterations from their parental lineage, followed by acquisition of private alterations. Provided the genomic alterations under investigation are fully clonal, phylogenetic inference can be used to investigate lineage tracing of metastases within a patient. Therefore, we used a maximum parsimony criterion to infer the sequence of genomic alterations occurring during metastatic progression. Figure 1af illustrates the results obtained in patient 2/57. Whenever the two phylogenies obtained from SNVs and CNAs were consistent, these were graphically represented as a combined tree. In the case of SNV-based phylogenies containing unresolved nodes, so called soft polytomies, we used the corresponding tree generated from CNAs as the correct phylogeny on account of the greater number of aberrations from SNP arrays, allowing for a unique solution to tree reconstruction.
The combined use of SNVs and CNAs demonstrated the presence of reversions, that is, SNVs predicted as present in a sample from the ancestral state reconstruction but were not detected in the particular sample. For the predicted reversions, we excluded the possibility that these were due to false negative calls attributable to inadequate sequencing coverage depth or the occurrence of the mutations at subclonal cell frequencies based on power calculations described in Carter et al.21 (Supplementary Fig. 1). Instead, Fig. 2ai shows two clusters of reversions in the metastasis to the pylorus from patient 8/82, which can be attributed to loss of heterozygosity at chromosome 1p and 17p in that lesion. We further encountered a similar phenomenon in patient 9/68 (Supplementary Fig. 2) where the
2 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944 ARTICLE
a b c
P (primary)
M1 (liver)
M3 (ovarium)
1
1
1
0.5
0.5
0.5
BAF
3 CN
Log ratio
0
0
0
h
j
g
k l
d
i
a
b
c
e f
h
j
g
k l
d
i
a
b
c
e f
h
j
l
g
k
d e
i
c
a
b
f
4
4
4
3
3
2
2
2
1
1
1
1
1
1
0
0
0
1
1
1
Position (index)
Position (index)
Position (index)
d e f
M3 P
M1
P
M3 M1
c>e+ (fst:+1)
d+ (fst:+1)
Ancestral state
i (fst:1)
M1
h>j+ (fst:+1)
Common ancestor/ metastatic precursor
P
h>l+ (fst:+1)
Ancestral state
g+ (fst:+1)
M3
b+ (fst:+1)
Common ancestor/ metastatic precursor
a>f (fst:1)
Normal Normal
Normal
Figure 1 | Phylogenetic inference from SNVs and CNAs. Fully clonal somatic CNAs of chromosome 3 for (a) the primary tumour, (b) the liver and(c) the ovarian metastases of patient 2/57. The tracks are in descending order, the B Allele Frequency (BAF), integer copy number (CN) state and log2 ratios. The phylogenetic reconstruction is displayed in d, where arrows indicate aberrations with annotations besides detailing coordinates and event type. Convergent evolution is exemplied by the focal amplication of region d in a,b but a broad gain of ce in c, in all three cases leading to copy-neutral loss of heterozygosity of region d. The CNAs are colour coded with early events in blue, late events in orange, and diploid regions not contributing to the phylogenetic tree in black. (e) Concordant phylogeny obtained from tier-3 SNVs and (f) ancestral state reconstruction for the same samples. The scale bars in (d,e) represent one CNA and 10 SNVs, respectively.
change in copy number status of chromosome 19p can be tracked across the phylogenetic tree and explains the reversion of the mutation in F2RL3. The occurrence of reversions has seldom been acknowledged in the literature and despite growing interest in the inference of phylogenies from single or multiple samples, several approaches have falsely relied on the hypothesis of no back mutation. Therefore, these examples serve as cautionary tale when performing such analysis without properly matched sequencing and CNA data.
Disease dissemination via a metastatic precursor. We applied the same workow to all patients for whom both SNV and CNA data were available. We refer to early and late alterations when occurring in the trunk or the branches of the phylogenetic trees, respectively. A representative combined phylogeny for patient 7/67 is illustrated in Fig. 3. This patient was diagnosed at the age of 54 with an ER-/PgR-/HER2-primary breast cancer. She deceased 3 years later despite surgery and several lines of systemic treatments. All the distant metastases clustered together and descended from what we refer to as the metastatic precursor. We computed the clonal frequencies from the VAFs, the global cancer cell fraction (CCF) and the copy number states for each SNV. These are represented as pairwise comparisons of samples (Fig. 3d). Similarly, the phylogenies of patients with early breast cancer disease (4/71, 8/82, 9/68 and 10/80), and the one from patient 5/87 with a single liver metastasis at initial diagnosis, further conrmed the case of patient 7/67 (Fig. 5a; Supplementary Figs 4 and 5). Distant metastases probably arose via a seeding event to an initial metastatic precursor from the primary tumour and in absence of the latter,
removed at surgery, the source of further dissemination to additional organs occurred by metastasis-to-metastasis disseminations. Our observation suggests that for breast cancer patients diagnosed at an early stage and undergoing curative intent surgery, who represent the majority of patients, cascading disseminations from metastases appears to be a major route of tumour progression.
For seven patients from our cohort, multiple samples from the primary tumour were available (3/92, 4/71, 5/87, 6/91, 8/82, 9/68 and 10/80). In two cases, a particular region of the primary tumour was more genetically related to the distant metastases. For patient 3/92, the phylogenetic tree based on CNAs (Supplementary Fig. 9) show that the primary tumour sample (P3) was clustered alongside the metastases to the liver (M2) and pancreatic lymph node (M3) while for patient 6/91, the tree inferred from tier-3 SNVs (Supplementary Fig. 7) shows that the primary tumour sample (P3) is clustered alongside the metastases to the brain (M1) and liver (M2). Apart from these two exceptions, in all the other patients, the different samples from the primary tumour were clustered together separate from their associated distant metastases (Fig. 2; Supplementary Figs 7 and 9).
Multiple seeding events from the primary tumour. A contrasting clinical and biological condition to the dissemination via a metastatic precursor is illustrated by the case of patient 2/57 (Fig. 4). This patient was a BRCA1 germline mutation carrier diagnosed at the age of 38 with an ER-/PgR-/HER2-metastatic breast cancer. She did not receive any systemic treatment and deceased 1 month after initial diagnosis. Analysis
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 3
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
a b c d
P3 (primary)
M2 (pylorus)
M3 (liver)
1
1
1
M2
0.5
0.5
0.5
BAF
CN
Log ratio
M4
0
0
0
M3
34
34
4
P3
3
12
2
P2
2/1
12
1
P1
1
1
1
0
0
0
N
1
1
1
Position (index)
Position (index)
e
2
f h i
P3 (primary)
M2 (pylorus)
M3 (liver)
M2
1
1
1
M4
0.5
0.5
0.5
M3
0
0
0
P3
3/2
34
34
3/2
34
P2
12
12
12
P1
1
1
1
0
0
0
Log
N
1
1
1
Position (index)
Position (index)
Position (index)
Figure 2 | Reversions in SNVs are explained by underlying CNAs. (a,f) Early and late tier-3 SNVs, respectively, which were predicted to be reversions in the metastasis to the pylorus of patient 8/82. These were clustered on chromosome 1p and 17p. (bd, gi) Fully clonal somatic CNAs of chromosome 1 and 17, respectively, for the primary tumour, the pylorus and the liver metastases ordered according to their genomic coordinates. In each panel, the tracks displayed are in descending order, the BAF, the integer based estimation of CN and the log2 ratios. (e) Heat map representing the ancestral state reconstruction. The loss of heterozygosity at chromosome arm 1p and 17p in M2 explains the absence of these mutations.
of her primary tumour (P) and two distant metastases revealed two independent seeding events from the primary leading to the ovarian (M3) and to the liver (M1) secondary lesions, respectively (Fig. 1d,e). The phylogenetic reconstruction from the CNA prole of the metastasis to the adrenal gland (M2) revealed that this lesion originated from a precursor shared with the liver metastasis (Fig. 4b). However, the adrenal gland lesion displayed both SNVs acquired late in the evolutionary history of the clade composed of the primary tumour and liver metastasis as well as SNVs private to the ovarian metastasis. Pairwise comparisons of the clonal frequencies of tier-4 SNVs showed that those private to the primary tumour and liver metastasis clade (segment 4) were also present at full clonal frequencies in the adrenal gland metastasis (Fig. 4c) in agreement with the phylogeny inferred from the CNA proles. The late SNVs private to the ovarian metastasis (segment 2) were observed at subclonal frequencies in the adrenal gland metastasis. We resequenced M2 and obtained similar results (Supplementary Fig. 3). Our results imply that circulating metastatic cells, disseminated by the ovarian metastasis, horizontally cross-seeded the already metastatic adrenal gland and conrm previous observations in ovarian6 and prostate8 cancers further lending support to the hypothesis of tumour self-seeding20.
In the additional advanced stage breast cancer patient (1/69) who was de novo metastatic and died in the weeks following her diagnosis without receiving any systemic treatment,
the primary sample was also found clustered alongside distant metastases (Fig. 5b; Supplementary Fig. 6). We observed a similar early seeding to distant organs (primary to pleura in 1/69 and primary to ovarium in 2/57) followed by subsequent late seeding events to additional organs from either the primary lesion (primary to aorta in 1/69 and primary to liver in 2/57) or from already established metastases (mediastinal soft tissue to mediastinal lymph node or vice versa in 1/69 and liver to adrenal gland or vice versa in 2/57).
Contralateral breast tumours originate from primary tumours. Two patients from our series were diagnosed with a metachronous contralateral breast tumour. Patient 6/91, 1 year and 10/80, 10 years after initial diagnosis. In patient 6/91, the phylogenetic reconstruction based on tier-3 SNVs showed that the contralateral left tumour (M3) was the earliest branching but shared a substantial fraction of the truncal SNVs (Supplementary Fig. 7). This contrasts with the case of patient 10/80 where the CNA proles show that the contralateral left tumour (M1) originated from a daughter lesion shared with the liver metastases (M2 and M3) (Supplementary Fig. 9). Nonetheless, both cases conrm the clonal relatedness of the contralateral tumour with the initially diagnosed breast cancer. Together with recent reports22,23, this calls into question the current practice of considering metachronous contralateral tumours as second primary cancers. Since treatment strategies offered to patients
4 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944 ARTICLE
a
CDC42BPA
MYD88
CACNA1D
RTP1
MOG
KLHL32
KLHL32
FBXL4
INTS1
KIAA1967
PREX2
FRMPD2
HKDC1
LPXN
BCO2
EFCAB4B
A2ML1
SYT10
LRRK2
SOX21
SIX4
ESR2
SEMA7A
CHD3
PLXDC1
CD300LF
SPO11
MIAT
ELFN2
KCND3
HMCN1
FAM135B
PCCA
TIGD7
PARD6G
TCEB3
OTUD7B
RPRD2
RHBG
OR10K1
KCNJ9
MEIS1
NEB
BAZ2B
ALS2
MCM2
PCDHGA7
FRK
HYMAI
WDR72
RNF111
MGC12916
PIGS
IGF2BP1
DSG2
CACNA1A
ZNF552
DSCAM
WNK3
EIF2C1
POGZ
OR2W5
MST1
DERL2
SLC25A13
M1
M2
Ancestral state
M3
Metastatic precursor
P
Common ancestor
Normal
1 2 3 4 5
3*
b c
8q amp (MYC)
Whole-genome duplication
P (primary at surgery)
M3 (lung)
Metastatic precursor
P (primary at surgery)
M1 (ovarium)
M2 (spleen)
M2 (spleen)
M1 (ovarium)
M3 (lung)
Normal breast
1
Common ancestor
Primary BC ER-/PgR-/HER2-
4
d
e
1.0
1.0
1.0
1.0
0.8
0.8
0.8
0.8
0.6
0.6
0.6
0.6
M1
M2
M3
0.4
0.4
0.4
#2 0.4
0.2
0.2
0.2
0.2
0.0
0.0
0.0
0.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
P
P
P
#1
1.0
1.0
1.0
i: Clonal and common to #1 and #2 ii: Clonal and private to #2iii: Clonal and private to #1iv: Subclonal and private to #1v: Subclonal and private to #2vi: Shared, clonal in #2 and subclonal in #1, incompatible with iii vii: Shared, clonal in #1 and subclonal in #2, incompatible with ii viii: Shared jointly sublconal
0.8
0.8
0.8
0.6
0.6
0.6
M2
M3
M3
0.4
0.4
0.4
0.2
0.2
0.2
0.0
0.0
0.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
M1
M1
M2
Key
Truncal clonal Shared/private clonal
Subclonal Absent
Figure 3 | Phylogenetic reconstruction of breast cancer progression in patient 7/67. (a) Ancestral state reconstruction of tier-3 SNVs with the anatomic location of the proled lesions is depicted in b. (c) Combined phylogenetic tree obtained from CNAs and SNVs and (d) pairwise comparisons of clonal frequencies of tier-4 SNVs. The branches of the phylogenetic tree are labelled 15 and the location of these mutations in pairwise comparisons is indicated in d. (e) Schematic representation of the pairwise comparison of two ctitious samples. Mutations in i, ii and iii are fully clonal being either common to the two samples and thus inherited from their parental lineage or private to either one. Mutations in iv and v are private and subclonal to either samples. They are expected to have occurred after the divergence of the two lineages and after mutations located in ii and iii, respectively. Mutations in vi and vii are shared between the two samples but are fully clonal in one and subclonal in the other. If the two samples share a common parental origin, these mutations are incompatible with fully clonal mutations occurring in ii and iii, respectively. A possible scenario explaining their occurrence is that vi and vii are mutually exclusive and that sample #1 seeded #2 giving rise to vi or vice versa for vii. The subclonal frequencies could then be explained by intra-tumour heterogeneity in the tumour mass. Alternatively, mutations in vi and vii could nd their origin in horizontal reseeding from a third sample.
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 5
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
a b
BRCA1 mt
TP53 pR175H17pq del (BRCA1wt, TP53wt)
3
M3 (ovarium)
P M3
M2
M1
P (primary at autopsy)
1
M2 (adrenal gland)
M1 (liver)
M1 (liver)
Common ancestor/ Metastatic precursor
Primary BC ER-/PgR-/HER2-
P (primary at autopsy)
M3 (ovarium)
Normal breast
c d
1.0
1.0
1.0
0.8
0.8
0.8
HERC2
PI4KA
SDK2
MEPE
M2 (adrenal gland)
0.6
0.6
0.6
M1
M2
M3
0.4
0.4
0.4
2 (19.4%)
* (3.8%)
0.2
0.2
0.2
0.0
0.0
0.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
P
P
P
HSPA9
CLDN5
6
5
1.0
1.0
1.0
0.8
0.8
0.8
0.6
0.6
0.6
M2
M3
M3
COL15A1
PTGS1
ZBTB22
0.4
0.4
0.4
0.2
0.2
0.2
0.0
0.0
0.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
M1
M1
M2
Key
Truncal clonal Shared/private clonal
Subclonal Absent
Figure 4 | Phylogenetic reconstruction of breast cancer progression in patient 2/57. (a) Anatomical representation of tumour lesions proled,(b) combined phylogenetic tree obtained from CNAs and SNVs, and (c) pairwise comparisons of clonal frequencies from tier-4 SNVs. The branches of the phylogenetic tree are labelled 16 in b and the location of these mutations in pairwise comparisons is indicated by the corresponding label in c. Mutations in segment 2 at full clonal frequencies in M3 and subclonal frequencies in M2 indicate horizontal seeding, highlighted by the red segment 2 in b. A heuristic interpretation of the different possible scenarios is given in d. Only three mutations in total were in the unexplained conguration *, one in 5, one in 6 and 12 in conguration 2. Excluding mutations in the conguration *does not inuence the topology of the phylogeny. The numbers in parentheses in d give the percentage of all tier-4 SNVs.
differ widely between early and advanced stage breast cancers, it is imperative to determine in practice whether contralateral tumours represent a metastatic deposit of the primary tumour.
Evolution of genomic alterations during cancer progression. We computed for each lesion, a normalized phylogenetic branch length, which is the ratio of the path from the common ancestor to the given lesion relative to the common trunk (Fig. 6a,b). This represents the extent of genomic alterations that accumulated since the rst metastasizing event took place irrespective of the mode of progression. With few exceptions, the pattern observed from tier-3 SNVs mirrors the one from CNAs. In patients 1/69, 2/57 and 4/71, who all died from their disease at most 1 year after initial diagnosis, the bulk of evolutionary changes occurred early in the trunk of the phylogenetic tree. At the other extreme, in patients 8/82, 9/68 and
10/80, who had a longer disease history, the SNV prole is uncoordinated with the CNA whereby the former shows that most of the evolutionary changes occurred late. Figure 6c,d shows the correlation of the average normalized phylogenetic branch lengths with overall survival. Although the number of patients is small, we observed a positive correlation for both CNAs and SNVs.
In patients 8/82 and 10/80, we observed a high mutational burden which showed evidence of increased C4T substitutions at NpCpG trinucleotides. This pattern of substitution is reminiscent of mutational signature 13 in Alexandrov et al.24 Figure 7a,b shows the pattern of substitutions observed in the trunk, that is, early, and in branches, that is, late, during the evolutionary cascade of these two patients. Thus, it is possible that, in at least these two patients, the activation of the APOBEC family of cytidine deaminases caused an accumulation of mutations which uncoupled the SNV and CNA proles.
6 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944 ARTICLE
a
P1 (primary at surgery)
M3 (liver)
P (primary at surgery)
P3 (primary at surgery)
4/71
8/82
M2 (local recurrence)
M2 (pylorus)
M1(liver)
P2 (primary at surgery)
M3 (bone)
P (primary at surgery)
M4 (skin) M1 (perito
neum)
M2 (adrenal gland)
5/87
7/67
9/68
M3 (liver)
M2 (liver)
M1(liver)
P (primary at surgery)
10/80
M1(contra-lateral breast)
M2 (spleen)
M3 (lung)
M1 (ovarium)
M3 (liver)
b
M3 (ovarium)
M2 (adrenal gland)
M1 (liver)
M5 (pleura)
M4 (mediastinal lymph node)
M1 (mediastinal soft tissue)
P (primaryat autopsy) M3 (aorta)
1/69 2/57
P (primary at autopsy)
P (primary at surgery)
Key
Truncal clonal Shared/private clonal
Subclonal
Unknown
Figure 5 | Combined phylogenies representing metastatic progression across eight patients. (a) Phylogenies of early stage patients who underwent primary surgery followed by systemic treatment and (b) phylogenetic trees obtained from advanced stage treatment nave and de novo metastatic patients. The same colour code as in previous gures is used to depict early and late events. For visual purposes, all the trees were globally rescaled such that the trunks of the trees have the same length. The scale bars at the bottom represent 10 SNVs and provide an indication of the original length of the trees. For patients 4/71 and 9/68, the primary tumour samples removed at surgery were exome sequenced and putative tier-1 somatic mutations were further validated by Sequenom MassARRAY and ultra-deep amplicon sequencing. However, the corresponding SNP arrays showed that these samples had CCFs below the set threshold of 30% for phylogenetic reconstruction. Nonetheless, for these two particular samples, tier-3 SNVs were included in the construction of the phylogenetic trees for SNVs on account that the lesions had been removed several years prior to the diagnosis of distant relapses and autopsy. Similarly, for patient 10/80, the primary tissue samples did not pass the ltering criteria of tier-3 level. The thickness of the branches leading to these nodes is therefore irrelevant. These are displayed in grey.
Patient 10/80 harboured an early amplication of the APOBEC cluster on chromosome 22 (Supplementary Data 4) while patient 8/82 harboured an early APOBEC3B D316N mutation (Supplementary Data 3).
DiscussionHerein, we applied phylogenetic techniques to infer the evolutionary history of breast cancer progression from an autopsy cohort of ten patients. In contrast to previous reports, which
compared single metastasis and primary tumour pairs only or multiple-matched metastases and primary tumours for no more than two patients, the availability of a larger number of patients with matched primary and multiple metastatic samples was critical to our study for deciphering the routes of dissemination underlying metastatic progression.
We observed two possible scenarios. The most frequent implied a single successful seeding event from the primary tumour followed by metastasis-to-metastasis cascading disseminations, whereas the second involved multiple seeding events
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 7
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
a b
1/69
1/69
2/57
4/71
5/87
2/57
4/71
6/91
7/67
0.5
10/80
10/80
5/87
9/68
6/91
9/68
8/82
7/67
8/82
c d
2.5
1.0
2.0
Average normalized
phylogenetic banch lengths
phylogenetic banch lengths
0.8
1.5
Average normalized
0.6
1.0
0.4
0.5
0.2
0 2 4 6 8 10
0 2 4 6 8 10
Overall survival (years)
Overall survival (years)
Figure 6 | Dynamics of genomic alterations during metastatic progression. Normalized phylogenetic distance for each sample proled. These are obtained as the ratio of the path from the common ancestor node to the given sample relative to the trunk of the tree for (a) SNVs and (b) CNAs. (c,d) Correlation of the average phylogenetic distances with overall survival for SNVs and CNAs, respectively.
from the primary tumour alongside daughter metastasis-tometastasis disseminations. This dichotomy coincides with the clinical history where, except for patient 5/87, descent from a common metastatic origin was observed in patients diagnosed with early stage breast cancer, whereas multiple seeding events from the primary tumour occurred in patients diagnosed with advanced stage disease.
The role of primary tumour resection in de novo metastatic breast cancer patients is unclear, and there is currently no consensus whether this procedure confers a survival benet. A recent open label trial did not support primary surgery in de novo metastatic patients progressing to front-line chemotherapy25 and in a subgroup analysis of a Turkish study26, patients with multiple liver and lung metastases did worse in the primary surgery group consistent with earlier reports that surgical excision of the primary tumour might enhance the growth of micrometastases2729. However, in the same trial by Soran et al.26, the authors observed an increased progression free survival for primary tumour resection in ER /HER2-de novo
metastatic patients with solitary bone metastases. Thus, our observations suggest that surgical excision of the primary tumour might reduce metastatic dissemination in selected cases hence providing a potential biological rationale for this practice. Similarly, there is no strong recommendation showing overall survival benet from surgical resection of oligo-metastases in breast cancer. From our analyses, metastatic lesions constitute an additional source of seeding and heterogeneity in advanced breast cancer. Our cohort is too small to derive practice-changing evidence, but supports the concept that resecting isolated metastases may be of clinical benet in oligo-metastatic breast cancer patients. In both cases, results from larger, prospective studies are warranted.
We reckoned that the number of late SNVs and CNAs, should increase as distant metastases evolve and should give an
indication, albeit approximate, of the time elapsed since they last diverged from their common ancestor. Indeed, we observed a positive correlation between overall survival and the average normalized phylogenetic branch lengths. This can be explained by the fact that patients 1/69 and 2/57 were de novo metastatic and consistent with those two patients, 4/71 also had a very short distant metastasis free and overall survival. At the other extreme, in patients 8/82, 9/68 and 10/80 who relapsed more than 4 years after initial diagnosis, the extent of late genomic alterations were commensurate with the survival of the patients. These results suggest, not unexpectedly, that metastases from patients with longer cancer histories are genetically more distant from their common ancestor or their primary tissue of origin than those of patients with a shorter cancer history.
Evidence has been accumulating in the literature regarding treatment-induced genomic remodelling15,16,3037, especially implicating ESR1 and PTEN alterations in endocrine and PI3K-inhibitor resistance, respectively. In our series, four out of the ve ER-positive patients received aromatase inhibitors. However, no ESR1 mutations have been detected in their distant metastases. None of the patients received any PI3K-inhibitor making it impossible to evaluate resistance mechanisms associated to this treatment.
Overall, by characterizing the genomic alterations that shape metastatic genomes, we have gleaned new insights into the dissemination patterns of breast cancer with potential clinical implications: (1) cascading dissemination from metastases appears to be a major route of metastatic progression in early, radically resected breast cancer and (2) primary tumours at diagnosis may not adequately represent advanced metastatic disease advocating the need for genetic characterization of multiple metastatic lesions. The very recent technical advances in the assessment of circulating tumour DNA38 may both allow
8 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944 ARTICLE
a
C > A C > G C > T T > A T > C T > G
10
0
Early mutations
Late mutations
Frequency (%)
8
6
4
2
0
12
Frequency (%)
10
8
6
4
2
b
C > A C > G C > T T > A T > C T > G
10
0
Early mutations
Late mutations
Frequency (%)
8
6
4
2
0
12
Frequency (%)
10
8
6
4
2
Figure 7 | Distribution of substitutions during metastatic progression. Frequency of the different types of substitutions for tier-3 SNVs in patients (a) 8/82 and (b) 10/80. These are grouped as early and late according to their occurrence in the respective phylogenetic trees.
the early detection of micrometastatic disease before recurrence and may better capture tumour heterogeneity of metastatic disease guiding the best therapeutic options for early and advanced breast cancer patients in the future. Genomic alterations uniquely dening breast cancer metastases from an aetiological standpoint, and therapeutic agents tackling them are yet to be found.
Methods
Patients and samples. The average period between the time of death and acquisition of autopsy samples was 2.8 days (1.54.2). Between the time of death and dissection, the cadavers were kept at 4 C. At autopsy, complete external and detailed internal examinations were performed. The organs were thoroughly examined, weighed and tissue samples were taken. All tissue samples were xed in formalin and embedded in parafn as part of routine workup. All bone samples were decalcied using EDTA solution to preserve the antigenicity of the tissue proteins. Hematoxylin-eosin (HE) stained sections were reviewed by J.K., A.M.S. and B.S. to conrm the presence and percentage of invasive carcinoma as wellas other tissue composition. A detailed description of the clinico-pathological characteristics of each patient is provided in Supplementary Data 1. This project was approved by the Institutional Review Board (IKEB 185-1/2007). This study is retrospective in nature and part of a larger institutional-based autopsy programme carried out at the Semmelweis University. It did not impact treatment decision for the patients involved and received approval from the ethical committee of the Semmelweis University.
Pathological characterization of samples. The stage of primary tumourswas reclassied based on the 7th version of TNM classication system. Immunohistochemistry (IHC) and uorescence in situ hybridization (FISH) were performed on 4 mm tissue sections. All the samples underwent centralized IHC for the oestrogen (ER) and progesterone (PgR) receptor status and IHC/FISH characterization for HER2 receptor status. Hormone receptor and HER2 status
were assessed by IHC on all samples with an automated immunostainer system (Ventana Benchmark, Roche Diagnostics, Mannheim, Germany) according to the manufacturers instructions with the following antibodies: ERSP1 rabbit monoclonal (Ventana #790-4324) ready to use kit, PgR1E2 rabbit monoclonal (Ventana #790-2223) ready to use kit, Ki-67MIB1 (DAKO #M7240, Carpinteria, CA, USA) at dilution 1:100 and HER2/neu4B5 (Ventana #800-2996) ready to use kit. Hormone receptor status was evaluated using Allred-Quick scoring system by two investigators independently (A.M.S. and B.S.). Ki-67 index was measuredas the ratio of the positive tumour cell nuclei in the sample. IHC assessment of Ki-67 was evaluated only on the primary tumours. HER2 positivity was primarily dened at protein level using IHC and supplemented by FISH using Poseidon probes (Kreatech Diagnostics, #KBI-10735, Amsterdam, Netherlands). HER2 IHC was evaluated according to the modied standard protocol that is, positive by IHC only if more than 30% of tumour cells show strong, complete membrane reaction. FISH was performed on samples with IHC 2 and 3 for the
evaluation of HER2 gene amplication status. FISH results were evaluated according to the 2013 ASCO/CAP guidelines39. A detailed review of all samples is provided in Supplementary Data 2.
DNA isolation. DNA was extracted from the primary tumours, metastases and matched normal tissue from FFPE tissue blocks after macrodissection usingthe QIAamp DNA FFPE Tissue Kit (Qiagen, #56404, Hilden, Germany). Double-stranded DNA (dsDNA) was quantied using the QUBIT 2.0 Fluorometer (Invitrogen) and the PicoGreen assay for double-stranded DNA. Only samples from which 41 mg of double-stranded DNA, as quantied by the two assays could be extracted, were selected for downstream molecular proling.
Exome and targeted sequencing analysis. For each patient, part of the available samples was used to index the presence of SNVs using whole-exome sequencing. A total of 51 samples including at least one normal sample per patient were sequenced at a target coverage of 40 . The putative somatic SNVs were validated
by Sequenom MassARRAY in both the germline reference and cancer samples. As further validation, all available cancer samples were subjected to targeted
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 9
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
amplicon deep sequencing at a median coverage of 9,000 to conrm initial
sequencing results and increase the accuracy of VAFs.
Whole-exome DNA libraries were generated following the manufacturers protocol with minor modications (Illumina TrusSeq DNA library preparation kit v2). Before end-repair, a 65 C incubation step was added to remove reversible crosslinks, after which excessive single-stranded DNA was removed enzymatically. The concentration of double-stranded DNA was assessed using the PicoGreen assay and the concentration of adapters used for ligation was adjusted accordingly. For the library enrichment, 57 cycles of PCR were used. Whole-exome capture was then performed using the Illumina Human Exome capture kit and libraries were sequenced on a HiSeq2000 using V3 owcells generating 2 100 bp
paired-end reads. Raw sequencing reads were mapped to the human reference genome (NCBI37/hg19) using the Burrows-Wheeler Aligner (BWA)40 and aligned reads were processed with SAMtools41. Duplicate reads were removed using Picard tools. Base recalibration, local realignment around indels and SNV calling were performed using the GenomeAnalysisToolKit (GATK)42. Indels were called using Dindel43. Mutations were ltered based on mapping quality, and sequencing coverage. Somatic mutations were further ltered by comparison with the matched germline reference and using common variant databases such as the 1000 genomes project44 and dbSNP version 132 (ref. 45).
For the targeted resequencing experiments, primers were designed using the Sequenoms MassARRAY assay design software and universal sequence tags were added manually. Amplicons encompassing the mutation of interest were generated using a rst PCR (Roche FastStart High Fidelity PCR kit) with universal sequencing adapters (Access Array Barcode) containing a 10 bp index added in a second PCR. The resulting PCR products were pooled, denatured and sequenced on an Illumina MiSeq in a 2 75 bp paired-end sequencing run using a V3 owcell. Fastq les
were generated and demultiplexed using CASAVA. The raw sequencing reads were mapped to the human reference genome (NCBI37/hg19) using BWA. Read counts of both variant and reference alleles were called using GATK and manually checked in IGV46.
Tiering system for ltering SNVs. A ve-tier system was devised to lter SNVs for downstream analysis (Supplementary Data 3). We considered tier-0 SNVs as any SNV indexed in any sample including the matched germline reference by exome sequencing. Tier-1 SNVs are those that were further called somatic after comparison with the germline reference while tier-2 SNVs are the subset of tier-1 which have been further validated and conrmed somatic by Sequenom MassARRAY. Tier-3 SNVs are the further subset that was conrmed present by targeted deep amplicon sequencing. Due to formalin xation and parafn embedding, at some loci, all four bases were often observed at low frequencies. Thus, to determine the background noise of the targeted deep sequencing experiments, we selected 25 bp upstream and downstream of tier-2 SNVs. For each position, the number of reads for the reference nucleotide and the other3 non-reference nucleotides was determined using GATK. To exclude possible SNPs or SNVs other than the one of interest, we removed all data points that showed more than 10% non-reference reads. We then pooled all the data for each mutation type (AT4CG, AT4GC, AT4TA, CG4AT, CG4GC and CG4TA)
and calculated the average background signal and s.d. From these estimates, the highest value observed was 1.59%. Thus, we chose a conservative value of 3% as the nal cut-off for calling a mutation present.
In the present context, several samples were matched to a given patient and we found it equally important to have a high coverage to call a mutation present in one sample, as it is to call absence in another paired sample. Thus, at the tier-3 level, we rst ltered all SNVs in all matched samples to have a coverage of 41,500 and
kept samples with a minimum of 75% non-missing values. We then excluded all SNVs with 41 missing value across any of the matched samples previously retained. We reversed the ltering order for VAFs by rst eliminating SNVs that were called absent o3% across all matched samples previously retained. Because of low CCF, it is likely that some samples have a lower total number of SNVs called present. However, a lower number of detected mutations could also be biologically grounded. Therefore, we use a lenient ltering to exclude samples with potentially low CCFs by requiring that 420% of SNVs be called present. Some samples showed a large number of SNVs or had limited starting DNA available. For those samples, we were compelled to skip validation by Sequenom MassARRAY and thus the tier-2 level. Finally, we required that for all samples where a matched-SNP array and targeted sequencing were available, that the sample displaysa CCF 430% as determined by SNP array. All SNVs from that sample were then referred to as tier-4 SNVs
SNP array analysis. For the estimation of CNAs, a total of 64 samples were shipped for processing to the Affymetrix Research Services Laboratory. Normal and tumour DNA from FFPE samples were genotyped using the Affymetrix OncoScan FFPE Express 2.0 arrays. Routine formalin xation and parafn embedding frequently damages DNA causing wavy proles, which may be wrongly interpreted as copy number aberrations. Thus, the median absolute pairwise deviation (MAPD) and the median auto-correlation (MAC) across the log2 ratio intensities were used as quality control for the SNP arrays (Supplementary Data 2). Samples with an MAPD40.7 or an MAC40.3 were discarded. Samples suspected to have a low CCF on visual inspection of the BAF tracks were agged and after
processing, those conrmed to have less than 30% CCF were discarded from downstream applications. From the remaining samples, only informative probes displaying heterozygous genotype (AB) and copy-neutral state (2) in the matched normal sample were kept for analysis. We used the added value of multiple-matched samples per patient to infer breakpoints, which may otherwise have been missed due to differences in CCF across the various samples. Thelog2 ratio intensities and BAF, grouped per patient, were segmented jointly using the multitrack PCF algorithm in the R package copynumber47 to determine common breakpoints. The penalty parameter g determining discontinuities in the log2 ratio and BAF tracks was set individually for each patient after visual inspection of the segmentation proles. Finally, all segments that were less than three s.d. away were merged with their immediate neighbours.
Integer level estimates of total copy number and major allele were obtained using GAP48. We compared the estimates returned by GAP with two other mainstream programs: (1) ASCAT49 and (2) ABSOLUTE21. Unless otherwise stated, the parameter sets for each programme were kept at default values. For ASCAT, the parameter g, which determines the platform-specic compression ratio, was set to 0.8. The programme returns one default estimate of ploidy and CCF, which was used in the comparison against GAP. ABSOLUTE was run in total copy mode and model based evaluation against the SNP6 platform. The fraction of the genome allowed to be non-clonal was set to innity so that the maximum number of solutions could be evaluated. ABSOLUTE returns a set of possible values for ploidy and CCF. The closest solution to that returned by GAP in Euclidean space, after rescaling ploidy values to unit distance, was chosen. The results obtained by GAP, ASCAT and ABSOLUTE are contrasted in Supplementary Fig. 10a,b. In all three cases, the Spearmans r between the estimated CCF was high. In the case of ploidy estimates, the correlation between GAP and ASCAT was relatively low due to two reasons: (1) several matched samples from two patients displayed ploidy values outside the range considered by ASCAT and (2) at several loci displaying high copy numbers, GAP truncates the estimate to 8 while ASCAT does not thereby affecting the true estimate. We computed the Cohens k coefcient between ASCAT and GAP to measure the agreement in total copy numbers and major alleles. These are displayed in Supplementary Fig. 10c while the correlation of the k coefcients for total copy numbers and the absolute difference in ploidy between the two algorithms is shown in panel d. The results showed that the disagreement between the two programs in the phasing of alleles and estimation of total copy numbers is in fact linked to the discordances in ploidy (r 0.898, Po0.01). Because the present dataset
contained matched samples and may represent a biased result, we used an additional dataset of 125 unrelated samples proled using similar Affymetrix OncoScan SNP arrays (unpublished data) to reproduce the analysis. Supplementary Figure 11a,b shows a very good correlation between the two estimates of ploidy (r 0.885, Po0.01) and CCF (r 0.907, Po0.01). However, the correlation of the
k coefcients for total copy numbers and the absolute difference in ploidy(r 0.733, Po0.01) or CCF (r 0.119, P 0.147) shows that it is, in fact,
the incorrect estimation of global genomic mass that leads to disagreement between the two algorithms. Supplementary Figure 11fm contrasts the results fromGAP and ASCAT for four samples with increasing genomic mass and illustrates the complexity of choosing the correct solution of CCF and ploidy. Given the present context of matched samples and because GAP allowed for manual review of ploidy solutions, we opted for this package for downstream analyses taking into consideration the maximal limit of eight copies imposed by the software as follows:(1) SNVs that occurred at loci where the total copy number was 8 or a major allele count 44 was observed were not considered for the estimation of CCF or clonal frequency and (2) similarly for the phylogenetic reconstructions using CNAs, except in the case of high ploidy tumours (that is, 1/69, 7/67, 8/82 and 10/80) any locus displaying a total copy number of 8 or a major allele count 44 in any sample was removed from all matching samples of that particular patient.
Estimation of CCF and clonal frequencies from tier-4 SNVs. The CCF was estimated both individually from each tier-4 SNV and globally from the whole set of tier-4 SNVs in samples where a matched-SNP array was available. Let qt denote the total copy number at the mutated locus, q1 denote the minor copy number and q2 denote the major copy number such that q2Zq1, qt q1 q2 and q1, q2 and
qtAa . Let sq denote the number of mutated copies such that sqA{1, y, q2}.
Let fsq denotes the expected VAF of the SNV where fsqA[0,1]. Then, fsq is related to sq and a, the CCF, as follows:
fsq sq
aaqt 21 a
1
where a is the variable that we are trying to estimate while ^f is taken to be the observed VAF. We denote the estimate of CCF from sequencing as ^a. The above equation can be rearranged such that
^a
2^f sq
^fqt 2
2
Let n be the total number of sequencing reads that cover the mutated locus. Then
Pr X n
^f
osqBeta fsq n^f 1; n1
^f 1
3
10 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944 ARTICLE
where X is the number of mutated reads for a given SNV, wsq specify the mixture weights for each possible value of sq. Computation of fsq requires prior knowledge of a. To estimate sq and for individual SNVs irrespective of other mutations, we make the assumption that a is known. Thus, qt, q1, q2 and a are plugin quantities obtained from the corresponding SNP array. Then
s q argmax
s 2f1; ... ;q g
PrX n
n o
4
^f
and the sequencing error eA[0,1) is modelled after Purdom et al.50 such that
fe
13 2e
3 4efsq e
5
replaces fsq in the Beta distribution. We use uniform priors over the range of possible values of sq and e 0.01. These form the basis of the pointwise estimates
of CCFs (Supplementary Fig. 10b). To relax the requirement on prior knowledge of a, we dene the likelihood function over all tier-4 SNVs present in a given sample as
L f n; sq; osq
X
s 2f1; ... ;q g
Pr X n
^f
6
At a given value of a, we compute the log of L and iteratively adjust the weights wsq until L converges or a maximum of 100 iterations is reached. The global CCF, a , is the value that maximizes L such that
a argmax
a2f0:1; ... ;0:9g L
^f a
j
n o
7
An example is shown in Supplementary Fig. 10ef and a is compared to the estimate of GAP in Supplementary Fig. 10g. The CCF and the clonal frequency of SNVs are related and the latter was computed jointly for all samples belonging to a given patient using PyClone51 from the estimates of major and minor copy numbers returned by GAP. We used a Beta Binomial distribution with parental copy number option and default parameter settings except for the sequencing error which was set to 0.01 and the tumour content which was set to the global CCF, a , estimated above.
Phylogenetic analysis of SNVs. The raw VAFs of tier-3 SNVs from targeted resequencing were converted into binary calls based on a threshold of 3%. We initially intended to lter in only fully clonal tier-3 SNVs using this conservative cut-off and infer the phylogeny for individual patients using the Dollo parsimony method and a branch and bound exhaustive search for the best phylogenetic reconstruction as described in Felsenstein52 using the programme PHYLIP. The outgroup used for rooting the phylogenies was one where all the characters were set to the ancestral state 0. The Dollo parsimony criterion minimizes homoplasies at the expense of reversions in later branches and the criteria for determining the best phylogenetic tree is minimizing the number of such reversions. Despite this, several phylogenetic trees can be equally parsimonious. Instead of collapsing the trees using consensus methods, we used the corresponding CNA based tree to break ties and infer the correct phylogeny. The trees in Newick format were rendered using the R package ape53. The heat maps representing the tier-3 SNVs were ordered according to the topology and branch lengths of their corresponding phylogenetic trees via an ancestral state reconstruction using the accelerated transition model54,55 as provided in the R package phangorn56. The phylogenetic trees for each patient are shown in Supplementary Fig. 7. Each predicted reversion of SNV was manually veried against the underlying CNA prole. The tier-3 level does not take into account the CCF of the samples. It is possible that, despite the previous lters, very low CCF samples with overall fewer positive mutation calls are included. These samples would lead to early branches in the trunk of the phylogenies. Thus, we reproduced the same analysis with samples having tier-4 SNVs. The results are shown in Supplementary Fig. 8.
Phylogenetic analysis of CNAs. The major and minor copy numbers returned by GAP were modelled using a transducer-based pairwise comparison function using the programme MEDICC57. For near-diploid samples, we assume a pure diploid outgroup with no copy number aberrations that is, 2/1 (total copy number/major allele) to root the phylogenies. In the case of tetraploid samples, we included an additional step to phase CNAs relative to the whole-genome duplication event within the phylogeny of the given patient. We rst used the classic approach to infer an intermediate tree with correct topology irrespective of branch lengths. For the major or minor copy and at each locus, we compute a parsimony score, which is the sum of branch lengths of the intermediate tree rooted using any of the four tetraploid ancestral states 6/4, 4/2, 4/4 and 2/2 (total copy number/major allele). These represent the copy number states 3/2, 2/1, 2/2 and 1/1 followinga whole-genome duplication event. We used all possible permutations of observed copy numbers at the internal nodes except for 0-1 transitions. We chose the intermediate tree and thus the related tetraploid ancestral state obtaining the minimum score. Ties, if present, are broken by summing the intermediatetree length with the CNAs occurring prior to the whole-genome duplication that is, 1, 0, 2 and 1, respectively. Finally, the global phylogenetic tree is inferred using the classic approach jointly at all loci and rooted using the tetraploid ancestor as
outgroup. The phylogenetic trees are shown in Figs 35 of the main text and Supplementary Fig. 9. Support values for the phylogenetic trees were obtained by resampling the pairwise distance matrix 100 times with added Gaussian noise and counting similar bipartitions between the resulting trees and the original phylogeny.
Data availability. The sequencing and SNP array data have been deposited at the European Genome-Phenome Archive (http://www.ebi.ac.uk/ega/
Web End =http://www.ebi.ac.uk/ega/), which is hosted by the European Bioinformatics Institute, under accession number EGAS00001000760.
References
1. Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 11091113 (2010).
2. Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467, 11141117 (2010).
3. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883892 (2012).
4. Bashashati, A. et al. Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational proling. J. Pathol. 231, 2134 (2013).
5. Schwarz, R. F. et al. Spatial and temporal heterogeneity in high-grade serous ovarian cancer: a phylogenetic analysis. PLoS Med. 12, e1001789 (2015).
6. McPherson, A. et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 48, 758767 (2016).
7. Hong, M. K. et al. Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer. Nat. Commun. 6, 6605 (2015).
8. Gundem, G. et al. The evolutionary history of lethal metastatic prostate cancer. Nature 520, 353357 (2015).
9. de Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes denes lung cancer evolution. Science 346, 251256 (2014).
10. Gerlinger, M. et al. Genomic architecture and evolution of clear cell renal cell carcinomas dened by multiregion sequencing. Nat. Genet. 46, 225233 (2014).
11. Shah, S. P. et al. Mutational evolution in a lobular breast tumour proled at single nucleotide resolution. Nature 461, 809813 (2009).
12. Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 9991005 (2010).
13. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 9094 (2011).
14. Cummings, M. C. et al. Metastatic progression of breast cancer: insights from 50 years of autopsies. J. Pathol. 232, 2331 (2014).
15. Juric, D. et al. Convergent loss of PTEN leads to clinical resistance to a PI(3)Ka inhibitor. Nature 518, 240244 (2015).
16. Murtaza, M. et al. Multifocal clonal evolution characterized using circulating tumour DNA in a case of metastatic breast cancer. Nat. Commun. 6, 8760 (2015).
17. Brastianos, P. K. et al. Genomic characterization of brain metastases reveals branched evolution and potential therapeutic targets. Cancer Discov. 5, 11641177 (2015).
18. Zhao, Z. M. et al. Early and multiple origins of metastatic lineages within primary tumors. Proc. Natl Acad. Sci. USA 113, 21402145 (2016).
19. Yates, L. R. et al. Subclonal diversication of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751759 (2015).
20. Kim, M. et al. Tumor self-seeding by circulating cancer cells. Cell 139, 13151326 (2009).
21. Carter, S. L. et al. Absolute quantication of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413421 (2012).
22. Alkner, S. et al. Contralateral breast cancer can represent a metastatic spread of the rst primary tumor: determination of clonal relationship between contralateral breast cancers using next-generation whole genome sequencing. Breast Cancer Res. 17, 102 (2015).
23. Klevebring, D. et al. Exome sequencing of contralateral breast cancer identies metastatic disease. Breast Cancer Res. Treat. 151, 319324 (2015).
24. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415421 (2013).
25. Badwe, R. et al. Locoregional treatment versus no treatment of the primary tumour in metastatic breast cancer: an open-label randomised controlled trial. Lancet Oncol. 16, 13801388 (2015).
26. Soran, A. et al. A randomized controlled trial evaluating resection of the primary breast tumor in women presenting with de novo stage IV breast cancer: turkish study (Protocol MF07-01). J. Clin. Oncol. 34, abstr 1005 (2016).
27. Babiera, G. V. et al. Effect of primary tumor extirpation in breast cancer patients who present with stage IV disease and an intact primary tumor. Ann. Surg. Oncol. 13, 776782 (2006).
28. Fisher, B., Gunduz, N., Coyle, J., Rudock, C. & Saffer, E. Presence of a growth-stimulating factor in serum following primary tumor removal in mice. Cancer Res. 49, 19962001 (1989).
NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 11
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14944
29. Neuman, H. B. et al. Stage IV breast cancer in the era of targeted therapy: does surgery of the primary tumor matter? Cancer 116, 12261233 (2010).
30. Robinson, D. R. et al. Activating ESR1 mutations in hormone-resistant metastatic breast cancer. Nat. Genet. 45, 14461451 (2013).
31. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353360 (2012).
32. Schiavon, G. et al. Analysis of ESR1 mutation in circulating tumorDNA demonstrates evolution during therapy for metastatic breast cancer. Sci. Transl. Med. 7, 313ra182 (2015).
33. Li, S. et al. Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts. Cell Rep. 4, 11161130 (2013).
34. Fribbens, C. et al. Plasma ESR1 mutations and the treatment of estrogen receptor-positive advanced breast cancer. J. Clin. Oncol. 34, 29612968 (2016).
35. Merenbakh-Lamin, K. et al. D538G mutation in estrogen receptor-alpha:a novel mechanism for acquired endocrine resistance in breast cancer. Cancer Res. 73, 68566864 (2013).
36. Spoerke, J. M. et al. Heterogeneity and clinical signicance of ESR1 mutations in ER-positive metastatic breast cancer patients receiving fulvestrant. Nat. Commun. 7, 11579 (2016).
37. Toy, W. et al. ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat. Genet. 45, 14391445 (2013).
38. Garcia-Murillas, I. et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci. Transl. Med. 7, 302ra133 (2015).
39. Wolff, A. C. et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/ College of American Pathologists clinical practice guideline update. J. Clin. Oncol. 31, 39974013 (2007).
40. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows Wheeler transform. Bioinformatics 25, 17541760 (2009).
41. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 20782079 (2009).
42. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 12971303 (2010).
43. Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961973 (2011).
44. Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 5665 (2012).
45. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308311 (2001).
46. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 14, 178192 (2013).
47. Nilsen, G. et al. Copynumber: efcient algorithms for single- and multi-track copy number segmentation. BMC Genomics 13, 591 (2012).
48. Popova, T. et al. Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic proles obtained by SNP arrays. Genome Biol. 10, R128 (2009).
49. Van Loo, P. et al. Allele-specic copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 1691016915 (2010).
50. Purdom, E. et al. Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics 29, 31133120 (2013).
51. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396398 (2014).
52. Felsenstein. Inferring Phylogenies (Sinauer Associates Inc., 2004).53. Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20, 289290 (2004).
54. Swofford, D. L. & Maddison, W. P. Reconstructing ancestral character states under Wagner parsimony. Math. Biosci. 87, 199229 (1987).
55. Maddison, W. P. & Maddison, D. R. MacClade: Analysis of Phylogeny and Character Evolution (Sinauer, 1992).
56. Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592593 (2011).
57. Schwarz, R. F. et al. Phylogenetic quantication of intra-tumour heterogeneity. PLoS Comput. Biol. 10, e1003535 (2014).
Acknowledgements
We would like to extend our gratitude to the families of the deceased patients who participated in this study. The authors would further like to thank N. Kheddoumi, E. Azumah, M. Pekr and E. Samodai for technical assistance, O. Kiss, I. Illys, L. Madaras,A. Kovcs, G. Lotz, Z. Baranyai, B. Jray, T. Glasz, T. Herbert, K. Simon, C. Diczhzi,I. Kaszs, G. Lukcs Tth, T. Barna, F. Salamon and G. Bodoky for collection of study material, Z. Schaff, J. Tmr and their colleagues at the 2nd Department of Pathology, Semmelweis University for supporting this research programme. D.B. and C.S. are supported by the Belgian Fonds National de la Recherche Scientique (F.R.S-FNRS). C.D. is supported by a grant from the Brussels RegionImpulse Programme Life Sciences and by Les Amis de lInstitut Bordet. B.S. is supported by the Susan G. Komen and American Joint (JDC) and A.M.S. by the European Union and the State of Hungary, co-nanced by the European Social Fund in the framework of TMOP 4.2.4.A/2-11-1-2012-0001 National Excellence Program. This study was supported by grants fromthe MEDIC Foundation, Les Amis de lInstitut Bordet, MKOT-Roche-2012, TMOP4.2.2/B-10/1-2010-0013 and TMOP 4.2.1.B-09/1/KMR-2010-0001.
Author contributions
C.D. and C.S. conceived the study. B.S., A.M.S., Z.I.N., Z.F., A.-M.T, M.D., G.S. and J.K. provided the clinical specimens. B.S., A.M.S., Z.I.N., Z.F., A.-M.T., D.L. and J.K. performed the histopathological assessment of the samples. P.-Y.A. and G.R. processed the samples. D.S. aligned the sequencing reads and called the mutations. D.B. analysed the SNP arrays, reconstructed the phylogenies and rendered the artwork. D.B., D.S. and B.S. reviewed and assembled all the results. D.B., D.S., B.S., G.Z., L.P., C.S., M.P., D.L., J.K., D.L. and C.D. interpreted the results. D.B., D.L., C.S. and C.D. wrote the manuscript. All the authors read and approved the nal manuscript.
Additional information
Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications
Web End =http://www.nature.com/ http://www.nature.com/naturecommunications
Web End =naturecommunications
Competing interests: The authors declare no competing nancial interests.
Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/
Web End =http://npg.nature.com/ http://npg.nature.com/reprintsandpermissions/
Web End =reprintsandpermissions/
How to cite this article: Brown, D. et al. Phylogenetic analysis of metastatic progression in breast cancer using somatic mutations and copy number aberrations. Nat. Commun. 8, 14944 doi: 10.1038/ncomms14944 (2017).
Publishers note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the articles Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Web End =http://creativecommons.org/licenses/by/4.0/
r The Author(s) 2017
12 NATURE COMMUNICATIONS | 8:14944 | DOI: 10.1038/ncomms14944 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright Nature Publishing Group Apr 2017
Abstract
Several studies using genome-wide molecular techniques have reported various degrees of genetic heterogeneity between primary tumours and their distant metastases. However, it has been difficult to discern patterns of dissemination owing to the limited number of patients and available metastases. Here, we use phylogenetic techniques on data generated using whole-exome sequencing and copy number profiling of primary and multiple-matched metastatic tumours from ten autopsied patients to infer the evolutionary history of breast cancer progression. We observed two modes of disease progression. In some patients, all distant metastases cluster on a branch separate from their primary lesion. Clonal frequency analyses of somatic mutations show that the metastases have a monoclonal origin and descend from a common 'metastatic precursor'. Alternatively, multiple metastatic lesions are seeded from different clones present within the primary tumour. We further show that a metastasis can be horizontally cross-seeded. These findings provide insights into breast cancer dissemination.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer