ARTICLE
Received 21 Apr 2015 | Accepted 22 Sep 2015 | Published 30 Oct 2015
DOI: 10.1038/ncomms9699 OPEN
A CpG-methylation-based assay to predict survival in clear cell renal cell carcinoma
Jin-Huan Wei1,*, Ahmed Haddad2,*, Kai-Jie Wu3,*, Hong-Wei Zhao4,*, Payal Kapur5, Zhi-Ling Zhang6, Liang-Yun Zhao7, Zhen-Hua Chen1, Yun-Yun Zhou8, Jian-Cheng Zhou2, Bin Wang2, Yan-Hong Yu7, Mu-Yan Cai9, Dan Xie9, Bing Liao10, Cai-Xia Li11, Pei-Xing Li11, Zong-Ren Wang1, Fang-Jian Zhou6, Lei Shi4, Qing-Zuo Liu4, Zhen-Li Gao4, Da-Lin He3, Wei Chen1, Jer-Tsong Hsieh2, Quan-Zhen Li12, Vitaly Margulis2 & Jun-Hang Luo1
Clear cell renal cell carcinomas (ccRCCs) display divergent clinical behaviours. Molecular markers might improve risk stratification of ccRCC. Here, based on genome-wide CpG methylation profiling, we use a LASSO model to develop a five-CpG-based assay for ccRCC prognosis that can be used with formalin-fixed paraffin-embedded specimens. The five-CpG-based classifier was validated in three independent sets from China, the United States and The Cancer Genome Atlas data set. The classifier predicts the overall survival of ccRCC patients (hazard ratio = 2.96–4.82; P = 3.9 × 10⁻⁶ to 2.2 × 10⁻⁹), independent of standard clinical prognostic factors. The five-CpG-based classifier successfully categorizes patients into high-risk and low-risk groups, with significant differences in clinical outcome within each clinical stage and each stage, size, grade and necrosis (SSIGN) score category. Moreover, methylation at the five CpGs correlates with expression of five genes: PITX1, FOXE3, TWF2, EHBP1L1 and RIN1. Our five-CpG-based classifier is a practical and reliable prognostic tool for ccRCC that can add prognostic value to the staging system.
1 Department of Urology, First Affiliated Hospital, Sun Yat-sen University, No. 58, ZhongShan Second Road, Guangdong 510080, China. 2 Department of Urology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA. 3 Department of Urology, First Affiliated Hospital of Xi'an Jiaotong University, Shaanxi 710061, China. 4 Department of Urology, Affiliated Yantai Yuhuangding Hospital, Qingdao University Medical College, Shandong 264000, China. 5 Department of Pathology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA. 6 Department of Urology, Cancer Center, Sun Yat-sen University, Guangdong 510060, China. 7 Department of Urology, Affiliated Hospital of Kunming University of Science and Technology, Yunnan 650032, China. 8 Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA. 9 Department of Pathology, Cancer Center, Sun Yat-sen University, Guangdong 510060, China. 10 Department of Pathology, First Affiliated Hospital, Sun Yat-sen University, Guangdong 510080, China. 11 School of Mathematics and Computational Science, Sun Yat-sen University, Guangdong 510275, China. 12 Department of Immunology and Microarray Core, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390, USA. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to J.H.L. (email: [email protected]).
NATURE COMMUNICATIONS | 6:8699 | DOI: 10.1038/ncomms9699 | www.nature.com/naturecommunications
© 2015 Macmillan Publishers Limited. All rights reserved.
Renal cell carcinoma (RCC) is the most common malignant neoplasm arising from the kidney and represents ~2–3% of all human malignancies. The major histological subtype is clear cell RCC (ccRCC), accounting for 80–90% of all RCC cases¹. TNM stage and Fuhrman grade remain the most commonly used predictors of clinical outcome for patients with ccRCC. Clinically integrated systems, such as the Mayo Clinic stage, size, grade and necrosis (SSIGN) score and the University of California Integrated Staging System, can improve prognostic accuracy²,³. However, patients with similar clinical features or integrated-system scores may have diverse outcomes. Thus, there is a need to add prognostic value to the current staging system, which could be achieved with validated biomarkers. Nevertheless, despite numerous studies, no reliable prognostic biomarker for ccRCC has been identified or used routinely in clinical practice to date.
Because DNA methylation is a crucial factor in cancer formation, it has rapidly gained clinical attention as a biomarker for diagnosis and prognosis⁴⁻⁶. In mammalian cells, DNA methylation occurs almost exclusively at the C-5 position of cytosines in the 5′-CpG-3′ sequence context. As genome-wide technologies such as the Infinium HumanMethylation27 and HumanMethylation450 arrays continue to develop, understanding of CpG methylation in human cancers, including RCC, continues to improve rapidly⁷⁻¹².
Here we develop and validate a practical and reliable classifier, based on genome-wide CpG methylation profiling, that improves risk stratification for patients with ccRCC. We also use The Cancer Genome Atlas (TCGA) data set to validate the prognostic classifier, investigate the relationship between CpG methylation and gene expression, and analyse the gene interaction network.
Results
Identifying candidate CpGs based on genome-wide profiling. We analysed 46 paired ccRCC and adjacent normal tissues by CpG methylation microarray (Infinium HumanMethylation450 array) in the discovery set (Supplementary Table 1) and looked for differential methylation between ccRCC tumours and normal tissue at CpG sites across the genome (Fig. 1). The volcano plot (Fig. 2a) showed that, across the 46 pairs of tumour and adjacent normal tissue, the absolute log2 fold change in methylation exceeded 2.5 for 102 CpG sites (t-test, all P < 10⁻⁹; false discovery rate < 10⁻⁸; Supplementary Data 1). The 102 CpGs identified in this univariate analysis were entered into a multivariate least absolute shrinkage and selection operator (LASSO) logistic regression model, and 18 had non-zero coefficients (Fig. 2b,c).
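The univariate screen above can be sketched as a filter on fold change, raw P-value and false discovery rate. This minimal sketch assumes that per-CpG mean methylation (β) values for tumour and normal tissue and precomputed t-test P-values are already available. It defines the log2 fold change as log2(tumour mean / normal mean) and uses the Benjamini–Hochberg procedure for the false discovery rate. The fold-change definition, the choice of Benjamini–Hochberg and the function names are illustrative assumptions, not the authors' documented pipeline.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (FDR q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_differential_cpgs(tumour_beta, normal_beta, p_values,
                             fc_cut=2.5, p_cut=1e-9, fdr_cut=1e-8):
    """Indices of CpGs with |log2 FC| > fc_cut, P < p_cut and FDR < fdr_cut."""
    q_values = benjamini_hochberg(p_values)
    selected = []
    for idx, (t, n, p, q) in enumerate(zip(tumour_beta, normal_beta, p_values, q_values)):
        # Assumed definition: log2 of the ratio of tumour to normal mean methylation.
        log2_fc = math.log2(t / n)
        if abs(log2_fc) > fc_cut and p < p_cut and q < fdr_cut:
            selected.append(idx)
    return selected
```

Both hypermethylated and hypomethylated CpGs pass the filter because it compares the absolute log2 fold change with the threshold. This matches the 17 higher-methylated and 85 lower-methylated sites in the volcano plot.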
Constructing and validating the CpG-based classifier. We then carried out pyrosequencing to quantify the methylation values of these 18 CpG sites in formalin-fixed, paraffin-embedded (FFPE) specimens from the Sun Yat-sen University (SYSU) set of 168 ccRCC patients. Supplementary Table 3 shows univariate Cox regression analysis of overall survival for each of the 18 CpGs in the SYSU set (P = 0.001–0.49). We used a multivariate LASSO Cox regression model to build a CpG-based prognostic classifier, which included 5 of the 18 CpGs: cg00396667, cg18815943, cg03890877, cg07611000 and cg14391855 (Fig. 2d and Supplementary Fig. 1). These five CpG sites lie in the regions of the genes PITX1, FOXE3, TWF2, EHBP1L1 and RIN1, respectively. Using the LASSO Cox regression model, we also calculated a risk score for each patient as a linear combination of the methylation values at the five CpGs, weighted by the LASSO coefficients shown in Fig. 2d (0.0066 for PITX1, 0.0034 for FOXE3, 0.027 for TWF2, 0.018 for EHBP1L1 and 0.03 for RIN1). When we assessed the distribution of risk scores for the five-CpG-based classifier and survival status, patients with lower risk scores generally had better survival than those with higher risk scores (Fig. 3a, left panel). Patients in the SYSU set were divided into high-risk and low-risk groups, using the median risk score (−0.1) as the cutoff. Compared with patients in the low-risk group, patients in the high-risk group had shorter overall survival (hazard ratio = 4.27, 95% confidence interval 2.18–8.37, log-rank test P = 3.9 × 10⁻⁶; Fig. 3a, right panel).
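The risk-score calculation and median split described above amount to a weighted sum followed by a threshold. The minimal sketch below makes several assumptions. Methylation values are supplied per CpG in a dictionary. The coefficients in the usage example are hypothetical toy numbers, not the published model. The helper names (`risk_score`, `median_split`, `apply_cutoff`) are invented for this illustration.

```python
from statistics import median


def risk_score(methylation, coefficients):
    """Weighted sum of methylation values over the CpGs in the model.

    methylation  -- dict mapping CpG (or gene) name -> methylation value
    coefficients -- dict mapping the same names -> model coefficient
    """
    return sum(weight * methylation[cpg] for cpg, weight in coefficients.items())


def median_split(scores):
    """Use the training-set median as cutoff; return (cutoff, 'high'/'low' labels)."""
    cutoff = median(scores)
    return cutoff, apply_cutoff(scores, cutoff)


def apply_cutoff(scores, cutoff):
    """Label each score 'high' if it exceeds the fixed cutoff, otherwise 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


# Usage with hypothetical toy coefficients (not the published values):
toy_coefficients = {"cpg_a": 0.5, "cpg_b": -1.0}
print(risk_score({"cpg_a": 2.0, "cpg_b": 1.0}, toy_coefficients))
```

The separate `apply_cutoff` step follows the design described in the text. The cutoff is fixed once in the training (SYSU) set and reused unchanged in every validation set, rather than recalculated as a new median in each one.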
[Figure 1 flow chart. Discovery set (46 pairs of ccRCC and adjacent normal tissue): genome-wide analysis identifying 102 differentially methylated CpG sites; LASSO logistic regression leaving 18 CpG sites with non-zero coefficients. SYSU set (168 ccRCC patients from 1 Chinese centre): prognostic value of the 18 individual CpG sites; LASSO Cox regression to construct a five-CpG-based prognostic classifier. MCHC set (284 ccRCC patients from 3 Chinese centres), UTSW set (243 ccRCC patients from 1 American centre) and TCGA set (298 ccRCC patients from multiple American centres): validation of the five-CpG-based prognostic classifier; multivariate Cox regression analysis and stratified analysis by clinical variables. TCGA set (507 ccRCC patients with mRNA expression information): correlation between CpG methylation and gene expression; prognostic value of the five genes corresponding to the five CpGs; interactions between the five genes and well-validated ccRCC genes.]
Figure 1 | Flow chart indicating study design. We identified candidate CpG sites from 46 paired ccRCC and adjacent normal tissues by CpG methylation microarray in the discovery set. We then used a multivariate LASSO Cox regression model to build a CpG-based prognostic classifier in the SYSU set. The five-CpG-based classifier was then validated in the MCHC, UTSW and TCGA data sets. The relationships between CpG methylation, gene expression and patient prognosis were also analysed in the TCGA set.
[Figure 2 panels. (a) Volcano plot: CpG methylation difference (log2 FC, tumour − normal) versus −log10 P, n = 102. (b) Univariate analysis and LASSO logistic regression: −log10 P, absolute values of coefficients and pairwise correlation matrices. (c) Heatmap, tumour versus normal. (d) LASSO Cox regression: partial likelihood deviance versus log(λ) (left) and coefficients versus L1 norm (right). The 18 CpGs with non-zero LASSO logistic coefficients are cg00396667 (PITX1), cg18815943 (FOXE3), cg03890877 (TWF2), cg07611000 (EHBP1L1), cg14391855 (RIN1), cg03364683, cg17557540, cg12109838, cg00026222, cg07166409, cg23983315, cg02231066, cg24888989, cg16518772, cg00355909, cg13474998, cg05241461 and cg23497383.]
Figure 2 | Construction of the five-CpG-based classifier. (a) One hundred and two CpG sites selected by univariate analysis. Volcano plot comparing CpG methylation in ccRCC tumour tissues versus adjacent normal tissues (n = 46, HumanMethylation450 platform). The plot depicts biological significance (log2 fold change (FC)) on the x axis and statistical significance (−log10 P) on the y axis. |Log2 FC| > 2.5 for 102 CpGs; methylation of 17 CpGs is higher in tumour than in normal tissue (magenta) and lower for 85 CpGs (turquoise). (b) Eighteen CpG sites selected by LASSO logistic regression analysis. The upper left panel shows a histogram of the univariate t-test P-values, as −log10 P, for all 102 CpGs. The upper right panel displays a matrix of the pairwise correlations (r², Spearman's correlation) between the CpGs. The lower left panel shows a histogram of the absolute values of the coefficients for all 102 CpGs, of which 18 had non-zero coefficients by LASSO logistic regression analysis. The lower right panel shows the correlation structure among the 18 CpGs with non-zero coefficients, demonstrating reduced multicollinearity. (c) Heatmap showing methylation of the 18 CpGs in ccRCC tumour tissue (46 samples) and adjacent normal tissue (46 samples). (d) Five CpG sites selected by LASSO Cox regression analysis. Left panel: the two dotted vertical lines are drawn at the optimal values by the minimum criterion (right) and the 1 s.e. criterion (left). Details are provided in Methods. Right panel: LASSO coefficient profiles of the 18 CpGs. A vertical line is drawn at the optimal value by the 1 s.e. criterion, which results in five non-zero coefficients. Five CpGs, cg00396667 (PITX1), cg18815943 (FOXE3), cg03890877 (TWF2), cg07611000 (EHBP1L1) and cg14391855 (RIN1), with coefficients 0.0066, 0.0034, 0.027, 0.018 and 0.03, respectively, were selected in the LASSO Cox regression model.
To estimate the reproducibility and validity of the five-CpG-based classifier, we performed international validation using data sets comprising ccRCC patients from a site in the United States (University of Texas Southwestern Medical Center at Dallas, UTSW set, 243 cases) and from multiple clinical centres in China (MCHC set, 284 cases). We also used an external data set, the TCGA data set (298 cases), to validate the five-CpG-based classifier (Fig. 1 and Table 1). The methylation values of the five CpG sites are shown for each set in Supplementary Fig. 2. The risk score for each patient in these sets was calculated with the same formula as in the SYSU set, and patients with lower risk scores generally had better survival than those with higher risk scores (Fig. 3b–d, left panel). Patients in these three sets were classified into high-risk
and low-risk groups using the same cutoff as in the SYSU set (−0.1). In all three sets, patients in the high-risk groups had shorter overall survival than those in the low-risk groups (hazard ratio = 2.96–4.82, log-rank test P = 1.4 × 10⁻⁶ to 2.2 × 10⁻⁹; Fig. 3b–d (right panel) and Supplementary Table 4). After adjusting for standard clinical prognostic factors (age, TNM stage, Fuhrman grade and necrosis status), the five-CpG-based classifier remained an independent prognostic factor in the SYSU set and in the three other patient sets (Table 2, all P < 0.05).
[Figure 3 panels (a–d), one per set (SYSU, MCHC, UTSW, TCGA). Left: risk score of the five-CpG-based classifier with survival status (dead/alive) and a heatmap of methylation at PITX1, FOXE3, TWF2, EHBP1L1 and RIN1. Right: overall survival (%) versus months after surgery for low-risk and high-risk groups, with numbers at risk. HR (95% CI): SYSU 4.27 (2.18–8.37), MCHC 3.88 (2.41–6.27), UTSW 4.82 (2.60–8.95), TCGA 2.96 (1.86–4.72).]
Figure 3 | Risk scores calculated by the five-CpG-based classifier and Kaplan–Meier survival in the four sets. (a) SYSU set, (b) MCHC set, (c) UTSW set and (d) TCGA set. Upper left panel: risk-score distribution of the five-CpG-based classifier and patient survival status. Lower left panel: heatmap showing methylation of the five CpGs in the patients. Right panel: Kaplan–Meier survival analysis for the patients. Patients were divided into low-risk and high-risk groups using the median cutoff value of the classifier risk score (−0.1). P-values were calculated using the log-rank test. HR, hazard ratio.
Table 1 | Baseline characteristics of patients by the five-CpG-based classifier assessment set.
Set sizes: SYSU n = 168; MCHC n = 284; UTSW n = 243; TCGA n = 298. Low-risk and high-risk cells give the number of patients (% of the row total within that set).
Characteristic | SYSU No. | SYSU low risk | SYSU high risk | MCHC No. | MCHC low risk | MCHC high risk | UTSW No. | UTSW low risk | UTSW high risk | TCGA No. | TCGA low risk | TCGA high risk
Age <60 years | 107 | 51 (48%) | 56 (52%) | 178 | 104 (58%) | 74 (42%) | 128 | 82 (64%) | 46 (36%) | 129 | 71 (55%) | 58 (45%)
Age ≥60 years | 61 | 33 (54%) | 28 (46%) | 106 | 53 (50%) | 53 (50%) | 115 | 60 (52%) | 55 (48%) | 169 | 67 (40%) | 102 (60%)
Sex: male | 113 | 55 (49%) | 58 (51%) | 190 | 109 (57%) | 81 (43%) | 151 | 87 (58%) | 64 (42%) | 193 | 71 (37%) | 122 (63%)
Sex: female | 55 | 29 (53%) | 26 (47%) | 94 | 48 (51%) | 46 (49%) | 92 | 55 (60%) | 37 (40%) | 105 | 67 (64%) | 38 (36%)
Race: Asian | 168 | 84 (50%) | 84 (50%) | 284 | 157 (55%) | 127 (45%) | 4 | 1 (25%) | 3 (75%) | 1 | 0 (0%) | 1 (100%)
Race: White | 0 | – | – | 0 | – | – | 183 | 104 (57%) | 79 (43%) | 264 | 120 (45%) | 144 (55%)
Race: Black | 0 | – | – | 0 | – | – | 36 | 23 (64%) | 13 (36%) | 30 | 18 (60%) | 12 (40%)
Race: not available | 0 | – | – | 0 | – | – | 20 | 14 (70%) | 6 (30%) | 3 | 0 (0%) | 3 (100%)
Grade: G1 | 8 | 6 (75%) | 2 (25%) | 21 | 15 (71%) | 6 (29%) | 10 | 8 (80%) | 2 (20%) | 6 | 6 (100%) | 0 (0%)
Grade: G2 | 87 | 42 (48%) | 45 (52%) | 134 | 80 (60%) | 54 (40%) | 128 | 84 (66%) | 44 (34%) | 123 | 75 (61%) | 48 (39%)
Grade: G3 | 51 | 25 (49%) | 26 (51%) | 88 | 45 (51%) | 43 (49%) | 77 | 38 (49%) | 39 (51%) | 120 | 50 (42%) | 70 (58%)
Grade: G4 | 22 | 11 (50%) | 11 (50%) | 41 | 17 (41%) | 24 (59%) | 28 | 12 (43%) | 16 (57%) | 49 | 7 (14%) | 42 (86%)
Tumour size <5 cm | 60 | 33 (55%) | 27 (45%) | 140 | 76 (54%) | 64 (46%) | 136 | 93 (68%) | 43 (32%) | 119 | 76 (64%) | 43 (36%)
Tumour size ≥5 cm | 108 | 51 (47%) | 57 (53%) | 144 | 81 (56%) | 63 (44%) | 107 | 49 (46%) | 58 (54%) | 178 | 62 (35%) | 116 (65%)
Tumour size: not available | 0 | – | – | 0 | – | – | 0 | – | – | 1 | 0 (0%) | 1 (100%)
Tumour necrosis: absent | 104 | 56 (54%) | 48 (46%) | 189 | 102 (54%) | 87 (46%) | 164 | 103 (63%) | 61 (37%) | 138 | 71 (51%) | 67 (49%)
Tumour necrosis: present | 64 | 28 (44%) | 36 (56%) | 95 | 55 (58%) | 40 (42%) | 70 | 32 (46%) | 38 (54%) | 160 | 67 (42%) | 93 (58%)
Tumour necrosis: not available | 0 | – | – | 0 | – | – | 9 | 7 (78%) | 2 (22%) | 0 | – | –
pT: T1 | 97 | 49 (51%) | 48 (49%) | 180 | 101 (56%) | 79 (44%) | 156 | 107 (69%) | 49 (31%) | 145 | 95 (66%) | 50 (34%)
pT: T2 | 30 | 15 (50%) | 15 (50%) | 54 | 27 (50%) | 27 (50%) | 30 | 10 (33%) | 20 (67%) | 38 | 18 (47%) | 20 (53%)
pT: T3 | 37 | 17 (46%) | 20 (54%) | 46 | 27 (59%) | 19 (41%) | 52 | 24 (46%) | 28 (54%) | 107 | 23 (21%) | 84 (79%)
pT: T4 | 4 | 3 (75%) | 1 (25%) | 4 | 2 (50%) | 2 (50%) | 5 | 1 (20%) | 4 (80%) | 8 | 2 (25%) | 6 (75%)
pN: N0 | 152 | 78 (51%) | 74 (49%) | 267 | 151 (57%) | 116 (43%) | 226 | 134 (59%) | 92 (41%) | 129 | 62 (48%) | 67 (52%)
pN: N1 | 16 | 6 (37%) | 10 (63%) | 17 | 6 (35%) | 11 (65%) | 17 | 8 (47%) | 9 (53%) | 8 | 1 (12%) | 7 (88%)
pN: NX | 0 | – | – | 0 | – | – | 0 | – | – | 161 | 75 (47%) | 86 (53%)
M: M0 | 163 | 83 (51%) | 80 (49%) | 274 | 150 (55%) | 124 (45%) | 221 | 136 (62%) | 85 (38%) | 244 | 125 (51%) | 119 (49%)
M: M1 | 5 | 1 (20%) | 4 (80%) | 10 | 7 (70%) | 3 (30%) | 22 | 6 (27%) | 16 (73%) | 54 | 13 (24%) | 41 (76%)
Clinical stage I | 91 | 45 (49%) | 46 (51%) | 171 | 96 (56%) | 75 (44%) | 155 | 107 (69%) | 48 (31%) | 141 | 95 (67%) | 46 (33%)
Clinical stage II | 27 | 15 (56%) | 12 (44%) | 48 | 24 (50%) | 24 (50%) | 25 | 9 (36%) | 16 (64%) | 28 | 15 (54%) | 13 (46%)
Clinical stage III | 36 | 17 (47%) | 19 (53%) | 43 | 28 (65%) | 15 (35%) | 39 | 20 (51%) | 19 (49%) | 73 | 15 (20%) | 58 (80%)
Clinical stage IV | 14 | 7 (50%) | 7 (50%) | 22 | 9 (41%) | 13 (59%) | 24 | 6 (25%) | 18 (75%) | 56 | 13 (23%) | 43 (77%)
MCHC, multiple clinical centres in China; SYSU, Sun Yat-sen University; TCGA, The Cancer Genome Atlas; UTSW, University of Texas Southwestern Medical Center at Dallas.
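The Kaplan–Meier curves and log-rank P-values summarized in Fig. 3 depend on two standard calculations, sketched here in plain Python for illustration only. The sketch assumes follow-up times in months and an event indicator (1 = death, 0 = censored). It groups tied times and computes the two-group log-rank statistic with its usual hypergeometric variance. Hazard ratios would also require a Cox model, which is not shown. The function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each distinct event time.

    times  -- follow-up time for each patient (e.g. months after surgery)
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_test(times, events, groups, group_of_interest):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, P-value)."""
    o_minus_e = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == group_of_interest)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == group_of_interest)
        # Observed minus expected deaths in the group of interest at time t.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals a two-sided standard normal tail.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The "No. at risk" rows under each Kaplan–Meier plot count, at each time point, the patients whose follow-up has not yet ended.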
Stratification analysis of the five-CpG-based classifier. We next performed survival analysis for the five-CpG-based classifier in subsets of patients defined by clinical variables. When patients were stratified by clinical variables (sex, age, race, Fuhrman grade, tumour size and necrosis status), the five-CpG-based classifier remained a clinically and statistically significant prognostic model (Fig. 4a, Supplementary Fig. 3 and Supplementary Table 5). As shown in Fig. 4b, the classifier separated ccRCC patients within the same clinical stage into subgroups with better and poorer prognosis (log-rank test, all P < 0.05).
The SSIGN score (ranging from 0 to 15) is one of the clinically integrated systems introduced to improve prognostic accuracy in ccRCC (Supplementary Table 6). Kaplan–Meier curves of overall survival for the SSIGN-score categories are shown in Fig. 5a. The five-CpG-based classifier successfully categorized patients into high-risk and low-risk groups with significant differences in clinical outcome within each SSIGN-score category (log-rank test, all P < 0.05; Fig. 5b–f).
Thus, the five-CpG-based classifier can add prognostic value to both the clinical stage and the SSIGN score.
Impact of intratumour heterogeneity. To determine whether intratumour heterogeneity (ITH) affected the risk score and risk stratification based on the five-CpG-based classifier, we assayed the methylation values of the five CpG sites in three different regions of each of 23 ccRCC tumours. As shown in Supplementary Fig. 5, inter-individual differences in methylation of the five CpG sites, assessed by averaging all measurements from the same tumour, were significantly larger than measurement differences within individual tumours. ITH had a markedly smaller effect on classifier-based risk scores (coefficient of variation (CV), 10.5%) than on the five individual CpGs (CV, 15.2–22.3%).
Table 2 | Multivariate Cox regression analysis of the five-CpG-based classifier with overall survival in the four sets.
Parameter | SYSU HR (95% CI) | SYSU P | MCHC HR (95% CI) | MCHC P | UTSW HR (95% CI) | UTSW P | TCGA HR (95% CI) | TCGA P
Age (younger than 60 years versus 60 years or older) | 1.18 (0.66–2.11) | 0.58 | 2.13 (1.36–3.33) | 0.001 | 1.76 (0.98–3.14) | 0.06 | 1.28 (0.81–2.02) | 0.29
pT (T1/2 versus T3/4) | 2.82 (1.42–5.56) | 0.003 | 1.99 (1.20–3.31) | 0.008 | 2.39 (1.27–4.50) | 0.007 | 1.63 (1.01–2.63) | 0.05
pN (N0 versus N1) | 3.16 (1.37–7.28) | 0.007 | 4.59 (2.39–8.83) | <0.001 | 2.01 (0.95–4.26) | 0.07 | * | *
M (M0 versus M1) | 7.41 (1.97–27.89) | 0.003 | 1.61 (0.60–4.27) | 0.34 | 3.10 (1.46–6.57) | 0.003 | 2.77 (1.78–4.31) | <0.001
Grade (G1/2 versus G3/4) | 1.88 (0.97–3.66) | 0.06 | 1.60 (1.01–2.56) | 0.05 | 1.34 (0.69–2.60) | 0.39 | 1.84 (1.07–3.19) | 0.03
Tumour necrosis (absent versus present) | 1.28 (0.96–1.71) | 0.09 | 1.46 (1.17–1.83) | 0.001 | 1.10 (0.81–1.50) | 0.53 | 2.46 (1.48–4.09) | 0.001
Five-CpG-based classifier (low versus high risk) | 4.10 (2.05–8.19) | <0.001 | 3.73 (2.28–6.09) | <0.001 | 3.36 (1.78–6.34) | <0.001 | 1.80 (1.11–2.93) | 0.02
CI, confidence interval; HR, hazard ratio; MCHC, multiple clinical centres in China; SYSU, Sun Yat-sen University; TCGA, The Cancer Genome Atlas; UTSW, University of Texas Southwestern Medical Center at Dallas.
Tumour size was not included in the multivariate analysis because of collinearity with pathological T stage. *pN was not included in the multivariate analysis of the TCGA set because pN (N0 versus N1) was not a prognostic factor (P = 0.21) in univariate Cox regression analysis and the nodal involvement status of 161 patients (54% of the 298 patients) was not available in this set.
ITH affected risk stratification in only 2 (8.7%) of the 23 tumours, suggesting that the five-CpG-based classifier is a precise tool (Supplementary Table 7).
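The two ITH summaries above can be sketched as follows: a coefficient of variation across regions, and a count of tumours whose regions fall on different sides of the risk cutoff. The sketch assumes three regional measurements per tumour and the fixed SYSU cutoff. The tumour IDs and values in the test are hypothetical. The CV is calculated per tumour with the sample standard deviation and then averaged, which may differ from how the authors pooled it. A CV is only meaningful for strictly positive quantities such as percent methylation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (positive data only)."""
    return 100 * stdev(values) / mean(values)


def mean_within_tumour_cv(regional_values):
    """Average per-tumour CV; regional_values maps tumour ID -> list of regional values."""
    return mean(coefficient_of_variation(v) for v in regional_values.values())


def discordant_tumours(regional_scores, cutoff):
    """Tumour IDs whose regional risk scores fall on both sides of the cutoff."""
    return [tumour for tumour, scores in regional_scores.items()
            if len({score > cutoff for score in scores}) > 1]
```

The discordance count is what matters clinically: a tumour is at risk of misclassification only if its sampled regions fall on opposite sides of the cutoff, for example 2 of 23 tumours (8.7%) in the text.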
CpG methylation, gene expression and patient prognosis. Using the TCGA data set, we analysed whether methylation of the five CpGs was correlated with gene expression (Spearman's correlation). The correlation between methylation value and gene expression was significantly inverse for TWF2 (P = 5.8 × 10⁻¹¹), EHBP1L1 (P = 1.9 × 10⁻⁶) and RIN1 (P = 1.2 × 10⁻³⁰), significantly positive for PITX1 (P = 4.1 × 10⁻⁸) and marginally positive for FOXE3 (P = 0.09).
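Spearman's correlation is Pearson's correlation calculated on ranks, so it captures monotone rather than strictly linear relationships between methylation and expression. The sketch below assumes paired per-patient methylation and expression values and assigns average ranks to ties. It returns only ρ. P-values like those reported above would need an additional step, such as a t approximation or a permutation test, which is omitted. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)
```

A negative ρ corresponds to the inverse relationship reported for TWF2, EHBP1L1 and RIN1, where higher methylation accompanies lower expression.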
The 993 patients in the entire cohort were separated into CpG-defined high-risk and low-risk groups using X-tile plots to generate the optimum cutoff score for methylation of the five CpGs. Kaplan–Meier survival analysis (Fig. 6a–e, left panels) showed that overall survival of patients in the CpG-defined low-risk group was significantly better than that of patients in the high-risk group. In addition, expression of the genes corresponding to the five CpGs effectively predicted the clinical outcome of the 507 patients with messenger RNA expression data in the TCGA data set (Fig. 6a–e, right panels).
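X-tile is a standalone program that evaluates the survival association for every possible division of a marker. The sketch below is a simplified one-dimensional stand-in, not the X-tile software. It tries each observed value as a cutoff, skips splits that leave a group smaller than a hypothetical `min_group_size`, and keeps the cutoff with the largest log-rank chi-square. Choosing a cutoff this way on the same data inflates its apparent significance, so such cutoffs usually need independent validation.

```python
def logrank_chi2(times, events, in_group):
    """Log-rank chi-square (1 df) comparing patients with in_group True versus False."""
    o_minus_e = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        risk = [g for ti, g in zip(times, in_group) if ti >= t]
        n, n1 = len(risk), sum(risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, in_group) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_cutoff(markers, times, events, min_group_size=1):
    """Return (cutoff, chi2) maximizing the log-rank chi-square over observed values.

    Patients with marker > cutoff form the 'high' group; the largest observed
    value is skipped because it would leave the high group empty.
    """
    best = None
    for cutoff in sorted(set(markers))[:-1]:
        high = [m > cutoff for m in markers]
        n_high = sum(high)
        if min(n_high, len(high) - n_high) < min_group_size:
            continue  # avoid unstable splits with very small groups
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            best = (cutoff, chi2)
    return best
```

The minimum group size is a design choice. Without it, the search tends to favour extreme cutoffs that isolate a handful of patients.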
Integrating our results with genes linked to RCC. To further evaluate the genes corresponding to the five CpGs in relation to well-validated ccRCC susceptibility genes, we used the cBioPortal for Cancer Genomics network to evaluate gene connectivity. As shown in Fig. 6f, PITX1 interacts with EGR1, which is in turn connected to an immune response network. RIN1 interacts with RAB5A, which is connected to genes involved in cancer cell epithelial-to-mesenchymal transition. TWF2 participates mainly in cancer cell proliferation signalling pathways through its interaction with chromogranin B (CHGB). FOXE3 and EHBP1L1 showed exceptionally low connectivity in the database.
Discussion
Integrating multiple biomarkers into a single model would substantially improve prognostic value compared with a single biomarker¹³. As genome-wide technologies have become more sophisticated, so too have molecular prognostic models, which can now integrate mRNA, microRNA, CpG and single-nucleotide polymorphism (SNP) data⁷,¹⁴⁻¹⁹. However, earlier integrated models had several notable limitations. (1) They lacked information (such as risk-score formulas or biomarker coefficients) on how to integrate multiple biomarkers into one model, which restricted their wide use in the clinic. (2) Some models incorporated too many biomarkers, making them nearly impossible to apply in clinical practice. (3) Inappropriate statistical methods were used to mine microarray data. More specifically, in microarray analysis the number of covariates is usually close to or larger than the number of observations. Cox proportional hazards regression, the most popular approach for modelling covariate information for survival times, is unsuitable for high-dimensional microarray data when the sample-size-to-variables ratio is too low (for example, below 10:1)²⁰,²¹. The LASSO model used in our study is one of the statistical methods that can overcome this limitation²²⁻²⁴. (4) Models were developed from fresh-frozen specimens, limiting immediate clinical application in a broad community setting. (5) Models were not validated in multiple independent cohorts. Consequently, none of the integrated prognostic models developed using genome-wide, microarray-based analysis is used in clinical practice. In this study, we developed a practical CpG-methylation-based assay that can be used with FFPE material to identify prognostic CpG information, and we demonstrated how this information can be integrated into a prognostic model that is feasible to use in the clinic.
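To illustrate why the LASSO avoids the low sample-size-to-variables problem described above, the sketch below fits an L1-penalized Cox model by proximal gradient descent. Each iteration takes a gradient step on the negative log partial likelihood and then applies soft-thresholding. This is a simplified stand-in, not the authors' fitting procedure. It uses Breslow-style risk sets for ties, a fixed step size that assumes roughly standardized covariates, and a single penalty value λ. Packages such as R's glmnet (with family = "cox") instead use coordinate descent over a path of λ values chosen by cross-validation.

```python
import math


def cox_gradient(beta, X, times, events):
    """Gradient of the negative Cox log partial likelihood divided by n (Breslow risk sets)."""
    n, p = len(X), len(beta)
    weights = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        total = sum(weights[j] for j in risk_set)
        for k in range(p):
            weighted_mean = sum(weights[j] * X[j][k] for j in risk_set) / total
            grad[k] -= (X[i][k] - weighted_mean) / n
    return grad


def soft_threshold(z, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero, exactly zero if small."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def lasso_cox(X, times, events, lam, step=0.1, n_iter=500):
    """L1-penalized Cox regression by proximal gradient descent (ISTA)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = cox_gradient(beta, X, times, events)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta
```

With a large enough λ, every coefficient stays exactly zero. Lowering λ lets the most informative covariates enter first, similar to the coefficient profiles in Fig. 2d. This sparsity is what allows a small, usable marker panel to come out of many candidates.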
ITH can impair precise molecular analysis of tumours because biomarker expression can vary across tumour regions²⁵. Some prognostic biomarkers could not be validated in previous reports, and one possible cause was large intra-sample variability in gene expression²⁶. However, two recent studies showed that ITH, although present at the level of individual gene expression, did not preclude precise microarray-based predictions of clinical outcome in ccRCC or breast cancer²⁶,²⁷. Compared with a single prognostic biomarker, integrated prognostic models based on microarray profiling, such as ours, not only have higher prognostic accuracy but are also less influenced by ITH.
Several studies have analysed gene expression profiles in RCC and examined their potential clinical relevance²⁸⁻³¹. These signatures contained large numbers of genes detected by microarray or reverse transcriptase–PCR and, consequently, had limited use in clinical practice. In this study, we measured the methylation levels of five highly prognostic CpG sites by pyrosequencing of FFPE material. Because it uses fewer markers, our classifier is both more feasible and cheaper than the prognostic signatures proposed in previous studies. The five-CpG-based classifier can accurately distinguish patients with ccRCC who have substantially different clinical outcomes, even after adjustment for standard clinical prognostic factors such as age, TNM stage, Fuhrman grade and necrosis status. We further performed international validation using data sets comprising patients from a site in the United States and from multiple clinical centres in China (MCHC), as well as patients in the TCGA data set, who were also from multiple centres in the United States. The prognostic accuracy of the five-CpG-based classifier was similar in the three validation sets. The classifier was reproducible regardless of clinical centre, country or race, and it can provide prognostic value that complements the clinical stage and the SSIGN score.
[Figure 4 panels. (a) Forest plot of hazard ratios (95% CI) for overall survival by subgroup: sex (male, female), age (<60 years, ≥60 years), race (Asian, White), grade (G1/2, G3/4), clinical stage (I/II, III/IV), tumour necrosis (absent, present), tumour size (<5 cm, ≥5 cm) and all patients (HR = 3.94 (3.01–5.15)); all P < 0.001. (b) Kaplan–Meier curves of overall survival (%) versus months after surgery for low-risk and high-risk patients in stage I (HR = 3.31 (2.16–5.09)), stage II (HR = 3.54 (1.59–7.89)), stage III (HR = 4.93 (2.69–9.01)) and stage IV (HR = 1.98 (1.19–3.29)), with numbers at risk.]
Figure 4 | Stratification analysis of the five-CpG-based classifier. (a) Hazard ratio (HR) of overall mortality for all 993 patients with ccRCC according to the five-CpG-based classifier in subgroups stratified by clinical parameters. (b) Kaplan–Meier survival analysis of the five-CpG-based classifier in patients with ccRCC of different clinical stages (log-rank test).
Five genes corresponded to the ve CpGs identied in our study: FOXE3, PITX1, RIN1, TWF2 and EHBP1L1. DNA methylation of FOXE3 has been reported and validated as a diagnostic biomarker for paediatric acute lymphoblastic leukemia32. Hypermethylation of PITX1 and RIN1 has been described in human salivary gland adenoid cystic carcinoma and breast cancer, respectively33,34. TWF2 has been implicated in
NATURE COMMUNICATIONS | 6:8699 | DOI: 10.1038/ncomms9699 | http://www.nature.com/naturecommunications
Web End =www.nature.com/naturecommunications 7
& 2015 Macmillan Publishers Limited. All rights reserved.
ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms9699
[Figure 5 image. Panel a: SSIGN score (all categories). Panels b–f: SSIGN score 0–1, 2–3, 4–5, 6–8 and 9, each comparing classifier-defined low-risk and high-risk groups. Axes: overall survival (%) versus months after surgery.]
Figure 5 | Analysis of the five-CpG-based classifier in subsets of different SSIGN-score categories. (a) Kaplan-Meier curves of overall survival for the respective SSIGN-score categories. (b–f) Kaplan-Meier survival analysis of the five-CpG-based classifier in subsets of different SSIGN-score categories (log-rank test). HR, hazard ratio.
neurite outgrowth35. However, the function of EHBP1L1 remains unknown. Our pathway analysis results suggest that these genes may play diverse roles in regulating ccRCC progression, including tumour immune response, cancer cell proliferation and epithelial-to-mesenchymal transition. Notably, these genes are all located at the periphery of the signalling network, in contrast to central network markers such as PTEN and TP53. This finding is consistent with recent studies showing that epigenetic drift occurs preferentially in genes that occupy peripheral network positions of exceptionally low connectivity7,36,37.
In conclusion, the present study suggests that the newly developed five-CpG-based classifier is a practical and powerful prognostic tool for ccRCC. It can provide prognostic value that complements the current staging system of ccRCC and will facilitate patient counselling, tailoring of follow-up protocols and selection of patients for appropriate adjuvant trials.
Methods
Patients. In this study, we used 695 FFPE tissue samples from 695 patients who underwent resection of a ccRCC. The SYSU set included 168 patients from the First Affiliated Hospital and Cancer Center of SYSU (Guangdong, Southeast China) treated between 2001 and 2009. The MCHC set included 284 patients treated between 2001 and 2009 at three hospitals across different regions of China: the First Affiliated Hospital of Xi'an Jiaotong University (Shaanxi, Northwest China), the Affiliated Yantai Yuhuangding Hospital of Qingdao University Medical College (Shandong, Northeast China) and the Affiliated Hospital of Kunming University of Science and Technology (Yunnan, Southwest China). Another 243 patients from the University of Texas Southwestern Medical Center at Dallas (TX, USA) treated between 2004 and 2011 comprised the UTSW set. The 2009 TNM staging system was used to classify ccRCC patients, and tumours were graded with the four-tier Fuhrman grading system. Clinical baseline data were obtained through medical record review. Patients with sporadic, unilateral ccRCC and with clinicopathological characteristics and follow-up information available were included. In addition, to generate CpG methylation profiles we obtained, as a discovery set, a panel of 46 fresh-frozen tumour samples with paired adjacent normal tissue from patients with ccRCC treated between 2011 and 2013 at the First Affiliated Hospital of SYSU. Consent was obtained from all subjects, and the protocols were approved by the Institutional Review Board of each institution.
Infinium methylation assay microarrays. In the discovery set, we used the HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA) for genome-wide assessment of methylation at CpG sites38. Genomic DNA was extracted from 46 paired ccRCC tumour and adjacent normal tissues with the QIAamp DNA Mini Kit (Qiagen, Valencia, CA, USA) following the manufacturer's recommendations. All DNA samples were assessed for integrity, quantity and purity by electrophoresis in a 1.3% agarose gel, PicoGreen quantification and NanoDrop measurements, respectively. The samples that passed quality control were processed with Infinium HumanMethylation450 BeadChip Kits (Illumina) according to the manufacturer's recommendations, through automated processes in the Genomic and Microarray Core, University of Texas Southwestern Medical Center. Arrays were imaged with a BeadArray Reader using standard Illumina scanner settings. The signal data were extracted and processed using RnBeads39 (version 0.99.12) in R (version 3.0.3). We considered a methylation β-value to be unreliable if its corresponding detection P-value was not below the threshold T = 0.05. Both sites and samples were filtered using a greedy approach. The BMIQ normalization method and the methylumi.noob background subtraction method implemented in the RnBeads package were applied40,41. We removed probes containing an SNP in the assayed CpG dinucleotide, as well as those for which two or more SNPs were located in the probe sequence7. We removed probes not mapping uniquely to the human reference genome (hg19), allowing for one mismatch, under the criteria of Price et al.42 Non-CpG-targeting probes (Ch probes) and probes located on the sex chromosomes were also removed43. Using the annotations provided by Illumina for the HumanMethylation450 platform, only probes located in CpG islands and shores were kept for analysis in this study. The R package Linear Models for Microarray Data (limma)44 was used to compare β-values and to identify differentially methylated probes between cancer and adjacent normal tissues. P-values were calculated from the moderated t-statistics, and multiple testing correction was performed using the Benjamini and Hochberg method (false discovery rate) to identify differentially methylated probes. Microarray data were uploaded to the National Center for Biotechnology Information's Gene
[Figure 6 image. Panels a–e (PITX1, FOXE3, TWF2, EHBP1L1 and RIN1): X-tile plots of CpG methylation and mRNA expression (larger low population versus larger high population), each with Kaplan-Meier curves of overall survival (%) versus months after surgery for low-risk and high-risk groups defined by CpG methylation or by gene expression. Panel f: network analysis.]
Figure 6 | X-tile plots of the genes that correspond to the five CpGs and network analyses. X-tile plots of CpG methylation (993 patients in the entire cohort) and mRNA expression of the five genes (507 patients in the TCGA data set): (a) PITX1, (b) FOXE3, (c) TWF2, (d) EHBP1L1 and (e) RIN1. X-tile plots provide a single, intuitive method to assess the association between marker expression and survival, and automatically select the optimum cut point according to the highest χ2-value defined by Kaplan-Meier survival analysis and the log-rank test. Colouration of the plot represents the strength of the association at each division, ranging from low (dark, black) to high (bright, red or green). Red represents an inverse association between marker expression and survival, whereas green represents a direct association. Each pixel represents an individual cut point; the number of patients in the group increases moving down the plot for the high-expression group (larger high population) and moving to the right for the low-expression group (larger low population). The dark dots (indicated by arrows) in the X-tile plots mark the cut points with the highest χ2-value, which are used to separate patients into high-risk and low-risk groups. (f) Network analyses of the genes that correspond to the five CpGs by cBioPortal. PITX1, TWF2 and RIN1 were predicted to have an impact on a diverse network of genes and pathways, as per the cBioPortal for Cancer Genomics network analysis tool. Black lines represent interactions between two entities; blue arrows indicate that the first entity controls a reaction that changes the state of the second entity. HR, hazard ratio.
Expression Omnibus (Series GSE61441, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=ufaxumuubrqxpgr&acc=GSE61441).
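The multiple-testing step above was run with limma in R. As a language-neutral illustration only, the Benjamini-Hochberg adjustment it refers to can be sketched in plain Python. The function name and the P-values below are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg (FDR) adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P-value down so that the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for five probes (e.g. moderated t-test output).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw_p)
differentially_methylated = [i for i, q in enumerate(fdr) if q < 0.05]
```

With these inputs only the first two probes pass an FDR of 0.05; the third probe's raw adjusted value (0.065) is pulled down to the fourth probe's 0.05125 by the monotonicity step, which is what distinguishes the procedure from a simple rescaling.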
Pyrosequencing. The methylation level of CpG sites was evaluated with pyrosequencing in the SYSU, MCHC and UTSW sets. DNA from paraffin-embedded tissue blocks was extracted from four sequential unstained sections, each 15 µm thick. For each tumour sample, subsequent sections were stained with haematoxylin and eosin for histological confirmation of the presence of tumour cells (>70%). Genomic DNA was extracted with the QIAamp DNA FFPE Tissue Kit (Qiagen) following the manufacturer's recommendations. Bisulfite conversion was performed on 1 µg of DNA with the EpiTect Bisulfite Kit (Qiagen). Twenty nanograms of converted DNA was used as a template in each subsequent PCR. Specific sets of primers for PCR amplification and sequencing were designed using the PyroMark Assay Design 2.0 software (Qiagen). All primer sequences are listed in Supplementary Table 2. PCRs were performed with the PyroMark PCR Kit (Qiagen) under the following conditions: 95 °C for 15 min; 45 cycles of 94 °C for 30 s, 56 °C for 30 s and 72 °C for 30 s; and a final elongation step of 72 °C for 10 min. The success of amplification was assessed by 2% agarose gel electrophoresis. PCR products were pyrosequenced with the PyroMark Q24 pyrosequencer (Qiagen) according to the manufacturer's protocol (Pyro-Gold reagents). Output data were analysed using PyroMark Q24 2.0.6 Software (Qiagen), which calculates the CpG methylation value as the percentage mC/(mC + C) for each CpG site, allowing quantitative comparisons. Controls to assess proper bisulfite conversion of the DNA were included in each run, and sequencing controls were used to ensure the fidelity of the measurements.
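The percentage reported by the PyroMark software can be reproduced from the two signals at a CpG position. The sketch below assumes that mC and C are the methylated and unmethylated signal intensities at that position (after bisulfite conversion, unmethylated cytosines are read as T); the function names and signal values are hypothetical, and this is not the PyroMark implementation.

```python
def methylation_percent(mc_signal, c_signal):
    """Percentage methylation at one CpG site: 100 * mC / (mC + C)."""
    total = mc_signal + c_signal
    if total <= 0:
        raise ValueError("no signal recorded at this CpG site")
    return 100.0 * mc_signal / total


def methylation_levels(peaks):
    """Map each CpG identifier to its methylation percentage.

    peaks: dict of CpG id -> (methylated signal, unmethylated signal).
    """
    return {cpg: methylation_percent(mc, c) for cpg, (mc, c) in peaks.items()}


# Hypothetical signal pairs for two CpG positions in one amplicon.
levels = methylation_levels({"CpG_1": (30.0, 70.0), "CpG_2": (12.0, 36.0)})
```

Because the value is a ratio of two signals from the same reaction, it does not depend on the absolute amount of template, which is what makes per-site values comparable across FFPE samples of varying DNA quality.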
TCGA data and network analysis. For the TCGA set, clinical data, CpG methylation values (level 3 data, Infinium HumanMethylation450) and mRNA expression data (level 3 data, RNA-seq Version 2, Illumina) were downloaded from the TCGA data portal (http://tcga-data.nci.nih.gov/tcga/) on 1 October 2014. The clinical data included 512 retrospectively identified patients who underwent radical or partial nephrectomy between 1998 and 2010 for sporadic ccRCC45. Of the 512 patients, CpG methylation data were available for 298 and mRNA expression data for 507. Of the 298 patients, VHL, PBRM1 and BAP1 gene mutation data were available for 242 (Supplementary Fig. 6). The cBioPortal for Cancer Genomics (http://cbioportal.org) network was used to search for pathways and interactions that might be linked to the genes that correspond to the identified CpG sites in ccRCC46.
Intratumour heterogeneity. ITH was investigated by extracting DNA from morphologically distinct regions within the tumours of 23 patients with ccRCC treated between 2011 and 2013 at the First Affiliated Hospital of SYSU (FFPE specimens; three different regions coded as R1, R2 and R3; Supplementary Fig. 4). Methylation of the five CpG sites was measured by pyrosequencing. The s.d. and the coefficient of variation (CV) were used to describe the inter-sample variability of CpG methylation between the 23 ccRCCs and the intra-sample variability between different regions of the same tumour.
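The two variability summaries can be sketched as follows, assuming that CV means the sample s.d. divided by the mean and that inter-sample variability is computed on each tumour's mean over R1 to R3. Both conventions, the patient labels and the values are assumptions for illustration, not the study's data.

```python
import statistics


def sd_and_cv(values):
    """Sample standard deviation and coefficient of variation (s.d. / mean)."""
    sd = statistics.stdev(values)
    return sd, sd / statistics.mean(values)


# Hypothetical methylation percentages at one CpG in regions R1, R2 and R3.
regions = {
    "patient_A": [40.0, 42.0, 44.0],
    "patient_B": [10.0, 10.0, 10.0],
    "patient_C": [70.0, 66.0, 74.0],
}

# Intra-sample variability: spread between regions of the same tumour.
intra = {pid: sd_and_cv(vals) for pid, vals in regions.items()}

# Inter-sample variability: spread of the per-tumour means across tumours.
tumour_means = [statistics.mean(vals) for vals in regions.values()]
inter_sd, inter_cv = sd_and_cv(tumour_means)
```

In this toy example the intra-sample s.d. (at most 4 percentage points) is far smaller than the inter-sample s.d. (about 30), which is the pattern one would hope for if a single tumour sample is to be representative for classification.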
Statistical analysis. The goal of this study was to identify a prognostic classifier that predicts overall survival, defined as the time between surgery and death or the last follow-up date. Volcano plot analysis was used to select CpG sites based on absolute fold change in combination with t-test P-values. LASSO logistic regression analysis was used to identify the candidate CpG sites with non-zero coefficients in the discovery set. LASSO Cox regression analysis was used to select the prognostic markers among the candidate CpG sites and to construct a multi-CpG-based classifier for predicting the overall survival of patients with ccRCC in the SYSU set. We used the Kaplan-Meier method to analyse the correlation between variables and overall survival, and the log-rank test to compare survival curves. Multivariate survival analysis was performed using the Cox regression model. X-tile plots were used to generate the optimum cutoff point for continuous variables according to the highest χ2-value defined by Kaplan-Meier survival analysis and the log-rank test47.
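To make the Kaplan-Meier and log-rank steps concrete, here is a minimal plain-Python sketch of the product-limit estimator and the two-group log-rank χ2 statistic with 1 degree of freedom. The analyses in the study were run in R; the function names and the toy follow-up data are hypothetical, and tied times are handled in the usual way by pooling deaths that occur at the same time.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate. events: 1 = death, 0 = censored.

    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        while i < len(data) and data[i][0] == t:  # pool tied observations
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t  # deaths and censorings both leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; groups holds 0/1 labels. Returns (chi2, p)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 d.f.: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical months of follow-up; group 1 = classifier high risk.
months = [1, 2, 3, 4, 5, 6]
died = [1, 1, 1, 1, 1, 1]
high_risk = [1, 1, 1, 0, 0, 0]
chi2, p = logrank_test(months, died, high_risk)
km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Here the three high-risk patients die first, giving χ2 of about 5.05 and P of about 0.025; the Kaplan-Meier curve drops to 0.8, 0.6 and 0.3 at months 2, 3 and 5, with the censored observations at months 3 and 8 shrinking the risk set without producing a step.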
X-tile plots were created with X-tile software version 3.6.1 (Yale University School of Medicine, New Haven, CT, USA), and all other statistical tests were performed with R software version 3.0.3 (R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at P < 0.05.
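X-tile is a standalone program, and its two-dimensional plots are not reproduced here. The one-dimensional idea it automates, trying every candidate threshold and keeping the one with the largest log-rank χ2, can be sketched as below. The function names, the minimum group size and the toy data are assumptions for illustration; the log-rank helper is re-implemented so that the block runs on its own.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 d.f.); groups holds 0/1 labels."""
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(scores, times, events, min_group=3):
    """Return (cut, chi2) maximising the log-rank chi2 for high = score > cut."""
    best = None
    for cut in sorted(set(scores))[:-1]:  # the largest score would leave no high group
        high = [1 if s > cut else 0 for s in scores]
        n_high = sum(high)
        if n_high < min_group or len(scores) - n_high < min_group:
            continue  # skip splits that leave a very small group
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best


# Hypothetical risk scores, months of follow-up and death indicators.
scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 1.1]
months = [60, 55, 50, 45, 10, 8, 6, 4]
died = [0, 0, 0, 0, 1, 1, 1, 1]
cut, chi2 = best_cutpoint(scores, months, died)
```

In this toy example the search picks 0.8 (three high-risk patients, χ2 of about 8.18) over the visually obvious split at 0.4 (χ2 of about 7.34), a reminder that a cut point maximised on one data set is optimistic there and needs to be checked in independent validation sets.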
LASSO regression analysis. The high dimensionality of microarray-based experiments relative to the small number of samples easily leads to overfitting. Regularized linear models such as logistic regression with a LASSO penalty are popular solutions for fitting sparse models in which only a small subset of features plays a role48. LASSO can be used with high-dimensional data for optimal selection of genes with strong diagnostic or prognostic value and low correlation with one another, which helps prevent overfitting49–52. LASSO is a form of regularized (penalized) regression in which L1 regularization is introduced into the standard multiple linear regression procedure through a compound cost function used to optimize the regression coefficients. LASSO regression shrinks the coefficient estimates towards zero, with the degree of shrinkage depending on an additional parameter, λ. In this way, some coefficient estimates can be forced to be exactly zero, effectively eliminating the corresponding variables. We adopted the LASSO regression model to achieve shrinkage and variable selection simultaneously. Tenfold cross-validation was used to determine the optimal value of λ (refs 51–53). We chose λ via the 1-s.e. criterion; that is, the optimal λ is the largest value for which the partial likelihood deviance is within 1 s.e. of the smallest partial likelihood deviance24. We used R software version 3.0.3 (R Foundation for Statistical Computing) and the glmnet package to perform the LASSO regression analysis.
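The shrinkage and selection behaviour described above, and the 1-s.e. rule for λ, can be illustrated with a short sketch. For brevity it uses the least-squares LASSO fitted by cyclic coordinate descent rather than the Cox partial likelihood that glmnet optimizes in the study, and the 1-s.e. helper mirrors the rule as described in the text rather than reproducing glmnet's own code; all function names and numbers are hypothetical.

```python
def soft_threshold(z, gamma):
    """L1 proximal step: shrink z towards zero; exactly 0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept; centred data assumed)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves feature j out of the current fit.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


def lambda_1se(lambdas, cv_mean, cv_se):
    """Largest lambda whose mean CV deviance is within 1 s.e. of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[best] + cv_se[best]
    return max(lam for lam, dev in zip(lambdas, cv_mean) if dev <= limit)


# Two orthogonal, centred features; only the first carries a strong signal.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
coef = lasso_coordinate_descent(X, y, lam=0.5)

# Hypothetical cross-validated deviance curve over a decreasing lambda path.
lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
cv_mean = [12.0, 10.4, 10.0, 10.1, 10.6]
cv_se = [0.5] * 5
chosen = lambda_1se(lambdas, cv_mean, cv_se)
```

With λ = 0.5 the weak second coefficient is set exactly to zero while the strong one is shrunk from its least-squares value of 2.0 to 1.5, and the 1-s.e. rule picks λ = 0.2 rather than the deviance-minimising 0.1, favouring the sparser model.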
References
1. Ljungberg, B. et al. EAU guidelines on renal cell carcinoma: 2014 update. Eur. Urol. 67, 913–924 (2015).
2. Zigeuner, R. et al. External validation of the Mayo Clinic stage, size, grade, and necrosis (SSIGN) score for clear-cell renal cell carcinoma in a single European centre applying routine pathology. Eur. Urol. 57, 102–109 (2010).
3. Ficarra, V. et al. The Stage, Size, Grade and Necrosis score is more accurate than the University of California Los Angeles Integrated Staging System for predicting cancer-specific survival in patients with clear cell renal cell carcinoma. BJU Int. 103, 165–170 (2009).
4. Brock, M. V. et al. DNA methylation markers and early recurrence in stage I lung cancer. N. Engl. J. Med. 358, 1118–1128 (2008).
5. Castelo-Branco, P. et al. Methylation of the TERT promoter and risk stratification of childhood brain tumours: an integrative genomic and molecular study. Lancet Oncol. 14, 534–542 (2013).
6. Esteller, M. Relevance of DNA methylation in the management of cancer. Lancet Oncol. 4, 351–358 (2003).
7. Sandoval, J. et al. A prognostic DNA methylation signature for stage I non-small-cell lung cancer. J. Clin. Oncol. 31, 4140–4147 (2013).
8. Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
9. Ricketts, C. J. et al. Genome-wide CpG island methylation analysis implicates novel genes in the pathogenesis of renal cell carcinoma. Epigenetics 7, 278–290 (2012).
10. Lasseigne, B. N. et al. DNA methylation profiling reveals novel diagnostic biomarkers in renal cell carcinoma. BMC Med. 12, 235 (2014).
11. Arai, E. et al. Multilayer-omics analysis of renal cell carcinoma, including the whole exome, methylome and transcriptome. Int. J. Cancer 135, 1330–1342 (2014).
12. Ibragimova, I. et al. Genome-wide promoter methylome of small renal masses. PLoS ONE 8, e77309 (2013).
13. Kratz, J. R. et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 379, 823–832 (2012).
14. van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).
15. Liu, N. et al. Prognostic value of a microRNA signature in nasopharyngeal carcinoma: a microRNA expression analysis. Lancet Oncol. 13, 633–641 (2012).
16. Yoon, K. A. et al. Genetic variations associated with postoperative recurrence in stage I non-small cell lung cancer. Clin. Cancer Res. 20, 3272–3279 (2014).
17. Buyse, M. et al. Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J. Natl Cancer Inst. 98, 1183–1192 (2006).
18. De Sousa, E. M. F. et al. Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat. Med. 19, 614–618 (2013).
19. Arai, E. et al. Single-CpG-resolution methylome analysis identifies clinicopathologically aggressive CpG island methylator phenotype clear cell renal cell carcinomas. Carcinogenesis 33, 1487–1493 (2012).
20. Simon, R. & Altman, D. G. Statistical aspects of prognostic factor studies in oncology. Br. J. Cancer 69, 979–985 (1994).
21. Joseph, F., Hair, J., Anderson, R. E., Tatham, R. L. & Black, W. C. Multivariate Data Analysis, 4th edn (Prentice-Hall, Inc., 1995).
22. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997).
23. Zhang, H. H. & Lu, W. Adaptive Lasso for Cox's proportional hazards model. Biometrika 94, 691–703 (2007).
24. Zhang, J. X. et al. Prognostic and predictive value of a microRNA signature in stage II colon cancer: a microRNA expression analysis. Lancet Oncol. 14, 1295–1306 (2013).
25. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
26. Gulati, S. et al. Systematic evaluation of the prognostic impact and intratumour heterogeneity of clear cell renal cell carcinoma biomarkers. Eur. Urol. 66, 936–948 (2014).
27. Barry, W. T. et al. Intratumor heterogeneity and precision of microarray-based predictors of breast cancer biology and clinical outcome. J. Clin. Oncol. 28, 2198–2206 (2010).
28. Zhao, H. et al. Gene expression profiling predicts survival in conventional renal cell carcinoma. PLoS Med. 3, e13 (2006).
29. Kosari, F. et al. Clear cell renal cell carcinoma: gene expression analyses identify a potential signature for tumor aggressiveness. Clin. Cancer Res. 11, 5128–5139 (2005).
30. Brooks, S. A. et al. ClearCode34: A prognostic risk predictor for localized clear cell renal cell carcinoma. Eur. Urol. 66, 77–84 (2014).
31. Escudier, B. J. et al. Validation of a 16-gene signature for prediction of recurrence after nephrectomy in stage I–III clear cell renal cell carcinoma (ccRCC). ASCO Meeting Abstracts 32, 4502 (2014).
32. Chatterton, Z. et al. Validation of DNA methylation biomarkers for diagnosis of acute lymphoblastic leukemia. Clin. Chem. 60, 995–1003 (2014).
33. Bell, A., Bell, D., Weber, R. S. & El-Naggar, A. K. CpG island methylation profiling in human salivary gland adenoid cystic carcinoma. Cancer 117, 2898–2909 (2011).
34. Milstein, M. et al. RIN1 is a breast tumor suppressor gene. Cancer Res. 67, 11510–11516 (2007).
35. Yamada, S. et al. Identification of twinfilin-2 as a factor involved in neurite outgrowth by RNAi-based screen. Biochem. Biophys. Res. Commun. 363, 926–930 (2007).
36. West, J., Widschwendter, M. & Teschendorff, A. E. Distinctive topology of age-associated epigenetic drift in the human interactome. Proc. Natl Acad. Sci. USA 110, 14138–14143 (2013).
37. Cheng, C. P. et al. Network-based analysis identifies epigenetic biomarkers of esophageal squamous cell carcinoma progression. Bioinformatics 30, 3054–3061 (2014).
38. Dick, K. J. et al. DNA methylation and body-mass index: a genome-wide analysis. Lancet 383, 1990–1998 (2014).
39. Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).
40. Teschendorff, A. E. et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 29, 189–196 (2013).
41. Triche, Jr T. J., Weisenberger, D. J., Van Den Berg, D., Laird, P. W. & Siegmund, K. D. Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res. 41, e90 (2013).
42. Price, M. E. et al. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin 6, 4 (2013).
43. Chen, Y. A. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 8, 203–209 (2013).
44. Gentleman, R., Carey, V., Huber, W., Irizarry, R. & Dudoit, S. Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health) (Springer-Verlag, Inc., 2005).
45. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
46. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
47. Camp, R. L., Dolled-Filhart, M. & Rimm, D. L. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin. Cancer Res. 10, 7252–7259 (2004).
48. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
49. Goeman, J. J. L1 penalized estimation in the Cox proportional hazards model. Biom. J. 52, 70–84 (2010).
50. Gui, J. & Li, H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21, 3001–3008 (2005).
51. Sveen, A. et al. ColoGuidePro: a prognostic 7-gene expression signature for stage III colorectal cancer patients. Clin. Cancer Res. 18, 6001–6010 (2012).
52. Olk-Batz, C. et al. Aberrant DNA methylation characterizes juvenile myelomonocytic leukemia with poor outcome. Blood 117, 4871–4880 (2011).
53. Kohavi, R. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol 2 (Morgan Kaufmann Publishers Inc., 1995).
Acknowledgements
The study was supported by grants from the National Natural Science Foundation of China (81572905, 81372730, 81225018 and 81372357) and the Guangdong Provincial Science and Technology Foundation (2014B020212015). We thank the TCGA for their efforts and providing data.
Author contributions
J.H.L. designed the study. A.H., K.J.W., H.W.Z., Z.L.Z., L.Y.Z., Z.H.C., Y.H.Y., Z.R.W., F.J.Z., L.S., Q.Z. Liu, Z.L.G., D.L.H., W.C., J.T.H. and V.M. obtained and assembled data. J.H.W., A.H., K.J.W., H.W.Z., P.K., Z.L.Z., L.Y.Z., Z.H.C., Y.Y.Z., J.C.Z., B.W., M.Y.C., D.X., B.L., C.X.L., P.X.L., Q.Z. Li and J.H.L. analysed and interpreted the data. J.H.W., A.H. and J.H.L. wrote the report, which was edited by all authors, who have approved the final version. J.H.L., W.C. and D.X. are the guarantors.
Additional information
Accession codes: Methylation array data have been deposited in Gene Expression Omnibus database under accession code GSE61441.
Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications
Competing financial interests: The authors declare no competing financial interests.
Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/
How to cite this article: Wei, J.-H. et al. A CpG-methylation-based assay to predict survival in clear cell renal cell carcinoma. Nat. Commun. 6:8699 doi: 10.1038/ncomms9699 (2015).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Copyright Nature Publishing Group Oct 2015