Abbreviations
- AUC, area under the receiver operating characteristic curve
- CRC, colorectal cancer
- GEO, Gene Expression Omnibus
- IBD, inflammatory bowel diseases
- k‐TSP, k‐Top Scoring Pairs
- REO, relative expression ordering
- RNA‐seq, RNA sequencing
- TCGA, The Cancer Genome Atlas
- TSP50, testes‐specific protease 50
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the fourth leading cause of cancer‐related death worldwide. CRC is readily curable when diagnosed at an early stage, so early diagnosis is crucial in the fight against this cancer. However, most patients with CRC are diagnosed at an intermediate or advanced stage. Currently established noninvasive tests, such as the guaiac‐based fecal occult blood test, have low sensitivity and positive predictive value. Several serum protein biomarkers, including carcinoembryonic antigen, CA19.9, and CA125, can be used to monitor the prognosis of patients with CRC, but none of them is recommended for the early diagnosis of CRC. The expression of TSP50 has also been proposed as a diagnostic signature for CRC, but its sensitivity was only 68.4% (specificity, 92.5%; positive predictive value, 95.6%). Moreover, this signature relies on a risk score summarized from quantitative measurements of TSP50 protein expression, which lacks robustness for clinical application because of large measurement batch effects.
In clinical practice, biopsy sampling by less invasive techniques, such as colonoscopy, is often used for the initial clinical evaluation of CRC. However, an indeterminate diagnosis often creates a dilemma. Colonoscopy is the predominant screening and diagnostic test for CRC, yet its reported CRC miss rate is approximately 15% for patients with IBD. Moreover, the biopsy location can be inaccurate, so that adjacent nontumor tissue may be sampled instead of tumor tissue, degrading diagnostic performance. However, previously reported diagnostic signatures, such as the transcriptional signatures reported by Zheng et al and in our previous study, all used tumor‐adjacent normal tissues as the normal samples when developing the signature. Consequently, these signatures cannot classify inaccurately sampled CRC‐adjacent normal tissues as CRC. Given that the adjacent nontumor colorectal tissues of patients with CRC might share some molecular characteristics with CRC, it may be possible to develop a signature that discriminates CRC (including CRC‐adjacent tissues) from the tissues of non‐CRC (normal or IBD) individuals, which would be suitable for minimum biopsy specimens and inaccurately sampled specimens.
Another major limitation of previously reported transcriptional diagnostic signatures is that they rely on risk scores summarized from quantitative expression measurements of the signature genes; such scores are sensitive to batch effects and are difficult to apply to individualized diagnosis. Notably, several quantitative transcriptional disease signatures, including AlloMap, have been approved by the US FDA. However, because of batch effects, the tissue samples must be sent to specific laboratories for measurement under strict quality control.
In contrast, the within‐sample REOs of genes, which are qualitative transcriptional characteristics, are robust against experimental batch effects and can be applied directly to individual samples. This robustness allows researchers to integrate multiple datasets produced by the same or similar platforms when developing disease signatures or classifiers, which makes it more likely to find robust signatures. In addition, qualitative transcriptional characteristics are highly robust against varied proportions of tumor epithelial cells in specimens sampled from different tumor locations of the same patient, partial RNA degradation during specimen preparation and storage, and amplification bias in minimum specimens; these are common factors that cause quantitative transcriptional signatures to fail in clinical practice. Therefore, within‐sample REOs are worth exploiting to identify a robust qualitative signature for the early diagnosis of CRC.
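To illustrate why within-sample REOs tolerate batch effects, the minimal Python sketch below checks that every pairwise ordering in one sample survives a sample-wide, strictly increasing transformation (an affine "batch" distortion and a log2 transform). The expression values and the `reo_pattern` helper are hypothetical. The property only covers distortions that act monotonically and identically on all genes of a sample; gene-specific effects, such as probe-specific biases, can still flip an REO.

```python
import math
from itertools import combinations


def reo_pattern(sample):
    """Within-sample REOs: for each gene pair (i, j) with i < j, True if G_i > G_j."""
    return {(i, j): sample[i] > sample[j]
            for i, j in combinations(range(len(sample)), 2)}


# Hypothetical expression values of 4 genes in one sample (not real data).
sample = [5.2, 1.7, 9.8, 3.3]

# Sample-wide, strictly increasing distortions applied to every gene alike:
# a multiplicative batch factor plus an offset, and a log2 transform.
batch_shifted = [2.5 * x + 1.0 for x in sample]
log_scaled = [math.log2(x) for x in sample]

# All 6 pairwise orderings are unchanged by either transformation.
assert reo_pattern(sample) == reo_pattern(batch_shifted) == reo_pattern(log_scaled)
```

Quantitative risk scores built from the raw values would change under the same distortions, which is the contrast the paragraph above draws.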
In this study, based on the robust within‐sample REOs, we identified a qualitative transcriptional signature consisting of 7 gene pairs for the early diagnosis of CRC. The signature can accurately discriminate CRC tissues, including CRC adjacent normal tissues, from normal or IBD tissues of non‐CRC individuals in both biopsy and surgical resection samples.
The gene expression profiles of 33 CRC biopsy specimens were measured on the Affymetrix platform in our laboratory. This study (NCT02770911) was approved by the Institutional Review Board of Fujian Medical University Union Hospital (No. 2015‐23; Fuzhou, China), and written informed consent was obtained from all 33 participants. The tumor biopsy specimens were obtained by endoscopy. RNA was extracted using the RNeasy Mini Kit (Qiagen) and profiled on the Affymetrix GeneChip PrimeView Array. The raw data (.CEL files) were processed with the Robust Multi‐array Average algorithm for background adjustment, without quantile normalization.
We also measured 13 CRC surgical resection specimens from 5 patients with CRC on the RNA‐seq platform. This study was approved by the institutional review boards of all participating institutions, and written informed consent was obtained from all participants. For each patient, 3 specimens were sampled from 3 different locations; 2 of these 15 specimens were excluded from subsequent analysis because of poor RNA quality (RNA integrity number less than 6.0). The proportion of tumor epithelial cells in each of the 13 tumor specimens, ranging from 40% to 100% (see Table ), was determined by pathological section analysis. After surgical resection, the cancer specimens were fresh‐frozen for subsequent RNA extraction. Total RNA was isolated from the fresh‐frozen CRC tissues using TRIzol reagent (Invitrogen) according to the manufacturer's protocol, and RNA quality was assessed with an Agilent 2200 TapeStation (Agilent Technologies). mRNA was then captured from 1‐2 μg of total RNA using the NEBNext PolyA mRNA Magnetic Isolation Module, and stranded RNA‐seq libraries were constructed using the NEBNext Ultra Directional RNA Library Prep Kit. Paired‐end sequencing (2 × 150 bp) was performed on an Illumina HiSeqXten. The resulting raw RNA‐seq files (fastq) were preprocessed using Trimmomatic, and reads were aligned to the reference genome (GRCh37) using HISAT2. Finally, the fragments per kilobase of transcript per million mapped fragments values of genes were calculated using StringTie.
Proportions of tumor epithelial cells in colorectal cancer (CRC) tissues
| Patient | Specimen 1 | Specimen 2 | Specimen 3 |
| CRC 1 | 70% | – | 40% |
| CRC 2 | 40% | 100% | 100% |
| CRC 3 | 50% | 90% | 90% |
| CRC 4 | 60% | 100% | 100% |
| CRC 5 | 100% | 100% | – |
–, No sample in the corresponding category due to poor RNA quality.
Multiple gene expression profiles were downloaded from the GEO repository (http://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (
| Dataset | Platform | Sampling method | Normal (n) | IBD (n) | Adjacent normal (n) | Cancer (n) |
| Datasets used for identification of the qualitative signature | ||||||
| GSE4183 | Affymetrix GPL570 | Biopsy | 8 | 15 | – | 15 |
| GSE9348 | Affymetrix GPL570 | Biopsy | 12 | – | – | 70 |
| GSE35452 | Affymetrix GPL570 | Biopsy | – | – | – | 46 |
| GSE22619 | Affymetrix GPL570 | Biopsy | 10 | 10 | – | – |
| GSE14580 | Affymetrix GPL570 | Biopsy | – | 24 | – | – |
| GSE13367 | Affymetrix GPL570 | Biopsy | – | 16 | – | – |
| GSE18105 | Affymetrix GPL570 | Surgery | – | – | 17 | 77 |
| GSE23878 | Affymetrix GPL570 | Surgery | – | – | 24 | 35 |
| GSE33113 | Affymetrix GPL570 | Surgery | – | – | 6 | 90 |
| GSE32323 | Affymetrix GPL570 | Surgery | – | – | 17 | 17 |
| GSE41328 | Affymetrix GPL570 | Surgery | – | – | 10 | 10 |
| GSE17536 | Affymetrix GPL570 | Surgery | – | – | – | 177 |
| GSE35144 | Affymetrix GPL570 | Surgery | – | – | – | 27 |
| E‐GEOD‐72819 | Illumina GPL11154 | Biopsy | – | 73 | – | – |
| E‐GEOD‐50760 | Illumina GPL11154 | Surgery | – | – | 18 | 36 |
| Datasets used for evaluating the performance of the qualitative signature | ||||||
| GSE47908 | Affymetrix GPL570 | Biopsy | 15 | 39 | – | – |
| GSE36807 | Affymetrix GPL570 | Biopsy | 7 | 28 | – | – |
| GSE16879 | Affymetrix GPL570 | Biopsy | – | 43 | – | – |
| GSE12251 | Affymetrix GPL570 | Biopsy | – | 23 | – | – |
| GSE9452 | Affymetrix GPL570 | Biopsy | – | 8 | – | – |
| GSE45404 | Affymetrix GPL570 | Biopsy | – | – | – | 42 |
| GSE21510 | Affymetrix GPL570 | Surgery | – | – | 25 | 104 |
| GSE22598 | Affymetrix GPL570 | Surgery | – | – | 17 | 17 |
| GSE27854 | Affymetrix GPL570 | Surgery | – | – | – | 115 |
| GSE35896 | Affymetrix GPL570 | Surgery | – | – | – | 62 |
| Our_Data1 | Affymetrix PrimeView Array | Biopsy | – | – | – | 33 |
| Our_Data2 | Illumina HiSeqXten | Surgery | – | – | – | 13 |
| TCGA | Illumina HiSeq_RNASeqV2 | Surgery | – | – | 39 | 556 |
–, No sample in the corresponding category; IBD, inflammatory bowel disease; TCGA, The Cancer Genome Atlas.
For the data measured on the Affymetrix platform, we downloaded the raw mRNA expression data (.CEL files) and used the Robust Multi‐array Average algorithm for background adjustment, without quantile normalization. For the sequencing‐based data, the fragments per kilobase of transcript per million mapped fragments or reads per kilobase of transcript per million mapped reads values were downloaded.
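For reference, the textbook definition is FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments); RPKM is the same formula with reads instead of fragments. The Python sketch below (hypothetical gene names and counts) shows that the per-sample depth term cancels out of within-sample comparisons, so it cannot change an REO, whereas length normalization can reorder genes relative to raw counts. Tools such as StringTie derive FPKM from their own abundance estimates, so this toy calculation is not meant to reproduce their output.

```python
def fpkm(fragment_counts, lengths_bp):
    """Textbook FPKM for one sample.

    fragment_counts: dict gene -> fragments assigned to that gene
    lengths_bp: dict gene -> transcript length in base pairs
    Assumption of this sketch: every mapped fragment is assigned to a listed gene.
    """
    total = sum(fragment_counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in fragment_counts.items()}


# Hypothetical counts: raw counts order A < B < C ...
counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
lengths = {"GENE_A": 1000, "GENE_B": 6000, "GENE_C": 2000}
values = fpkm(counts, lengths)
# ... but after length normalization B (25000) < A (50000) < C (400000).

# Doubling sequencing depth leaves every FPKM value, and hence every REO, unchanged.
doubled = fpkm({g: 2 * c for g, c in counts.items()}, lengths)
```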
For the array‐based data, if multiple probes mapped to a gene, the expression value of the gene was defined as the arithmetic mean of the values of those probes; probes that mapped to zero or multiple genes were discarded. For the sequencing‐based data from ArrayExpress, gene symbols were mapped to Entrez gene IDs using the biological database network. For the sequencing‐based data from TCGA, the Ensembl gene IDs corresponding to unique Entrez gene IDs of protein‐coding genes were used.
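The probe-to-gene collapsing rule described above can be sketched as follows, assuming the platform annotation is available as a mapping from probe ID to a set of gene IDs. All probe and gene names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_genes):
    """Collapse probe-level values of one sample to gene-level values.

    Probes mapped to zero genes or to more than one gene are discarded;
    a gene measured by several probes gets the arithmetic mean of those probes.
    """
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        genes = probe_to_genes.get(probe, set())
        if len(genes) != 1:
            continue  # unannotated or ambiguous probe
        (gene,) = genes
        per_gene[gene].append(value)
    return {gene: mean(vals) for gene, vals in per_gene.items()}


# Hypothetical annotation and expression values for one sample.
probe_to_genes = {
    "probe_a": {"GENE_1"},
    "probe_b": {"GENE_1"},            # second probe for GENE_1: averaged
    "probe_c": {"GENE_2", "GENE_4"},  # maps to two genes: discarded
    "probe_d": set(),                 # maps to no gene: discarded
    "probe_e": {"GENE_3"},
}
probe_values = {"probe_a": 6.0, "probe_b": 8.0, "probe_c": 5.0,
                "probe_d": 3.0, "probe_e": 4.5}
gene_values = collapse_probes(probe_values, probe_to_genes)
```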
First, within a sample, the REO of two genes, i and j, is denoted as Gi > Gj (or Gi < Gj) if the expression level of gene i is higher (or lower) than that of gene j. If the same REO pattern holds in a large majority of samples (eg, 85%), the REO is called stable and the pair is called a stable gene pair. A gene pair whose REOs are stable in both groups of samples but with opposite patterns is called a reversal gene pair. Here, we selected the gene pairs that are stable in both noncancer and cancer samples but whose REO patterns are reversed in the cancer samples; these reversal gene pairs formed the candidate REO signature of the cancer.
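A minimal Python sketch of the stable-pair and reversal-pair definitions above, assuming each sample is a list of expression values in a shared gene order and using the 85% threshold from the text. Counting ties toward neither direction is an assumption of this sketch, and the toy data are invented.

```python
from itertools import combinations


def stable_pairs(samples, threshold=0.85):
    """Return {(i, j): direction} for gene pairs (i < j) with a stable REO.

    direction is True if G_i > G_j in at least `threshold` of the samples,
    False if G_i < G_j in at least `threshold` of them. Ties count toward neither.
    """
    n = len(samples)
    stable = {}
    for i, j in combinations(range(len(samples[0])), 2):
        higher = sum(1 for s in samples if s[i] > s[j])
        lower = sum(1 for s in samples if s[i] < s[j])
        if higher / n >= threshold:
            stable[(i, j)] = True
        elif lower / n >= threshold:
            stable[(i, j)] = False
    return stable


def reversal_pairs(noncancer, cancer, threshold=0.85):
    """Pairs stable in both groups whose REO direction is opposite."""
    a = stable_pairs(noncancer, threshold)
    b = stable_pairs(cancer, threshold)
    return sorted(p for p in a.keys() & b.keys() if a[p] != b[p])


# Toy data (3 genes): G0 < G1 in noncancer samples, G0 > G1 in cancer samples.
noncancer = [[1.0, 2.0, 9.0], [1.5, 2.5, 8.0], [0.8, 3.0, 7.5]]
cancer = [[4.0, 2.0, 9.0], [3.5, 1.0, 8.5], [5.0, 2.2, 9.9]]
print(reversal_pairs(noncancer, cancer))  # [(0, 1)]
```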
Then, the candidate reversal gene pairs selected above were sorted in descending order of their reversal degree, which was calculated for each reversal gene pair as follows: [Image Omitted. See PDF] where $\lvert \mathrm{mean}[R_{ij}(\mathrm{cancer})] \rvert$ and $\lvert \mathrm{mean}[R_{ij}(\mathrm{non\_cancer})] \rvert$ represent the absolute values of the mean rank differences of the reversal gene pair (i, j) in cancer samples and noncancer samples, respectively. The rank difference of each reversal gene pair in a sample was calculated as follows:
$$R_{ij} = R_i - R_j$$
where $R_i$ and $R_j$ represent the ranks of gene i and gene j in the sample, respectively, and $R_{ij}$ is the rank difference between the 2 genes. We expected that gene pairs with higher reversal degrees would show better cross‐platform performance.
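The rank differences can be computed as in the Python sketch below. Because the reversal-degree equation itself appears only as an omitted image, how its two components are combined is not recoverable from this text; purely for illustration, the hypothetical `reversal_degree` function adds the two absolute mean rank differences. Ascending ranks (1 = lowest expression) and index-based tie breaking are also assumptions of this sketch.

```python
from statistics import mean


def within_sample_ranks(sample):
    """Ascending ranks (1 = lowest expression); ties broken by gene index."""
    order = sorted(range(len(sample)), key=lambda g: sample[g])
    ranks = [0] * len(sample)
    for r, g in enumerate(order, start=1):
        ranks[g] = r
    return ranks


def mean_rank_difference(samples, i, j):
    """Mean over samples of R_ij = R_i - R_j."""
    return mean(r[i] - r[j] for r in (within_sample_ranks(s) for s in samples))


def reversal_degree(cancer, noncancer, i, j):
    """HYPOTHETICAL combination (the source equation is not recoverable here):
    the sum of the two absolute mean rank differences."""
    return (abs(mean_rank_difference(cancer, i, j))
            + abs(mean_rank_difference(noncancer, i, j)))


# Toy data with 4 genes; candidate pairs are then sorted by decreasing degree.
noncancer = [[1, 2, 3, 4], [1, 3, 2, 4]]
cancer = [[4, 3, 2, 1], [4, 2, 3, 1]]
candidates = [(1, 2), (0, 1), (0, 3)]
ranked = sorted(candidates, key=lambda p: reversal_degree(cancer, noncancer, *p),
                reverse=True)
```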
Finally, we used the top‐k gene pairs, with k ranging from 1 to the total number of reversal gene pairs, to classify samples by the majority vote rule. The value of k that achieved the highest geometric mean of sensitivity and specificity in the training data was chosen, and the corresponding top‐k gene pairs were selected as the early diagnosis signature of CRC.
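The top-k majority vote and the choice of k can be sketched in Python as follows. Each pair (i, j) is assumed to be oriented so that G_i > G_j is the cancer pattern, matching the convention of the signature table. The source does not say how ties are handled when k is even, so this sketch assumes a strict majority is needed for a cancer call and keeps the smallest k when geometric means are equal. Function names and data are hypothetical.

```python
from math import sqrt


def predict(sample, pairs):
    """Majority vote: pair (i, j) votes 'cancer' if G_i > G_j in the sample.
    ASSUMPTION: a strict majority is required, so ties are called noncancer."""
    votes = sum(1 for i, j in pairs if sample[i] > sample[j])
    return votes > len(pairs) / 2


def sens_spec(samples, labels, pairs):
    """labels: True for positives (cancer or cancer-adjacent), False otherwise."""
    tp = fn = tn = fp = 0
    for sample, is_positive in zip(samples, labels):
        called_cancer = predict(sample, pairs)
        if is_positive and called_cancer:
            tp += 1
        elif is_positive:
            fn += 1
        elif called_cancer:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def choose_k(ranked_pairs, samples, labels):
    """Return (k, geometric mean) maximizing sqrt(sensitivity * specificity)."""
    best_k, best_gm = 0, -1.0
    for k in range(1, len(ranked_pairs) + 1):
        sens, spec = sens_spec(samples, labels, ranked_pairs[:k])
        gm = sqrt(sens * spec)
        if gm > best_gm:  # strict ">" keeps the smallest k on ties
            best_k, best_gm = k, gm
    return best_k, best_gm


# Toy data: 6 genes, 3 independent pairs, 2 positives followed by 2 negatives.
ranked = [(0, 1), (2, 3), (4, 5)]
samples = [[1, 2, 4, 3, 6, 5], [2, 1, 4, 3, 6, 5],
           [1, 2, 3, 4, 5, 6], [1, 2, 4, 3, 5, 6]]
labels = [True, True, False, False]
print(choose_k(ranked, samples, labels))  # (3, 1.0)
```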
Cancer samples, including cancer and cancer adjacent normal samples, were defined as positive samples; noncancer samples, including normal and IBD samples, were defined as negative samples. The performance of the signature was evaluated using sensitivity and specificity, which were calculated as follows:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
where TP, TN, FP, and FN represent the number of true‐positive, true‐negative, false‐positive, and false‐negative samples, respectively.
AUCs were calculated with the nonparametric Hanley‐McNeil method, and their 95% confidence intervals were determined using a normal approximation.
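As an illustration of the method named above, the Python sketch computes the nonparametric (Mann-Whitney) AUC and the Hanley-McNeil standard error with a normal-approximation interval. Which continuous score was used for the majority-vote classifier is not stated here; using, for example, the number of cancer-supporting votes per sample is an assumption, and the toy scores are invented. Clamping the interval to [0, 1] is also a choice of this sketch.

```python
from math import sqrt


def auc_mann_whitney(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation CI using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical vote counts (out of 7 pairs) for 4 positives and 4 negatives.
pos = [7, 6, 7, 5]
neg = [1, 0, 5, 2]
auc = auc_mann_whitney(pos, neg)  # 15.5 / 16 = 0.96875
low, high = hanley_mcneil_ci(auc, len(pos), len(neg))
```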
The analysis procedure of this study is described in Figure . First, using 30 normal samples and 65 IBD samples from 5 datasets measured on the Affymetrix platform (see Table ), we identified 11 558 060 gene pairs with identical REO patterns in at least 85% of both the normal and IBD samples as stable gene pairs of noncancer samples. Similarly, using 564 CRC samples and 74 CRC adjacent normal samples from 10 datasets measured on the Affymetrix platform, we identified 106 958 978 gene pairs with identical REO patterns in at least 85% of both the CRC and CRC adjacent normal samples as stable gene pairs of cancer samples. Comparing these 2 lists of gene pairs identified from the Affymetrix data, we found 218 reversal gene pairs between non‐CRC tissues and CRC tissues (including adjacent normal tissues). Among these 218 gene pairs, we further selected 7 gene pairs that had an identical REO pattern in at least 85% of the 73 noncancer samples and the reversed REO pattern in at least 85% of the 54 cancer samples in the combined RNA‐seq data from the E‐GEOD‐50760 and E‐GEOD‐72819 datasets.
Analysis procedure for identifying the colorectal cancer (CRC) diagnosis signature. IBD, inflammatory bowel disease; RNA‐seq, RNA sequencing
Then, the 7 gene pairs were sorted in descending order of their reversal degrees (see Materials and Methods 2.3) between CRC (CRC and CRC adjacent normal) and non‐CRC (normal and IBD) samples in the combined training data, as shown in Table . We then used the top‐ranked k pairs to classify samples according to the majority vote rule. For k ranging from 1 to 7, the largest geometric mean of sensitivity and specificity, 97.08%, was obtained at k = 7 (Figure ). Thus, all 7 gene pairs, listed in Table , were selected as the signature for discriminating CRC samples from noncancer samples. We also examined the expression patterns of the 7 gene pairs (13 genes) in the training datasets measured on the Affymetrix platform. As shown in Figures S1 and S2, the REO of each gene pair is stable in both types of samples, but the patterns are opposite.
Performance of k gene pairs of relative expression ordering‐based signatures in the training set of biopsy and surgically resected colorectal cancer and noncancer samples
| Signature | Gene i | Gene j |
| Pair 1 | AREG | TRIM40 |
| Pair 2 | SCARNA2 | CHRNE |
| Pair 3 | SCARNA2 | CASKIN1 |
| Pair 4 | ARHGAP10 | KIAA0125 |
| Pair 5 | KCNH2 | ZNF671 |
| Pair 6 | CLCN5 | C19orf44 |
| Pair 7 | SSBP1 | DHRS7 |
In CRC tissue samples, gene i is expressed at a higher level than gene j, whereas in non‐CRC tissue samples the ordering is reversed.
We then validated the performance of the 7 gene pairs in multiple public datasets of biopsy and surgically resected samples. Across 977 cancer samples (CRC and CRC adjacent normal) and 163 noncancer samples from these public databases, the geometric mean of sensitivity and specificity was 96.80%, and the AUC was 0.9589 (95% confidence interval, 0.9521‐0.9657; Figure ).
Area under the receiver operating characteristic curve (AUC) of the validation data from public databases of biopsy and surgically resected colorectal cancer and noncancer samples
Notably, all colorectal normal and IBD tissue samples from non‐CRC individuals, as well as the 42 CRC tissue samples from GSE45404, were obtained by endoscopic biopsy. For these biopsy samples measured on the Affymetrix platform, 90.9% of the 22 normal samples from healthy individuals and 95.0% of the 141 IBD samples from non‐CRC patients were correctly identified as non‐CRC, and 97.6% of the 42 cancer samples were correctly identified as CRC. The detailed results for each dataset are shown in Table . These results suggest that our signature is suitable for the early diagnosis of CRC from biopsy specimens.
Performance of the gene signature in the validation datasets for colorectal biopsy samples
| Dataset | Normal | IBD | Adjacent normal | Cancer | Specificity | Sensitivity |
| GSE36807 | 7 | 28 | – | – | 85.71% | – |
| GSE12251 | – | 23 | – | – | 100.00% | – |
| GSE9452 | – | 8 | – | – | 100.00% | – |
| GSE47908 | 15 | 39 | – | – | 92.59% | – |
| GSE16879 | – | 43 | – | – | 100.00% | – |
| GSE45404 | – | – | – | 42 | – | 97.62% |
–, No information in the corresponding category; IBD, inflammatory bowel disease.
For surgically resected samples measured on the Affymetrix platform, all 298 CRC samples and 42 CRC adjacent normal samples were correctly identified as CRC. For the data measured on the RNA‐seq platform, 99.3% of the 556 CRC samples and 92.3% of the 39 CRC adjacent normal samples were correctly identified as CRC. The detailed results for each dataset are shown in Table . These results suggest that the 7 gene pairs can identify most adjacent nontumor colorectal tissues from patients with CRC as CRC, making the signature suitable for inaccurately sampled specimens.
Performance of the gene signature in the validation datasets for surgically resected colorectal samples
| Dataset | Normal | IBD | Adjacent normal | Cancer | Specificity | Sensitivity |
| GSE21510 | – | – | 25 | 104 | – | 100.00% |
| GSE22598 | – | – | 17 | 17 | – | 100.00% |
| GSE27854 | – | – | – | 115 | – | 100.00% |
| GSE35896 | – | – | – | 62 | – | 100.00% |
| TCGA | – | – | 39 | 556 | – | 98.82% |
–, No information in the corresponding category; IBD, inflammatory bowel disease; TCGA, The Cancer Genome Atlas.
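As an arithmetic check of the 96.80% geometric mean reported for the 977 cancer and 163 noncancer public validation samples, the Python snippet below uses counts back-calculated from the per-group percentages reported in this section. They are reconstructions for illustration, not values taken from a supplementary table.

```python
from math import sqrt

# Correctly called positives, back-calculated from the reported percentages:
# GSE45404 biopsy (97.62% of 42), Affymetrix surgical (all 298 + 42),
# TCGA CRC (99.3% of 556), TCGA adjacent normal (92.3% of 39).
true_positives = 41 + 340 + 552 + 36
positives = 42 + 340 + 556 + 39           # 977

# Correctly called negatives: normal (90.9% of 22) and IBD (95.0% of 141).
true_negatives = 20 + 134
negatives = 22 + 141                      # 163

sensitivity = true_positives / positives
specificity = true_negatives / negatives
geometric_mean = sqrt(sensitivity * specificity)
print(f"{sensitivity:.4f} {specificity:.4f} {geometric_mean:.4f}")  # 0.9918 0.9448 0.9680
```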
Among the 556 CRC samples from TCGA, 536 had staging information: 99.0% of 96 patients with stage I, 99.0% of 209 with stage II, 100.0% of 156 with stage III, and 98.7% of 75 with stage IV disease were correctly identified as CRC. Clinical stage also did not affect the validation results in the GEO data: all 104 samples from dataset GSE21510, including 13 patients with stage I, 37 with stage II, 34 with stage III, and 20 with stage IV disease, were correctly identified as CRC. Moreover, all 62 CRC samples from dataset GSE35896 had mutation status information for KRAS, BRAF, APC, TP53, PIK3CA, and PTEN, and all of them were correctly identified as CRC regardless of the mutation status of any gene. In GSE35896, 61 of the 62 CRC samples had microsatellite status information, and all 56 microsatellite‐stable and 5 microsatellite‐unstable samples were correctly identified as CRC. These results further indicate that our signature is robust against clinicopathological variation.
To further validate the signature, we additionally measured the gene expression profiles of 13 CRC surgical resection specimens from 5 patients with CRC on the RNA‐seq platform; for each patient, 3 specimens were sampled from 3 tumor locations with different proportions of tumor epithelial cells (see Table ), and 2 specimens were excluded from gene expression measurement because of poor RNA quality. All 13 CRC specimens were correctly identified as CRC by our signature, even when the proportion of tumor epithelial cells was as low as 40%, further supporting that the REO‐based signature is robust against varied proportions of tumor epithelial cells across tumor locations in the same patient. Moreover, all 33 CRC biopsy specimens measured on the Affymetrix platform in our laboratory were correctly identified as CRC by our signature.
In summary, these results show that the signature can accurately discriminate CRC from non‐CRC tissues in both surgical resection and biopsy samples measured on different platforms. In particular, the signature is robust against varied proportions of tumor epithelial cells in specimens sampled from different tumor locations of the same patient.
In this study, we identified a robust qualitative signature of 7 gene pairs (13 genes) for the early diagnosis of CRC, which can discriminate CRC and CRC adjacent tissues from IBD and normal tissues of non‐CRC individuals. This means that even when specimens are sampled inaccurately, the signature can still aid the early diagnosis of CRC. The REO‐based qualitative transcriptional signature is robust against experimental batch effects and invariant to monotone data transformations, and it can be applied directly to individual samples. For a total of 1023 cancer samples and 163 noncancer samples from the validation datasets, the sensitivity, specificity, and positive predictive value of our signature were 99.22%, 94.48%, and 99.12%, respectively, indicating its robustness. As shown in Table , among the 5 validation datasets with noncancer samples, our signature had 100% specificity in 3 datasets (GSE12251, GSE9452, and GSE16879) and 92.59% and 85.71% specificity in the other 2 datasets, GSE47908 and GSE36807, respectively. Dataset GSE47908 contains 54 noncancer samples: 15 normal and 39 IBD samples (19 pancolitis and 20 left‐sided colitis). All 15 normal samples and 20 left‐sided colitis samples were correctly identified as noncancer, whereas 4 of the 19 pancolitis samples were identified as cancer. Because patients with pancolitis have a higher cancer risk than those with left‐sided colitis, we speculate that these 4 pancolitis samples might have some characteristics of cancer. Similarly, in dataset GSE36807, with 35 noncancer samples (7 normal and 28 IBD), 2 normal and 3 IBD (1 Crohn's disease and 2 ulcerative colitis) samples were identified as cancer.
The healthy individuals who provided the normal samples, including the 2 whose samples were identified as cancer, had been referred for colorectal cancer screening; we speculate that some of them might also have some characteristics of cancer.
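The overall sensitivity, specificity, and positive predictive value quoted above can be reproduced from counts back-calculated from the reported percentages: 8 of the 1023 positives missed (1015/1023 = 99.22%) and 9 of the 163 noncancer samples called CRC (154/163 = 94.48%). These counts are reconstructions for illustration.

```python
positives = 977 + 33 + 13   # public validation positives + Our_Data1 + Our_Data2
negatives = 163
false_negatives = 8         # back-calculated from the 99.22% sensitivity
false_positives = 9         # back-calculated from the 94.48% specificity

tp = positives - false_negatives
tn = negatives - false_positives
sensitivity = tp / positives
specificity = tn / negatives
ppv = tp / (tp + false_positives)   # positive predictive value
print(f"{sensitivity:.2%} {specificity:.2%} {ppv:.2%}")  # 99.22% 94.48% 99.12%
```

Because PPV depends on the proportion of positives in the evaluated set (1023 of 1186 here), it would be expected to be lower in a screening population with a lower CRC prevalence.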
In many practical situations, it is difficult to obtain a sufficient quantity of RNA from tissue biopsies for gene expression profiling or other molecular measurements. Fortunately, our recent study showed that REO‐based signatures can be robustly applied to minimum specimens containing as few as approximately 15 cancer cells. Therefore, the 7 gene pairs are likely applicable to biopsy samples with minimum sampling amounts. Moreover, the REO‐based signature was robust against varied proportions of tumor epithelial cells across tumor locations in the same patient, a common factor that can cause quantitative transcriptional signatures to fail in clinical practice: in this study, all 13 specimens from 5 patients, sampled at different locations with different proportions of tumor epithelial cells (see Table ), were correctly identified as CRC.
We also evaluated other REO‐based approaches, TSP and k‐TSP, using the same training and validation datasets, as shown in Table . Using the tspair R package (version 3.3.3) and the ktspair R package (version 3.3.3), we trained the TSP and k‐TSP classifiers on the combined training data measured on the Affymetrix and RNA‐seq platforms. In the validation data, the k‐TSP classifier performed better than the TSP classifier but worse than our signature, as shown in Tables S1 and S2. For example, of the 33 CRC samples measured in our laboratory, our REO signature correctly identified all 33, whereas the k‐TSP classifier correctly identified only 30.3% (10 of 33). One possible reason is that, with the tspair and ktspair packages, the imbalance between the numbers of samples from the Affymetrix and RNA‐seq platforms biases the classifier toward the platform with more samples. In contrast, in training our REO signature, only gene pairs consistently detected in the data from both platforms were used for the final signature selection (7 gene pairs in this study). Our method is therefore simple and intuitive, and in these data it identified a more robust signature.
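For comparison with the reversal-pair criterion, the Python sketch below computes the classic TSP score, Δ_ij = |P(X_i < X_j | class 1) − P(X_i < X_j | class 2)|, and a greedy selection of disjoint top pairs in the spirit of k-TSP. It is not the tspair or ktspair implementation (those packages add tie-breaking rules, and k may also be chosen by cross-validation), and the data are invented. Unlike the approach in this study, the TSP score does not require either REO to be stable in 85% of samples, and it is computed on whatever pooled training data it is given.

```python
from itertools import combinations


def tsp_scores(group1, group2):
    """Delta_ij = |P(X_i < X_j | group1) - P(X_i < X_j | group2)| for all pairs i < j."""
    def frac_less(samples, i, j):
        return sum(1 for s in samples if s[i] < s[j]) / len(samples)

    n_genes = len(group1[0])
    return {(i, j): abs(frac_less(group1, i, j) - frac_less(group2, i, j))
            for i, j in combinations(range(n_genes), 2)}


def top_disjoint_pairs(scores, k):
    """Greedily take the k highest-scoring pairs that share no genes.
    sorted() is stable, so equal scores keep their original pair order."""
    chosen, used = [], set()
    for pair in sorted(scores, key=scores.get, reverse=True):
        if used.isdisjoint(pair):
            chosen.append(pair)
            used.update(pair)
        if len(chosen) == k:
            break
    return chosen


# Toy data with 4 genes: pairs (0, 1) and (2, 3) flip between the groups.
noncancer = [[1, 2, 3, 4], [1, 3, 2, 5]]
cancer = [[2, 1, 4, 3], [3, 1, 5, 4]]
scores = tsp_scores(noncancer, cancer)
print(top_disjoint_pairs(scores, 2))  # [(0, 1), (2, 3)]
```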
Some genes in our signature, including AREG, SSBP1, KCNH2, and TRIM40, are known CRC‐related genes that might play key roles in the development of CRC. For example, AREG can induce the upregulation of EGFR, a key mediator of intestinal neoplastic transformation, and high AREG expression is a favorable prognostic biomarker for metastatic CRC. SSBP1 is highly expressed in CRC and is closely associated with poor outcomes in patients with CRC. In cisplatin‐resistant CRC cells, KCNH2 inhibitors acted synergistically with cisplatin to trigger apoptosis and inhibit proliferation. Additionally, TRIM40 might provide therapeutic benefits, not only by inhibiting the growth of gastrointestinal cancers but also by preventing chronic IBDs. In addition, ARHGAP10, DHRS7, and ZNF671 have been reported to be closely correlated with other types of cancer, such as lung and prostate cancer. These reports suggest that the signature genes might play important roles in the carcinogenesis of CRC, and these functions need to be studied further.
In summary, our signature of 7 gene pairs could be robustly applied to aid the early diagnosis of CRC in multiple datasets of both biopsy and surgically resected samples, and it is also suitable for minimum biopsy specimens and inaccurately sampled specimens. The clinical value of the 7 gene pairs for the early diagnosis of CRC warrants further validation. Moreover, as the cost of high‐throughput sequencing decreases markedly, a limited amount of precious clinical tissue could be used to measure all genes, or the gene sets of different biomarkers, for the diagnosis, histological classification, prognosis, and drug resistance evaluation of CRC (“a sequencing for all”).
This work was supported by the National Natural Science Foundation of China (grant nos. 81872396, 61801118, and 81572935), the Startup Fund for Scientific Research, Fujian Medical University (grant nos. 2017XQ2002, 2017XQ2007, and 2017XQ2006), the Doctoral Research Foundation of the First Affiliated Hospital of Gannan Medical University, the Joint Scientific and Technology Innovation Fund of Fujian Province (grant no. 2016Y9044), and the Fujian Provincial Finance Department Special Fund (No. (2015)1297).
The authors have no conflict of interest.
© 2019. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Currently, using biopsy specimens for the early diagnosis of colorectal cancer (CRC) is not entirely reliable because of insufficient sampling amounts and inaccurate sampling locations. Thus, it is necessary to develop a signature that can accurately identify patients with CRC under these clinical scenarios. Based on the relative expression orderings of genes within individual samples, we developed a qualitative transcriptional signature to discriminate CRC tissues, including CRC adjacent normal tissues, from the tissues of non‐CRC individuals. A signature consisting of 7 gene pairs was identified in the training data and validated in both biopsy and surgical resection specimens from multiple microarray and RNA sequencing datasets from different sources. For biopsy specimens, 97.6% of 42 CRC tissues and 94.5% of 163 non‐CRC (normal or inflammatory bowel disease) tissues were correctly classified. For surgically resected specimens, 99.5% of 854 CRC tissues and 96.3% of 81 CRC adjacent normal tissues were correctly identified as CRC. Notably, we additionally measured 33 CRC biopsy specimens on the Affymetrix platform and, on the RNA sequencing platform, 13 CRC surgical resection specimens with proportions of tumor epithelial cells ranging from 40% to 100%; all these samples were correctly identified as CRC. The signature can be used for the early diagnosis of CRC, is also suitable for minimum biopsy specimens and inaccurately sampled specimens, and thus has potential value for clinical application.
; Chen, Kui 3 ; Guo, You 1
; Guan, Guoxian 4 ; Guo, Zheng 1
1 Department of Bioinformatics, School of Basic Medical Sciences, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fuzhou, China
2 Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
3 Department of General Surgery, Affiliated Fuzhou First Hospital of Fujian Medical University, Fuzhou, China
4 Department of Colorectal Surgery, The Affiliated Union Hospital of Fujian Medical University, Fuzhou, China





