1. Introduction
Metabolomics [1], or metabonomics [2, 3], is an emerging -omics approach that uses nuclear magnetic resonance (NMR) spectroscopy or gas chromatography/liquid chromatography-mass spectrometry (GC-MS or LC-MS) technologies. It is a field of science that measures metabolite variations in a biological compartment in order to study physiological processes in response to xenobiotic interventions, and it is complementary to organ-specific biochemical and histological findings. Through the analysis of one or several kinds of biological samples, including serum, urine, saliva, and tissue, the global and dynamic alterations in metabolism can be deciphered [4]. Therefore, metabolomics has been increasingly used in applications such as identifying metabolite markers for clinical diagnosis and prognosis [5], monitoring chemical-induced toxicity [6], exploring the potential mechanisms of diverse diseases [7], and assessing the therapeutic effects of treatment modalities [8, 9]. Univariate and/or multivariate statistical methods are routinely used in metabolomics studies to classify samples with metabolic phenotypic variations and to identify potential biomarkers while minimizing technical variation.
To date, the most widely used classification methods in metabolomic data processing include principal component analysis (PCA), projection to latent structures (PLS) analysis, support vector machine (SVM), linear discriminant analysis (LDA), and univariate statistical analyses such as Student's t-test and analysis of variance (ANOVA) [10, 11]. We recently applied some of these methods in combination to identify metabolite-based biomarkers in hepatocellular carcinoma [5], gastric cardia cancer [12], knee osteoarthritis [13], oral cancer [14], and schizophrenia [7]. Nevertheless, more effective and robust bioinformatics tools are critically needed for metabolomic data analysis, especially when dealing with clinical samples that show large individual variability due to the diverse demographic and genetic backgrounds of patients and to various pathological conditions or treatments.
Random forest (RF), a machine learning method, is reported to be an excellent classifier with the following advantages: simple theory, fast speed, stability and insensitivity to noise, little or no overfitting, and an automatic compensation mechanism for imbalanced group sizes [15]. RF has been widely used, with good performance, in microarray [16–18] and single nucleotide polymorphism (SNP) [19] data analysis. However, it has not received enough attention in clinical metabolomic data analysis, and no comprehensive evaluation of its performance in this field has been reported.
In this research, RF was used in the analysis of a GC-MS derived clinical metabolomic dataset. Its classification and biomarker selection performances were compared with PLS, LDA, and SVM comprehensively. The score plot based on cross validation was used for classification accuracy evaluation. The cross-validation and ROC (receiver operating characteristic) curve were carried out to test their prediction ability and stability. The
2. Methods
2.1. Metabolomic Data Set
Colorectal cancer (CRC) is one of the most common types of cancer and a leading cause of cancer death in the world [20]. Urinary samples of 67 CRC patients (67 preoperative samples and 63 matched postoperative ones) and 65 healthy volunteers were collected from the Cancer Hospital affiliated to Fudan University (Shanghai, China). Patients enrolled in this study were not on any medication before preoperative sample collection. The postoperative samples were collected on the 7th day after surgery. The sample collection protocol was approved by the Cancer Hospital Institutional Review Board, and written consent was obtained from all participants prior to the study. The healthy volunteers were selected by a routine physical examination, and any subjects with inflammatory conditions or gastrointestinal tract disorders were excluded. Other background information such as diet and alcohol consumption was considered during sample selection to minimize diet-induced metabolic variations. All the samples were collected in the morning before breakfast, immediately centrifuged, and stored at −80°C until analysis. Clinical characteristics of all the samples in this study are provided in Table 1. All the samples were chemically derivatized and subsequently analyzed by GC-MS following our previously published procedures [21].
Table 1
Sample information.
Data set | CRC | | |
---|---|---|---|
Sample type | Urine | | |
Group | Normal | CRC (preoperation) | Postoperation |
Number | 65 | 67 | 63 |
Age (mean (minimum, maximum)) | 55 (38, 74) | 59 (40, 76) | 60 (40, 77) |
Gender (male : female) | 23 : 40 | 35 : 28 | 36 : 24 |
Dimension (sample × variable) | Case A (Normal versus CRC): 132 × 187; Case B (Pre versus Post): 130 × 187 | | |
The acquired MS data were pretreated and processed according to our previously published protocols [5, 7]. A total of 187 variables (peak areas, denoting metabolite concentrations), including 35 identified metabolites, were obtained from the spectral data analysis. Normalization (to the total intensity, to compensate for the overall variability during sample extraction, injection, and detection and for the disparity in urine volume), mean centering, and unit variance scaling of the data sets were performed prior to statistical analysis. The final data set contains 187 variables and 195 samples. Two cases were analyzed: (a) normal subjects versus preoperative CRC patients and (b) preoperative versus postoperative patients.
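The three pretreatment steps named above (total-intensity normalization, mean centering, and unit variance scaling) can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: the function names, the toy `peaks` matrix, and the use of the sample standard deviation (n − 1 denominator) are assumptions made for illustration only.

```python
import math

def normalize_to_total(rows):
    """Divide each sample (row) by its total intensity so that rows sum to 1."""
    return [[v / sum(row) for v in row] for row in rows]

def autoscale(rows):
    """Mean-center each variable (column) and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    # Constant columns (sd == 0) are mapped to 0 to avoid division by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]

# Hypothetical peak-area matrix: 3 samples x 3 variables.
peaks = [[10.0, 30.0, 60.0],
         [5.0, 5.0, 10.0],
         [8.0, 2.0, 10.0]]
scaled = autoscale(normalize_to_total(peaks))
```

After these steps every variable has zero mean and unit variance across samples, so that variables with large absolute peak areas do not dominate the classifiers.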
2.2. Random Forest
Random forest (RF), developed by Breiman [22], is a combination of tree-structured predictors (decision trees). Each tree is constructed via a tree classification algorithm on a bootstrap sample (random sampling with replacement) of the data and casts a unit vote for the most popular class. The simplest random forest with random features is formed by randomly selecting, at each node, a small group of input variables to split on. The size of the group is fixed throughout the process of growing the forest. Each tree is grown using the CART (classification and regression tree) methodology without pruning. In this study, the forest contains 200 trees, the number of input variables tried at each node is the square root of the total number of variables, and the minimum size of the terminal nodes is 2. The “score” of RF is the scaled sum of votes derived from the trained trees for out-of-bag samples.
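As a rough illustration of the two sources of randomness described above, the following Python sketch draws one bootstrap sample with its out-of-bag set and one random group of candidate variables for a node. It is not a full random forest and not the authors' implementation; the function names, the random seed, and the rounding of the square root are assumptions.

```python
import math
import random

N_SAMPLES, N_VARIABLES, N_TREES = 132, 187, 200  # Case A dimensions, 200 trees
# Variables tried per node: square root of the total, rounded (rounding is assumed).
MTRY = round(math.sqrt(N_VARIABLES))

def bootstrap_split(n, rng):
    """Draw n indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def candidate_variables(n_variables, mtry, rng):
    """Pick the random group of variables considered for splitting one node."""
    return rng.sample(range(n_variables), mtry)

rng = random.Random(0)
in_bag, oob = bootstrap_split(N_SAMPLES, rng)
node_vars = candidate_variables(N_VARIABLES, MTRY, rng)
```

On average about 37% of the samples are out-of-bag for any given tree, which is what allows each sample to be scored by trees that never saw it during training.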
RF includes two methods for measuring the importance of a variable, that is, how much it contributes to predictive accuracy. The default method, referred to in this study as the Gini score, was used. For any variable, the measure is the increase in prediction error when the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, averaged over the entire ensemble, and divided by the standard deviation over the entire ensemble. Therefore, the larger the Gini score (which ranges from 1 to 100), the more important the variable.
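The permutation step described above can be sketched in Python for a single predictor. This is a simplified illustration under stated assumptions: `predict` is a hypothetical stand-in for one trained tree, the data are synthetic, and the averaging over trees and division by the ensemble standard deviation are omitted.

```python
import random

def error_rate(predict, X, y):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(predict(row) != label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, j, rng):
    """Increase in error after shuffling variable j across the given samples."""
    baseline = error_rate(predict, X, y)
    column = [row[j] for row in X]
    rng.shuffle(column)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return error_rate(predict, X_perm, y) - baseline

# Hypothetical stand-in for one trained tree: it only looks at variable 0.
def predict(row):
    return int(row[0] > 0)

# Synthetic out-of-bag samples: variable 0 separates the classes, variable 1 is noise.
X = [[-1.0 if i < 20 else 1.0, (i % 7) / 7] for i in range(40)]
y = [0] * 20 + [1] * 20

rng = random.Random(0)
importance_informative = permutation_importance(predict, X, y, 0, rng)
importance_noise = permutation_importance(predict, X, y, 1, rng)
```

Shuffling the variable the predictor relies on raises the error, whereas shuffling a variable it ignores leaves the error unchanged, which is why a larger increase indicates a more important variable.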
Please refer to the appendices for the introduction of other classifiers (PLS, SVM, and LDA).
2.3. Evaluation of Classification Performance
The classification performance of RF as well as PLS, LDA, and SVM can be evaluated and compared using several approaches: cross-validation,
2.3.1. Cross-Validation: Prediction Ability and Stability
Two types of cross-validation were used: k-fold and holdout.
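A minimal Python sketch of the two splitting schemes reported in Table 2 (k-fold and repeated holdout) is given below. The function names and the random seed are assumptions, and the splits here are not stratified by group, which may differ from the procedure actually used.

```python
import random

def k_fold_splits(n, k, rng):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def holdout_splits(n, fraction, repeats, rng):
    """Yield `repeats` random (train, test) splits with a fixed test fraction."""
    n_test = max(1, round(n * fraction))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

rng = random.Random(0)
folds = list(k_fold_splits(132, 7, rng))              # 7-fold CV on Case A
holdouts = list(holdout_splits(132, 0.10, 100, rng))  # 10% holdout, 100 times
```

The mean of the per-split error rates reflects prediction ability, and their standard deviation reflects stability.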
2.3.2. R2/Q2 Plot—Overfitting
Consider
In the equations,
The criteria for classifier validity are as follows. (1) All the
2.3.3. Receiver Operating Characteristic (ROC): Diagnosis Potential
ROC analysis is a classic method from signal detection theory and is now commonly used in clinical research [23]. The ROC of a classifier shows its performance as a tradeoff between specificity and sensitivity. Sensitivity is defined as the proportion of subjects with disease whose test is positive and is calculated as TruePositive/(TruePositive + FalseNegative). Specificity, on the other hand, is defined as the proportion of subjects without disease whose test is negative and is calculated as TrueNegative/(TrueNegative + FalsePositive). Typically, 1-specificity is plotted on the x-axis and sensitivity on the y-axis.
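The two formulas above, together with the area under the ROC curve, can be written as a short Python sketch. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative), which is one of several equivalent ways to obtain it.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = disease."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as P(positive score > negative score); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for three patients and three controls.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # threshold of 0.5 is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

With these toy values one patient falls below the threshold and one control above it, so both sensitivity and specificity are 2/3, while 8 of the 9 patient-control pairs are ordered correctly, giving an AUC of 8/9.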
2.3.4. Variable Number Dependence
Generally, too many irrelevant variables are liable to result in overfitting, whereas differences between groups cannot be extracted and depicted completely if crucial variables are not included [24]. Variable number dependence is therefore a necessary factor in classifier performance evaluation.
To avoid bias, it is advisable to rank and eliminate variables one by one. Initially, the whole dataset is taken when a classifier is computed. Then, a list of variables in descending order relative to classification importance is established and the variable in the end is eliminated for subsequent analysis. This process is repeated until only one variable is left for classifier building. The last few variables are of great potential to be biomarkers for separating the groups.
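The elimination loop described above can be sketched in Python as follows. The `rank_variables` and `evaluate` callables are hypothetical placeholders: in practice the first would refit the classifier on the remaining variables and return them in descending order of importance, and the second would return a cross-validated error rate. The fixed weights used here exist only to make the example runnable.

```python
def backward_elimination(variables, rank_variables, evaluate):
    """Drop the least important variable one at a time, recording the error."""
    remaining = list(variables)
    history = []
    while remaining:
        history.append((len(remaining), evaluate(remaining)))
        if len(remaining) == 1:
            break
        ranked = rank_variables(remaining)  # most important first
        remaining = ranked[:-1]             # eliminate the last-ranked variable
    return history, remaining

# Hypothetical stand-ins: fixed importance weights, and an error rate that
# grows as important variables are removed.
weights = {"v1": 0.50, "v2": 0.30, "v3": 0.15, "v4": 0.05}

def rank(vs):
    return sorted(vs, key=lambda v: weights[v], reverse=True)

def error(vs):
    return round(1.0 - sum(weights[v] for v in vs), 2)

history, survivor = backward_elimination(weights, rank, error)
```

Plotting the recorded error against the number of remaining variables gives the variable number dependence curve, and the variables that survive longest are the candidate biomarkers.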
2.4. Evaluation of Biomarker Selection Performance
Prediction ability and stability, overfitting, diagnosis potential, and variable number dependence are important aspects of a classifier. Variable ranking and biomarker selection are equally important in metabolomics studies.
For RF, variables are ranked by Gini score, a measurement of average accuracy of all trees containing a particular variable [22]. For PLS, the conventional VIP (variable importance in projection) value is used for variable ranking. For LDA, the coefficients of variables in the discriminant function indicate their importance. As to SVM, the importance of variables is evaluated by the SVM recursive feature elimination (SVM-RFE) algorithm [25].
As each classifier possesses its own algorithm for variable importance ranking with its own strength and weakness, the Pearson correlation coefficient of every two ranks was used to evaluate their consistency and the rank of t-test (by ascending order of variable
All the metabolites were identified and verified by public libraries such as HMDB, KEGG, and/or reference standards available in our laboratory.
All the classifiers and evaluation methods were carried out using Matlab toolbox (Version 2009a, Mathworks).
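The rank-consistency check described in this section, a Pearson correlation coefficient between the variable ranks produced by two methods, can be illustrated with the Python sketch below. It is not the Matlab code used in the study; the VIP values and p values are invented, and ordering the t-test rank by ascending p value is an assumption.

```python
import math

def ranks(scores, descending=True):
    """Rank positions (1 = most important) for a list of importance scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    r = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

vip = [2.1, 0.4, 1.3, 0.9]             # hypothetical PLS VIP values
p_values = [0.001, 0.40, 0.03, 0.02]   # hypothetical t-test p values

pls_rank = ranks(vip)                       # larger VIP = more important
t_rank = ranks(p_values, descending=False)  # smaller p = more important (assumed)
r = pearson(pls_rank, t_rank)
```

Here the two methods agree on the most and least important variables and swap the middle two, which gives a coefficient of 0.8.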
3. Results and Discussion
3.1. Classification Performance
RF as well as PLS, LDA, and SVM were applied to the dataset for the two comparative cases (Figures 1(a) and 1(b)). In Figure 1(a), red and blue dots represent the normal subjects and CRC patients, respectively.
[figures omitted; refer to PDF]
3.2. Prediction Ability, Stability, Overfitting, and Diagnostic Ability Evaluation
Classification accuracy is crucial for a classifier, but other behaviors such as prediction ability, stability, degree of overfitting, and diagnostic ability are equally important.
The holdout cross-validation results (33% holdout samples, 100 times) of RF (purple), PLS (blue), LDA (red), and SVM (green) on the two cases are presented as box plots (Figure 2). The
Table 2
Averaged error rates and their standard deviations of RF, PLS, LDA, and SVM on 2 cases by 7- and 10-fold cross-validation as well as 10% and 15% hold out cross-validation (100 times).
Case | Evaluation item | RF error rate | PLS error rate | SVM error rate | LDA error rate |
---|---|---|---|---|---|
 | | mean (S.D. omitted; refer to PDF) | mean (S.D. omitted; refer to PDF) | mean (S.D. omitted; refer to PDF) | mean (S.D. omitted; refer to PDF) |
(A) Normal versus CRC | 7-fold CV | 0.071 | 0.134 | 0.148 | 0.227 |
 | 10-fold CV | 0.069 | 0.094 | 0.126 | 0.188 |
 | 15% holdout CV | 0.065 | 0.132 | 0.117 | 0.189 |
 | 10% holdout CV | 0.065 | 0.121 | 0.113 | 0.181 |
(B) Pre versus post | 7-fold CV | 0.102 | 0.130 | 0.170 | 0.108 |
 | 10-fold CV | 0.096 | 0.169 | 0.163 | 0.096 |
 | 15% holdout CV | 0.088 | 0.137 | 0.186 | 0.127 |
 | 10% holdout CV | 0.083 | 0.145 | 0.161 | 0.114 |
[figures omitted; refer to PDF]
Figures 3(a) and 3(b) display the correlation between the actual
[figures omitted; refer to PDF]
The ROC curve coupled with its area under the curve (AUC) is a common method used to estimate the diagnosis potential of a classifier in clinical applications. A larger AUC indicates higher prediction ability. The ROC curves and AUC values of all the classifiers in the two cases are plotted in Figure 4. RF outperforms the others once more with the greatest AUC values (
3.3. Variable Number Dependence
Figures 5(a) and 5(b) show the classification error rates (
[figures omitted; refer to PDF]
3.4. Putative Biomarker Selection
The preceding section evaluated whether, and how much, the performance of RF depends on the number of variables involved. This section evaluates its capability for selecting important variables (putative biomarkers). The Pearson correlation matrices of the ranks from every pair of classifiers (including the t-test) based on all variables (A-B) and identified metabolites (C-D) in the two cases are listed in Table 3. On the whole, RF, PLS, and the t-test are consistent with each other (high Pearson correlation coefficients) regardless of whether all variables (Figure 6(a)) or identified metabolites (Figure 6(b)) are involved.
Table 3
Pearson correlation coefficient matrices of rank lists by t-test, PLS, RF, SVM, and LDA in 2 cases based on all variables (A-B) and identified metabolites (C-D).
Method | tRank^a | PLSRank^b | RFRank^c | SVMRank^d | LDARank^e |
---|---|---|---|---|---|
Pearson correlation coefficient matrix based on all variables | | | | | |
(A) Normal versus CRC | | | | | |
tRank | 1.000 | 0.794^f | 0.575 | 0.327 | 0.342 |
PLSRank | 0.794 | 1.000 | 0.574 | 0.328 | 0.342 |
RFRank | 0.575 | 0.574 | 1.000 | 0.210 | 0.256 |
SVMRank | 0.327 | 0.328 | 0.210 | 1.000 | 0.167 |
LDARank | 0.342 | 0.342 | 0.256 | 0.167 | 1.000 |
(B) Pre versus post | | | | | |
tRank | 1.000 | 0.232 | 0.217 | 0.021 | 0.032 |
PLSRank | 0.232 | 1.000 | 0.652 | 0.066 | 0.066 |
RFRank | 0.217 | 0.652 | 1.000 | 0.086 | 0.057 |
SVMRank | 0.021 | 0.066 | 0.086 | 1.000 | 0.007 |
LDARank | 0.032 | 0.066 | 0.057 | 0.007 | 1.000 |
Pearson correlation coefficient matrix based on identified metabolites | | | | | |
(C) Normal versus CRC | | | | | |
tRank | 1.000 | 0.753 | 0.754 | 0.364 | 0.340 |
PLSRank | 0.753 | 1.000 | 0.756 | 0.267 | 0.340 |
RFRank | 0.754 | 0.756 | 1.000 | 0.495 | 0.308 |
SVMRank | 0.364 | 0.267 | 0.495 | 1.000 | 0.190 |
LDARank | 0.340 | 0.340 | 0.308 | 0.190 | 1.000 |
(D) Pre versus post | | | | | |
tRank | 1.000 | 0.272 | 0.258 | 0.194 | 0.187 |
PLSRank | 0.272 | 1.000 | 0.733 | 0.048 | 0.044 |
RFRank | 0.258 | 0.733 | 1.000 | 0.034 | 0.041 |
SVMRank | 0.194 | 0.048 | 0.034 | 1.000 | 0.187 |
LDARank | 0.187 | 0.044 | 0.041 | 0.187 | 1.000 |
^b Variable rank by the VIP value of PLS.
^c Variable rank by the Gini value of RF.
^d Variable rank by SVM-RFE.
^e Variable rank by the LDA coefficient.
^f Pearson correlation coefficient of PLS and
[figures omitted; refer to PDF]
Interestingly, in Table 3, the highest and second highest correlation coefficients are 0.794 for PLS and the t-test (matrix A) and 0.756 for RF and PLS (matrix C), both in the Normal versus CRC case, indicating the consistency and mutual complementarity of the classifiers. All the important metabolites selected by both t-test (
[figures omitted; refer to PDF]
4. Conclusion
In this study, RF was successfully applied to metabolomic data analysis for clinical phenotype discrimination and biomarker selection. Its performance was evaluated and compared with that of three other classifiers, PLS, SVM, and LDA, by two types of cross-validations,
The combined use of multiple methods, for example RF, t-test, and PLS, may provide more comprehensive information for a “global” understanding of metabolomics or other “omics” data.
Acknowledgment
This work was financially supported by the National Basic Research Program of China (2007CB914700), National Natural Science Foundation of China Program (81170760), and the Natural Science Foundation of Shanghai, China (10ZR1414800).
Glossary
Abbreviations
GC-MS: Gas chromatography mass spectrometry
RF: Random forest
LDA: Linear discriminant analysis
SVM: Support vector machine
PCA: Principal component analysis
PLS: Projection to latent structures
ROC: Receiver operating characteristic
CRC: Colorectal cancer
NMR: Nuclear magnetic resonance
MS: Mass spectrometry.
Appendices
A. Projection to Latent Structures (PLS)
The basic object of PLS is to find the linear (or polynomial) relationship between the superior variable
(a)
To well approximate the
(b)
To maximize the correlation between
The PLS model accomplishing these objectives can be expressed as
The model will iteratively compute one component at a time, that is: one vector derived from
The formula to calculate
The
B. Support Vector Machine (SVM)
The key to the success of SVM is the kernel function, which maps the data from the original space into a high-dimensional (possibly infinite-dimensional) feature space. By constructing a linear boundary in the feature space, the SVM produces nonlinear boundaries in the original space. Given a training sample, a maximum-margin hyperplane splits it in such a way that the distance from the closest cases (support vectors) to the hyperplane (
SVM Recursive Feature elimination (SVM-RFE) is a wrapper approach which uses the norm of the weights
Linear kernel was used for SVM classification and feature selection. This kernel was chosen to reduce the computational complexity and eliminate the need for retuning kernel parameters for every new subset of variables. Another important advantage of choosing a linear kernel is that the norm of the weight
C. LDA
LDA adopts a linear combination of variables (
Unlike PLS, which looks for linear combinations of variables to best explain both the data set
The “score” of
[1] O. Fiehn, "Metabolomics—the link between genotypes and phenotypes," Plant Molecular Biology, vol. 48 no. 1-2, pp. 155-171, DOI: 10.1023/A:1013713905833, 2002.
[2] J. K. Nicholson, J. Connelly, J. C. Lindon, E. Holmes, "Metabonomics: a platform for studying drug toxicity and gene function," Nature Reviews Drug Discovery, vol. 1 no. 2, pp. 153-161, DOI: 10.1038/nrd728, 2002.
[3] J. K. Nicholson, J. C. Lindon, E. Holmes, "‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data," Xenobiotica, vol. 29 no. 11, pp. 1181-1189, 1999.
[4] J. Schnabel, "Targeting tumour metabolism," Nature Reviews Drug Discovery, vol. 9, pp. 503-504, DOI: 10.1038/nrd3215, 2010.
[5] T. L. Chen, G. X. Xie, X. Y. Wang, "Serum and urine metabolite profiling reveals potential biomarkers of human hepatocellular carcinoma," Molecular & Cellular Proteomics, vol. 10, 2011.
[6] G. X. Xie, X. J. Zheng, X. Qi, "Metabonomic evaluation of melamine-induced acute renal toxicity in rats," Journal of Proteome Research, vol. 9 no. 1, pp. 125-133, DOI: 10.1021/pr900333h, 2010.
[7] J. Yang, T. Chen, L. Sun, "Potential metabolite markers of schizophrenia," Molecular Psychiatry, vol. 18 no. 1, pp. 67-78, 2013.
[8] Y. Q. Bao, T. Zhao, X. Y. Wang, "Metabonomic variations in the drug-treated type 2 diabetes mellitus patients and healthy volunteers," Journal of Proteome Research, vol. 8 no. 4, pp. 1623-1630, DOI: 10.1021/pr800643w, 2009.
[9] X. Wang, J. Lin, T. Chen, M. Zhou, M. Su, W. Jia, "Metabolic profiling reveals the protective effect of diammonium glycyrrhizinate on acute hepatic injury induced by carbon tetrachloride," Metabolomics, vol. 7 no. 2, pp. 226-236, DOI: 10.1007/s11306-010-0244-5, 2010.
[10] J. Trygg, E. Holmes, T. Lundstedt, "Chemometrics in metabonomics," Journal of Proteome Research, vol. 6 no. 2, pp. 469-479, DOI: 10.1021/pr060594q, 2007.
[11] H. W. Cho, S. B. Kim, M. K. Jeong, "Discovery of metabolite features for the modelling and analysis of high-resolution NMR spectra," International Journal of Data Mining and Bioinformatics, vol. 2 no. 2, pp. 176-192, DOI: 10.1504/IJDMB.2008.019097, 2008.
[12] Z. Cai, J. Zhao, X. Wang, "A combined proteomics and metabolomics profiling of gastric cardia cancer reveals characteristic dysregulations in glucose metabolism," Molecular & Cellular Proteomics, vol. 9, pp. 2617-2628, DOI: 10.1074/mcp.M110.000661, 2010.
[13] X. Li, S. Yang, Y. Qiu, "Urinary metabolomics as a potentially novel diagnostic and stratification tool for knee osteoarthritis," Metabolomics, vol. 6 no. 1, pp. 109-118, DOI: 10.1007/s11306-009-0184-0, 2010.
[14] J. Wei, G. X. Xie, Z. T. Zhou, "Salivary metabolite signatures of oral cancer and leukoplakia," International Journal of Cancer, vol. 129 no. 9, pp. 2207-2217, 2011.
[15] D. Amaratunga, J. Cabrera, Y. S. Lee, "Enriched random forests," Bioinformatics, vol. 24 no. 18, pp. 2010-2014, DOI: 10.1093/bioinformatics/btn356, 2008.
[16] A. Statnikov, L. Wang, C. F. Aliferis, "A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification," BMC Bioinformatics, vol. 9, pp. 319-328, DOI: 10.1186/1471-2105-9-319, 2008.
[17] X. Y. Wu, Z. Y. Wu, K. Li, "Identification of differential gene expression for microarray data using recursive random forest," Chinese Medical Journal, vol. 121 no. 24, pp. 2492-2496, 2008.
[18] A. Acharjee, B. Kloosterman, R. C. H. D. Vos, "Data integration and network reconstruction with ~omics data using Random Forest regression in potato," Analytica Chimica Acta, vol. 705 no. 1-2, pp. 56-63, DOI: 10.1016/j.aca.2011.03.050, 2011.
[19] R. Jiang, W. W. Tang, X. B. Wu, W. H. Fu, "A random forest approach to the detection of epistatic interactions in case-control studies," BMC Bioinformatics, vol. 10, pp. 65-76, DOI: 10.1186/1471-2105-10-S1-S65, 2009.
[20] A. Jemal, R. Siegel, J. Xu, E. Ward, "Cancer statistics, 2010," CA Cancer Journal for Clinicians, vol. 60 no. 5, pp. 277-300, DOI: 10.3322/caac.20073, 2010.
[21] Y. Qiu, M. Su, Y. Liu, "Application of ethyl chloroformate derivatization for gas chromatography-mass spectrometry based metabonomic profiling," Analytica Chimica Acta, vol. 583 no. 2, pp. 277-283, DOI: 10.1016/j.aca.2006.10.025, 2007.
[22] L. Breiman, "Random forests," Machine Learning, vol. 45 no. 1, DOI: 10.1023/A:1010933404324, 2001.
[23] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30 no. 7, pp. 1145-1159, DOI: 10.1016/S0031-3203(96)00142-2, 1997.
[24] K. Duan, S. S. Keerthi, A. N. Poo, "Evaluation of simple performance measures for tuning SVM hyperparameters," Neurocomputing, vol. 51, pp. 41-59, DOI: 10.1016/S0925-2312(02)00601-X, 2003.
[25] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46 no. 1–3, pp. 389-422, DOI: 10.1023/A:1012487302797, 2002.
[26] Y. P. Qiu, G. X. Cai, M. M. Su, "Serum metabolite profiling of human colorectal cancer using GC−TOFMS and UPLC−QTOFMS," Journal of Proteome Research, vol. 8 no. 10, pp. 4844-4850, DOI: 10.1021/pr9004162, 2009.
Copyright © 2013 Tianlu Chen et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
Metabolomic data analysis becomes increasingly challenging when dealing with clinical samples with diverse demographic and genetic backgrounds and various pathological conditions or treatments. Although many classification tools, such as projection to latent structures (PLS), support vector machine (SVM), linear discriminant analysis (LDA), and random forest (RF), have been successfully used in metabolomics, their performance including strengths and limitations in clinical data analysis has not been clear to researchers due to the lack of systematic evaluation of these tools. In this paper we comparatively evaluated the four classifiers, PLS, SVM, LDA, and RF, in the analysis of clinical metabolomic data derived from gas chromatography mass spectrometry platform of healthy subjects and patients diagnosed with colorectal cancer, where cross-validation,