INTRODUCTION
Breast cancer (BC) is the most common malignant tumor and the leading cause of cancer death in women worldwide, and its incidence continues to rise each year. According to GLOBOCAN 2020, there were an estimated 2.3 million new cases of female BC, and BC has surpassed lung cancer as the most commonly diagnosed cancer. Owing to changes in diet, lifestyle, and the natural environment, both the incidence and the mortality of BC have increased in China in recent years.
Most BC-related deaths are caused by metastasis: more than 90% are attributable to metastasis-related complications. Despite improvements in early detection and advances in treatment, metastatic BC remains incurable, because it is refractory to almost all current therapies and most treatments are palliative rather than curative. On the one hand, identifying patients at risk of metastasis could guide early detection and prevention through additional interventions. On the other hand, accurate prediction of metastasis is critical for precision medicine and individualized therapy, allowing low-risk patients to avoid toxic and costly treatments. However, inappropriate screening examinations and the overuse of diagnostic tests have increased healthcare costs. Conventional predictors of BC metastasis, such as histological grade and lymph node status, have limited ability to predict metastasis accurately, so conventional prognostic criteria cannot reliably estimate metastasis risk in patients with BC. Consequently, many patients receive cytotoxic chemotherapy unnecessarily. Accurate prediction of BC metastasis risk could therefore reduce the public health and social burden of the disease. With the development of genomic technology, researchers have integrated multiple genetic and molecular markers into newer models for predicting the prognosis of BC patients. Multigene expression assays, such as the 21-gene recurrence score assay and the 70-gene expression profile, have been developed to identify patients at high risk of recurrence or metastasis who would therefore benefit from chemotherapy. However, these assays require cutting-edge technology and are expensive, and their utility has been established only in certain patient subsets; thus, they are not widely used.
Recently, several prediction models based on clinical and pathological factors have been developed. PREDICT and Adjuvant! Online are commonly used; they calculate individualized survival probabilities by combining clinical variables (tumor size, nodal involvement, histologic grade, hormone receptor status, and age) in a multivariable survival model such as Cox regression. Cox regression is widely used to identify predictors, but it relies on restrictive assumptions, such as proportional hazards and log-linear covariate effects, which may bias prognostic analyses of BC patients during long-term follow-up and hinder the identification of prognostic markers. A simple and accurate model for predicting BC metastasis, with high clinical applicability and generalizability, is therefore urgently needed.
Machine learning methods can build predictive models that evaluate many variables efficiently, overcoming this limitation of conventional models. Random survival forest (RSF), which extends random forest to survival analysis, is a non-parametric machine learning method that makes no assumptions about the data distribution and can analyze data with many more variables than samples. It can also effectively limit overfitting and handle collinearity. Furthermore, it places no special requirements on the data type or on the form of the association between outcomes and predictors, and it is not constrained by the log-linearity or proportional hazards assumptions. We used RSF, based on baseline clinical parameters including general patient information, pathological examinations, and blood tests, to develop a new model for predicting BC metastasis.
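The sketch below illustrates, under simplifying assumptions, how an RSF can turn tree-level information into one patient-level risk score, following the construction described by Ishwaran et al. (2008): each terminal node stores a Nelson–Aalen estimate of the cumulative hazard function (CHF) from its in-bag patients, a patient's ensemble CHF is the average of the node CHFs reached across all trees, and the ensemble "mortality" is the sum of that CHF over the distinct event times. Tree growing (bootstrap sampling, random selection of mtry candidate variables, log-rank splitting) is omitted; the per-tree node memberships are hypothetical inputs and the function names are our own. Whether the "RSF-based score" reported in the Results is exactly this mortality value is an assumption, not something the source states.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def nelson_aalen(times: Sequence[float], events: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Nelson-Aalen cumulative hazard for the patients in one terminal node.

    Returns the distinct event times and the cumulative hazard at each of them.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    chf: List[float] = []
    cumulative = 0.0
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)  # still under follow-up at time t
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        cumulative += d / at_risk  # hazard increment d(t) / n(t)
        chf.append(cumulative)
    return event_times, chf


def chf_at(t: float, node_times: List[float], node_chf: List[float]) -> float:
    """Evaluate a right-continuous step-function CHF at time t (0 before the first event)."""
    idx = bisect_right(node_times, t)
    return node_chf[idx - 1] if idx > 0 else 0.0


def ensemble_mortality(nodes: Sequence[Tuple[Sequence[float], Sequence[int]]],
                       grid: Sequence[float]) -> float:
    """Average node CHFs over trees, then sum the ensemble CHF over the event-time grid.

    nodes: one (times, events) pair per tree, holding the patients that share the
           terminal node reached by the patient being scored (hypothetical input).
    grid:  distinct event times observed in the training data.
    """
    fitted = [nelson_aalen(t, e) for t, e in nodes]
    ensemble_chf = [sum(chf_at(g, nt, nc) for nt, nc in fitted) / len(fitted) for g in grid]
    return sum(ensemble_chf)
```

For example, `ensemble_mortality([([1, 2, 3], [1, 1, 0]), ([1, 2], [0, 0])], [1, 2])` averages a node containing two events with a node containing none, so the second tree pulls the patient's score down. Averaging over many decorrelated trees is what gives the forest its stability relative to a single tree.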
PATIENTS AND METHODS
Patients and study design
We retrospectively reviewed the medical records of BC patients treated at two independent institutions between January 2013 and December 2020. Patients with stage 0 to III primary BC who received primary BC treatment were enrolled. Patients with stage IV BC, other synchronous malignancies, a history of other cancers, incomplete information (missing >50% of parameters), or loss to follow-up were excluded. Patients from the Third Affiliated Hospital of Sun Yat-sen University were assigned to the training set to develop the model, and patients from the Liuzhou Women and Children's Medical Center were assigned to the validation set to validate the model. Figure presents the flowchart of the study design and patient selection.
Statistical analyses
Baseline demographic and clinical characteristics are presented as percentages or as means ± standard deviations. Continuous variables were compared using Student's t-test or the Mann–Whitney U test, and categorical variables using the chi-squared test. All statistical analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria). All tests were two-tailed, and p < .05 was considered statistically significant.
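As a worked illustration of the categorical comparison, the sketch below computes Pearson's chi-squared statistic for a 2 × 2 table without a continuity correction; the source does not say whether a correction was applied, so this choice is an assumption, and the function name is hypothetical. With one degree of freedom, the upper-tail probability can be written with the complementary error function, P(χ² > x) = erfc(√(x/2)), which keeps the example free of third-party dependencies.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]].

    Rows could be training/validation sets and columns two categories of a variable.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, chi2 = Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, the ER-status rows of Table 1 (training: 135 negative, 488 positive; validation: 15 negative, 136 positive) can be passed as `chi2_2x2(135, 488, 15, 136)`. R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default, so its p value will be somewhat larger than this uncorrected one.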
Data preprocessing
Missing data in the training and validation sets were imputed by predictive mean matching (PMM) using the “mice” package in R.
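Predictive mean matching fills each missing value with an observed value borrowed from a "donor" whose predicted value is close to that of the incomplete case, so imputed values always lie within the observed range. The sketch below is a deliberately simplified, single-imputation version with one fully observed predictor and an ordinary least squares fit; the function name and arguments are hypothetical. The “mice” implementation differs in several ways: it draws the regression coefficients from their posterior distribution, cycles through all incomplete variables with chained equations, and produces multiple imputed datasets; to our knowledge its pmm method uses a pool of 5 donors by default.

```python
import random
from typing import List, Optional, Sequence


def pmm_impute(y: Sequence[Optional[float]], x: Sequence[float],
               donors: int = 5, seed: int = 0) -> List[float]:
    """Single imputation of y (None = missing) by predictive mean matching on predictor x."""
    rng = random.Random(seed)
    observed = {i: v for i, v in enumerate(y) if v is not None}
    # 1. Ordinary least squares fit of y on x using the complete cases only.
    mx = sum(x[i] for i in observed) / len(observed)
    my = sum(observed.values()) / len(observed)
    sxx = sum((x[i] - mx) ** 2 for i in observed)
    sxy = sum((x[i] - mx) * (v - my) for i, v in observed.items())
    slope = sxy / sxx
    intercept = my - slope * mx
    # 2. Predicted means for every case, observed or not.
    pred = [intercept + slope * xi for xi in x]
    imputed: List[float] = []
    for i, v in enumerate(y):
        if v is not None:
            imputed.append(v)
            continue
        # 3. Donor pool: observed cases whose predictions are closest to case i.
        pool = sorted(observed, key=lambda j: abs(pred[j] - pred[i]))[:donors]
        # 4. Borrow the actual observed value of one randomly chosen donor.
        imputed.append(observed[rng.choice(pool)])
    return imputed
```

Matching on predicted values rather than imputing the prediction itself is the key design choice: a skewed laboratory value such as Lpa keeps its observed distribution instead of being replaced by regression means.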
Predictor selection
Based on the clinical data, random forest-recursive feature elimination (RF-RFE), run with the “caret” package in R, was applied to select the best variable set. A positive variable importance (VIMP) value calculated by RF-RFE indicates that a variable improves predictive accuracy, whereas a negative value indicates that it degrades prediction.
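The selection step can be sketched as follows, assuming the variables are ranked once by importance (VIMP) and nested top-k subsets are then scored; as far as we understand, this mirrors caret's `rfe()` with its default `rerank = FALSE`, although caret also repeats the procedure inside each resampling iteration, which this sketch omits. The `evaluate` callback is a hypothetical stand-in for refitting the random forest on a subset and returning its resampled RMSE.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rfe_select(features: Sequence[str],
               importance: Dict[str, float],
               evaluate: Callable[[List[str]], float],
               sizes: Sequence[int]) -> Tuple[int, List[str], Dict[int, float]]:
    """Keep the top-k features by importance for each k in sizes, score each subset,
    and return the subset size with the lowest error (e.g. resampled RMSE)."""
    # Rank once, most important first; negative-importance variables fall to the end.
    ranked = sorted(features, key=lambda f: importance[f], reverse=True)
    scores = {k: evaluate(ranked[:k]) for k in sizes}
    best = min(scores, key=scores.get)  # lowest error wins
    return best, ranked[:best], scores
```

With made-up features `x1` to `x5`, importances `{"x1": 0.9, "x2": 0.5, "x3": 0.4, "x4": -0.1, "x5": 0.05}`, and a toy error `lambda s: abs(len(s) - 3) + 1.0` over sizes 1 to 5, the function returns the three-variable subset `["x1", "x2", "x3"]`, analogous to the error curve bottoming out at three variables.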
Model development and evaluation
The RSF method was applied to develop a model to predict BC metastasis risk. To evaluate model accuracy, we used the root mean square error (RMSE) between observed and predicted values; a lower RMSE indicates higher accuracy. Combinations of mtry and ntree were evaluated by grid search with 10-fold cross-validation, and the combination with the best concordance index (C-index) was selected as the optimized parameters. The C-index was used to assess the discrimination of the model (0.5–0.7, weak; 0.7–0.9, moderate; >0.9, strong). The Brier score, the mean squared error between the predicted probabilities and the observed outcomes, was used to evaluate calibration; it ranges from 0 to 1, with lower scores indicating higher accuracy, and a Brier score <0.25 was considered to indicate relatively good calibration. Receiver operating characteristic (ROC) curve analysis and Kaplan–Meier (KM) survival analysis were used to further assess model performance. The endpoint event was BC metastasis.
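The three performance measures named above can be written compactly; the sketch below is a plain-Python illustration, not the implementation used by randomForestSRC or any other package, and the function names are our own. Harrell's C-index counts usable pairs (the patient with the shorter follow-up had an observed event) and scores a pair as concordant when that patient also has the higher predicted risk, with tied risks counting one half; pairs with tied times are simply skipped here. The Brier score follows the binary description in the text; survival-specific versions evaluate predicted survival at a chosen time horizon and reweight for censoring, which this sketch does not do.

```python
import math
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Share of usable pairs in which the earlier-event patient has the higher predicted risk."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i had the event while j was still under follow-up
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    return concordant / usable


def brier_score(prob: Sequence[float], outcome: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(prob, outcome)) / len(prob)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

A model that assigns every patient the same risk has a C-index of 0.5, and predicting 0.5 for every patient gives a Brier score of exactly 0.25, which is one way to read the 0.25 threshold above as "better than an uninformative guess".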
RESULTS
A total of 774 patients were enrolled: 623 in the training set for model development and 151 in the validation set for model validation. Forty-one variables were included; 22 variables in the training set and 11 in the validation set required imputation, and all had missing rates below 20%. Baseline characteristics of the training and validation sets are presented in Table 1.
TABLE 1 Basic information of the training and validation sets.
Variable | Total set | Training set | Validation set | p value |
Total (n) | 774 | 623 | 151 | - |
Female sex (n) | 774 | 623 | 151 | - |
Age (years) | 49.76 ± 10.83 | 50.50 ± 10.84 | 46.74 ± 10.27 | .001 |
≥60 (n) | 155 (20.03%) | 138 (22.15%) | 17 (11.26%) | .011 |
<60 (n) | 619 (79.97%) | 485 (77.85%) | 134 (88.74%) | |
≥35 (n) | 713 (92.12%) | 577 (92.62%) | 136 (90.07%) | .569 |
<35 (n) | 61 (7.88%) | 46 (7.38%) | 15 (9.93%) | |
BMI (kg/m²) | 23.19 ± 3.18 | 23.69 ± 3.36 | 21.49 ± 1.54 | <.001 |
ALT (U/L) | 20.24 ± 12.94 | 19.52 ± 13.84 | 23.23 ± 7.59 | <.001 |
AST (U/L) | 21.0 ± 10.02 | 21.34 ± 11.07 | 19.62 ± 2.59 | .167 |
TBIL (μmol/L) | 11.12 ± 4.82 | 10.92 ± 4.34 | 11.96 ± 6.34 | .312 |
DBIL (μmol/L) | 3.31 ± 1.45 | 3.18 ± 1.46 | 3.81 ± 1.26 | <.001 |
GGT (U/L) | 25.40 ± 17.80 | 26.81 ± 8.62 | 25.03 ± 19.46 | <.001 |
ALP (U/L) | 64.84 ± 20.94 | 65.14 ± 22.63 | 63.66 ± 12.14 | .738 |
ALB (g/L) | 42.63 ± 4.37 | 42.59 ± 3.58 | 42.80 ± 6.72 | .875 |
GLB (g/L) | 27.30 ± 4.21 | 27.15 ± 3.99 | 27.88 ± 5.02 | .169 |
A/G | 1.60 ± 0.28 | 1.60 ± 0.25 | 1.58 ± 0.37 | .821 |
Cr (μmol/L) | 65.13 ± 17.03 | 59.88 ± 12.19 | 86.71 ± 17.15 | <.001 |
GLU | 5.40 ± 1.73 | 5.47 ± 1.41 | 5.13 ± 2.67 | <.001 |
UA | 313.78 ± 80.62 | 315.80 ± 89.10 | 305.96 ± 29.50 | .409 |
CHOL | 4.98 ± 0.94 | 4.94 ± 1.04 | 5.12 ± 0.36 | .112 |
TRIG | 1.24 ± 0.88 | 1.35 ± 0.95 | 0.81 ± 0.34 | <.001 |
HDL | 1.36 ± 0.45 | 1.30 ± 0.32 | 1.59 ± 0.73 | <.001 |
LDL | 3.06 ± 0.87 | 3.11 ± 0.90 | 2.88 ± 0.76 | .017 |
ApoA | 1.47 ± 0.22 | 1.43 ± 0.22 | 1.62 ± 0.09 | <.001 |
ApoB | 1.01 ± 0.27 | 1.03 ± 0.30 | 0.94 ± 0.07 | .005 |
Lpa | 198.07 ± 217.58 | 210.55 ± 241.54 | 149.30 ± 38.15 | .009 |
WBC (×10⁹/L) | 5.88 ± 1.81 | 6.28 ± 1.65 | 4.25 ± 1.46 | <.001 |
NEUT (×10⁹/L) | 3.69 ± 1.40 | 3.91 ± 1.42 | 2.75 ± 0.81 | <.001 |
LYMPH (×10⁹/L) | 1.77 ± 0.58 | 1.82 ± 0.6 | 1.53 ± 0.41 | <.001 |
RBC (×10¹²/L) | 4.42 ± 0.52 | 4.43 ± 0.55 | 4.37 ± 0.35 | .449 |
HCT | 0.37 ± 0.04 | 0.38 ± 0.04 | 0.35 ± 0.02 | <.001 |
Hb (g/L) | 124.94 ± 12.18 | 125.60 ± 13.25 | 122.22 ± 5.15 | .009 |
PLT (×10⁹/L) | 239.73 ± 65.38 | 253.51 ± 63.0 | 182.87 ± 39.08 | <.001 |
AST/PLT | 0.09 ± 0.11 | 0.09 ± 0.12 | 0.11 ± 0.02 | .268 |
NLR | 2.29 ± 1.27 | 2.39 ± 1.37 | 1.85 ± 0.59 | <.001 |
PLR | 146.78 ± 57.36 | 152.87 ± 61.37 | 121.68 ± 23.51 | <.001 |
PT (s) | 12.86 ± 0.71 | 12.95 ± 0.74 | 12.51 ± 0.36 | <.001 |
INR | 0.99 ± 0.36 | 1.00 ± 0.39 | 0.95 ± 0.03 | .799 |
Follow-up (months) | 55.59 ± 25.43 | 60.24 ± 25.68 | 36.42 ± 11.77 | <.001 |
Tumor pathology | | | | |
Tumor stage | | | | .569 |
0 | 29 (3.75%) | 27 (4.33%) | 2 (1.32%) | |
I | 191 (24.68%) | 117 (18.78%) | 74 (49.01%) | |
II | 382 (49.35%) | 330 (52.97%) | 52 (34.44%) | |
III | 172 (22.22%) | 149 (23.92%) | 23 (15.23%) | |
Histology | | | | .569 |
Invasive ductal carcinoma | 650 (83.98%) | 540 (86.68%) | 110 (72.85%) | |
Invasive lobular carcinoma | 35 (4.52%) | 26 (4.17%) | 9 (5.96%) | |
Carcinoma in situ | 51 (6.59%) | 32 (5.14%) | 19 (12.58%) | |
Special types (inflammatory breast cancer, Paget's disease, mucinous carcinoma, malignant phyllodes tumor) | 38 (4.91%) | 25 (4.01%) | 13 (8.61%) | |
Immunohistochemistry | | | | |
ER status | | | | .005 |
Negative | 150 (19.38%) | 135 (21.67%) | 15 (9.93%) | |
Positive | 624 (80.62%) | 488 (78.33%) | 136 (90.07%) | |
PR status | | | | .027 |
Negative | 188 (24.29%) | 164 (26.32%) | 24 (15.89%) | |
Positive | 586 (75.71%) | 459 (73.68%) | 127 (84.11%) | |
HER2 status | | | | .001 |
Negative | 543 (70.16%) | 418 (67.09%) | 125 (82.78%) | |
Positive | 231 (29.84%) | 205 (32.91%) | 26 (17.22%) | |
Ki-67 | | | | .028 |
<14% | 257 (33.20%) | 193 (30.98%) | 64 (42.38%) | |
≥14% | 517 (66.80%) | 430 (69.02%) | 87 (57.62%) | |
Axillary lymph node metastasis | | | | .023 |
No | 446 (57.62%) | 344 (55.22%) | 102 (67.55%) | |
Yes | 328 (42.38%) | 279 (44.78%) | 49 (32.45%) | |
Molecular type | | | | .023 |
Luminal A | 203 (26.23%) | 162 (26.0%) | 41 (27.15%) | |
Luminal B | 414 (53.49%) | 343 (55.06%) | 71 (47.02%) | |
HER2 enriched | 64 (8.27%) | 54 (8.67%) | 10 (6.62%) | |
TNBC | 80 (10.34%) | 51 (8.19%) | 29 (19.21%) | |
Metastasis | | | | .938 |
No | 688 (88.89%) | 553 (88.76%) | 135 (89.40%) | |
Yes | 86 (11.11%) | 70 (11.24%) | 16 (10.6%) | |
Metastatic time (months) | 53.47 ± 25.11 | 58.04 ± 25.45 | 34.74 ± 11.08 | <.001 |
RF-RFE, run with the R “caret” package, was applied to identify the most predictive set of variables, and the optimal number of variables was selected according to RMSE. Figure shows that RMSE was lowest with three variables; thus, the three-variable set was the most predictive.
The RF-RFE algorithm automatically screened general patient information, pathological examinations, and blood tests during treatment to select the most relevant features for subsequent RSF model development. The three best variables selected by RF-RFE were pathological (TNM) stage, aspartate aminotransferase (AST), and neutrophil count. The VIMP values calculated by RF-RFE are shown in Figure . No variables from the general information category were selected.
RSF was implemented with the “randomForestSRC” package in R to develop the model. The error rate of the model gradually stabilized as the number of trees increased (Figure ). Between 4000 and 6000 trees, the out-of-bag error rate decreased steadily to approximately 0.3, and at 10 000 trees it was stable. Thus, 10 000 trees (ntree = 10 000) were considered sufficient, and the optimal parameters (ntree = 10 000, mtry = 4) were used to develop the RSF predictive model. RSF-based scores were then calculated for each patient. The C-index was 0.959 (95% confidence interval [CI], 0.918–0.999), and the Brier score was 0.113.
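The tuning procedure described in the Methods (a grid over mtry and ntree, scored by the mean C-index across 10 folds) can be sketched as below. The `cv_score` callback is a hypothetical placeholder for fitting an RSF on the training indices and returning the C-index on the held-out fold, and any grid values passed in are illustrative rather than the authors'. In a real grid, mtry can only range up to the number of candidate predictors supplied to the forest.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 42) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search(n: int,
                grid: Dict[str, Sequence[int]],
                cv_score: Callable[[Dict[str, int], List[int], List[int]], float],
                k: int = 10) -> Tuple[Dict[str, int], float]:
    """Return the parameter combination with the highest mean held-out C-index."""
    folds = kfold_indices(n, k)
    names = list(grid)
    best_params: Dict[str, int] = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for f, test_idx in enumerate(folds):
            # Train on every fold except f, evaluate on fold f.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(cv_score(params, train_idx, test_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:  # higher C-index is better
            best_params, best_score = params, mean
    return best_params, best_score
```

Using the same folds for every parameter combination makes the comparison between combinations paired, so differences in mean C-index reflect the parameters rather than a lucky split.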
Receiver operating characteristic (ROC) curve analysis was applied to assess the performance of this RSF predictive model in the training set. Based on the RSF scores, the area under the ROC curve (AUROC) was 0.932 (95% CI, 0.911–0.953), with a sensitivity of 100%, a specificity of 74.3%, and an optimal cutoff value of 2.84 in the training set (Figure ).
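A minimal sketch of the two quantities reported here, assuming binary metastasis labels (1 = metastasis) and that higher scores mean higher risk: the AUROC computed as the Mann–Whitney probability that a randomly chosen patient with metastasis scores higher than one without (ties count one half), and an "optimal" cutoff chosen by maximizing Youden's index (sensitivity + specificity − 1). The source does not state which criterion defined the optimal cutoff of 2.84, so Youden's index is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random case > score of a random non-case), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J,
    classifying score >= cutoff as high risk."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (float("nan"), 0.0, 0.0)
    best_j = -1.0
    for c in sorted(set(scores)):  # every observed score is a candidate threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best_j:
            best_j, best = sens + spec - 1, (c, sens, spec)
    return best
```

For scores `[0.1, 0.4, 0.35, 0.8]` with labels `[0, 0, 1, 1]`, the AUROC is 0.75 and the Youden cutoff is 0.35 (sensitivity 1.0, specificity 0.5). When several thresholds tie on J, this sketch keeps the lowest one, which favors sensitivity.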
Patients in the training set were divided into a high-risk group (RSF-based score above the optimal cutoff of 2.84) and a low-risk group (score below 2.84). Kaplan–Meier analysis showed a significant difference in metastasis-free survival between the high- and low-risk groups (p < .0001), indicating that patients with higher prediction scores are more likely to develop BC metastasis (Figure ).
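The source reports Kaplan–Meier p values without naming the test; the two-group log-rank test is the usual choice, so the sketch below assumes it. At each distinct event time it accumulates observed minus expected events in the high-risk group together with the hypergeometric variance, then refers (O − E)²/V to a chi-squared distribution with one degree of freedom. It assumes groups are coded 1 (high risk) and 0 (low risk) and that at least one event occurs while both groups are at risk (otherwise the variance is zero); the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-squared statistic, p value)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [g for s, g in zip(times, groups) if s >= t]
        n, n1 = len(at_risk), sum(at_risk)  # all at risk, high-risk group at risk
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups) if s == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
```

Because expected events are recomputed from the risk sets at every event time, censored patients contribute for exactly as long as they were followed, which is why the test suits unequal follow-up such as the 60- versus 36-month means in Table 1.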
These results indicate that the RSF model can accurately predict metastasis in BC patients.
The predictive performance of the model was then evaluated in the validation cohort. The C-index was 0.917 (95% CI, 0.856–0.978), and the Brier score was 0.097. The AUROC was 0.905 (95% CI, 0.849–0.961), with a sensitivity of 100% and a specificity of 97.0% (Figure ).
Using the optimal cutoff value of the RSF-based score from the training set, patients in the validation set were divided into high- and low-risk groups. Kaplan–Meier analysis showed a significant difference in metastasis-free survival between the two groups (p < .001) (Figure ), supporting the good predictive ability of the model.
DISCUSSION
RF-RFE, a machine learning method, was applied to automatically select the most important predictive variables for subsequent RSF model building. Variable selection is the process of choosing a subset of predictive variables for further analysis so as to minimize generalization error.
The three best variables for this model were TNM stage, AST level, and neutrophil count. All came from blood tests and pathological examinations; no variable based on general patient information was selected. Previous studies have reported that TNM stage and neutrophil count are closely related to BC prognosis, which supports the reliability of the model. The most important variable was TNM stage, which is widely used in clinical practice to predict survival and prognosis and to guide clinical decision-making. The enzyme AST is abundant in hepatocytes and in skeletal, cardiac, and smooth muscle, and it is released into the bloodstream in hepatitis, myocardial infarction, or myositis. High AST levels have been independently associated with prognosis in patients with liver metastases and in those with primary liver cancer. Previous studies have suggested that high AST levels may reflect aggressive tumor biology, possibly resulting from high tumor cell turnover and tissue damage. To our knowledge, this is the first study to use AST level to predict BC metastasis. The neutrophil count is a routine blood test in clinical practice. Changes in the peripheral white blood cell (WBC) count have been reported to be associated with the systemic inflammatory response, and the tumor-related systemic inflammatory response has been shown to be an independent predictor of tumor prognosis. The neutrophil count reliably reflects the body's inflammatory status, and the differential WBC count in peripheral blood can be used to predict BC prognosis.
This predictive model was built from both patient and tumor characteristics and showed good validity and reliability, including on external validation in an independent cohort. The C-indexes were 0.959 and 0.917 in the training and external validation cohorts, respectively, indicating good discrimination; the C-indexes of previously developed predictive models ranged from 0.65 to 0.71, suggesting that this model is more accurate. Kaplan–Meier analyses showed that the model performed well in stratifying BC metastasis risk (p < .0001 in the training set and p < .001 in the validation set). The AUROCs were 0.932 and 0.905 in the training and validation sets, respectively, compared with 0.58 to 0.90 for previously developed models, again indicating a good predictive effect on BC metastasis. Finally, the Brier scores of 0.113 and 0.097 in the training and external validation sets, respectively, indicate good calibration.
This model is based on routine demographic and clinical examination data collected in everyday clinical practice and achieves high prediction accuracy without increasing medical expenses, unlike previously developed models that rely on new molecular biomarkers derived from gene or protein expression analysis. Because such biomarkers are not tested routinely, using them leads to additional patient expenditure that is not covered by insurance, whereas routine laboratory parameters can be used at lower cost. We do not deny the possible benefits of personalized care based on novel biomarkers, but not all BC patients can undergo novel biomarker testing, and not all regions can perform it. A practical and economical model to predict BC metastasis is therefore needed. The model established and validated in this study combines selected patient-related and tumor-related features to provide an easy-to-use, individualized prediction of metastasis in BC patients at no additional cost. It can help clinicians stratify patients into high and low risk of early metastasis. BC patients at low risk of metastasis could thereby avoid toxic and costly therapies, while those at high risk could receive more aggressive systemic therapy, such as more intensive chemotherapy or more aggressive targeted therapy for HER2-positive patients, together with a more intensive follow-up schedule. Moreover, predicting BC metastasis risk may help manage patient and caregiver expectations, help patients choose among therapies, and improve patient compliance and care. In addition, given life expectancy and competing risks of mortality, older individuals with BC are at risk of overtreatment.
Predicted survival benefit, risk of disease progression, anticancer therapy toxicity, life expectancy, quality of life, and patient preferences should be considered carefully when making treatment decisions for older BC patients, and such decisions should involve geriatric assessment and survival estimates. Our model provides an estimate of BC metastasis risk that can contribute to treatment decisions for older BC patients. Overall, our model can help clinicians provide more targeted and accurate individualized therapy to improve the prognosis of BC patients.
This study has several limitations. First, all enrolled patients were of Han descent, so the model has not been validated in other ethnic groups; validation in other regions and populations is needed. Second, this was a retrospective cross-sectional study performed at two centers with a relatively small number of patients. Future multicenter studies with larger cohorts and longer follow-up are required. Third, AST level and neutrophil count can be affected by many factors. None of our patients had markedly elevated AST levels (>5 × ULN) or abnormal neutrophil counts, so further studies are needed to verify whether the model is valid for patients with markedly abnormal AST levels or neutrophil counts. Fourth, not all enrolled patients underwent testing for novel biomarkers. Hence, we could compare our model with other models only through their reported C-indexes and AUROCs, rather than by applying those models directly to our dataset.
CONCLUSIONS
This study developed and validated a model to predict metastasis in BC patients in China. The predictive parameters were selected from data routinely collected in clinical practice, adding no medical expense. This machine learning-based model predicted metastasis in BC patients accurately, with good discrimination and calibration. Using this model, clinicians can provide precise and efficient individualized therapy for patients with BC, which may improve their prognosis.
AUTHOR CONTRIBUTIONS
Huan Li: Conceptualization (equal); funding acquisition (equal); investigation (equal); methodology (equal); supervision (equal); validation (equal). Ren-Bin Liu: Funding acquisition (equal); methodology (equal); resources (equal); validation (equal); visualization (equal); writing – review and editing (equal). Chen-meng Long: Data curation (equal); formal analysis (equal); investigation (equal); project administration (equal). Yuan Teng: Data curation (equal); formal analysis (equal); investigation (equal). Yu Liu: Conceptualization (equal); formal analysis (equal); funding acquisition (equal); methodology (equal); supervision (equal); validation (equal).
ACKNOWLEDGMENTS
None.
FUNDING INFORMATION
This research was supported by grants from the National Natural Science Foundation (81372815), Guangdong Basic and Applied Basic Research Foundation (2021A1515110818), the Youth Education Grant of Sun Yat-sen University (N2019Y08), and the Guangdong Nature Science Foundation (2014A030313193).
CONFLICT OF INTEREST STATEMENT
The authors have stated explicitly that there are no conflicts of interest in connection with this article.
DATA AVAILABILITY STATEMENT
The datasets used and/or analyzed in this study are available from the corresponding author upon reasonable request.
ETHICS STATEMENT
This study was performed according to the Declaration of Helsinki. The study was approved by the ethics committee of the Third Affiliated Hospital of Sun Yat-sen University [2023-289-01]. All enrolled patients provided informed consent.
REFERENCES
Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209‐249.
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68:394‐424.
Li JY, Jing R, Wei H, et al. Germline mutations in 40 cancer susceptibility genes among Chinese patients with high hereditary risk breast cancer. Int J Cancer. 2019;144(2):281‐289.
Medeiros B, Allan AL. Molecular mechanisms of breast cancer metastasis to the lung: clinical and experimental perspectives. Int J Mol Sci. 2019;20(9):2272.
Zheng YZ, Wang XM, Fan L, Shao ZM. Breast cancer‐specific mortality in small‐sized tumor with stage IV breast cancer: a population‐based study. Oncologist. 2021;26:e241‐e250.
Voon W, Hum YC, Tee YK, et al. Evaluating the effectiveness of stain normalization techniques in automated grading of invasive ductal carcinoma histopathological images. Sci Rep. 2023;13(1):20518.
Adam Maciejczyk A. New prognostic factors in breast cancer. Adv Clin Exp Med. 2013;22:5‐15.
Cardoso F, van't Veer LJ, Bogaerts J, et al. 70‐gene signature as an aid to treatment decisions in early‐stage breast cancer. N Engl J Med. 2016;375:717‐729.
Sparano JA, Gray RJ, Makower DF, et al. Clinical outcomes in early breast cancer with a high 21‐gene recurrence score of 26 to 100 assigned to adjuvant chemotherapy plus endocrine therapy: a secondary analysis of the TAILORx randomized clinical trial. JAMA Oncol. 2020;6:367‐374.
Sparano JA, Gray RJ, Ravdin PM, et al. Clinical and genomic risk to guide the use of adjuvant therapy for breast cancer. N Engl J Med. 2019;380:2395‐2405.
Mamounas EP, Russell CA, Lau A, Turner MP, Albain KS. Clinical relevance of the 21‐gene recurrence score® assay in treatment decisions for patients with node‐positive breast cancer in the genomic era. npj Breast Cancer. 2018;4:27.
Tiberi D, Masucci L, Shedid D, et al. Limitations of personalized medicine and gene assays for breast cancer. Cureus. 2017;9:e1100.
Blok EJ, Bastiaannet E, van den Hout WB, et al. Systematic review of the clinical and economic value of gene expression profiles for invasive early breast cancer available in Europe. Cancer Treat Rev. 2018;62:74‐90.
Shachar SS, Muss HB. Internet tools to enhance breast cancer care. npj Breast Cancer. 2016;2:16011.
Mook S, Schmidt MK, Rutgers EJ, et al. Calibration and discriminatory accuracy of prognosis calculation for breast cancer with the online Adjuvant! program: a hospital‐based retrospective cohort study. Lancet Oncol. 2009;10:1070‐1076.
Wishart GC, Azzato EM, Greenberg DC, et al. PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer. Breast Cancer Res. 2010;12:R1.
Wu X, Ye Y, Barcenas CH, et al. Personalized prognostic prediction models for breast cancer recurrence and survival incorporating multidimensional data. J Natl Cancer Inst. 2017;109:djw314.
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2:841‐860.
Savignoni A, Hajage D, Tubert‐Bitter P, De Rycke Y. Effect of an event occurring over time and confounded by health status: estimation and interpretation. A study based on survival data simulations with application on breast cancer. Stat Med. 2012;31:4444‐4455.
Bellera CA, Macgrogan G, Debled M, de Lara CT, Brouste V, Mathoulin‐Pélissier S. Variables with time‐varying effects and the cox model: some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Med Res Methodol. 2010;10:20.
Ishwaran H, Kogalur UB, Chen X, Minn AJ. Random survival forests for high‐dimensional data. Stat Anal Data Min ASA Data Sci J. 2011;4(1):115‐132.
Breiman L. Random forests. Mach Learn. 2001;45:5‐32.
Moorthy K, Mohamad MS. Random Forest for gene selection and microarray data classification. Bioinformation. 2011;7:142‐146.
Ram M, Najafi A, Shakeri MT. Classification and biomarker genes selection for cancer gene expression data using random forest. Iran J Pathol. 2017;12:339‐347.
Ishwaran H, Kogalur UB. Consistency of random survival forests. Stat Probab Lett. 2010;80:1056‐1064.
Ehrlinger J. ggRandomForests: Exploring Random Forest Survival; 2016.
Zollanvari A, Dougherty ER. Moments and root‐mean‐square error of the bayesian MMSE estimator of classification error in the Gaussian model. Pattern Recogn. 2014;47:2178‐2192.
Su W, He B, Zhang YD, Yin G. C‐index regression for recurrent event data. Contemp Clin Trials. 2022;118:106787.
Yang W, Jiang J, Schnellinger EM, Kimmel SE, Guo W. Modified brier score for evaluating prediction accuracy for binary outcomes. Stat Methods Med Res. 2022;31(12):2287‐2296.
Heller G. The added value of new covariates to the brier score in cox survival models. Lifetime Data Anal. 2021;27(1):1‐14.
Kuhn M, Johnson K. Feature Engineering and Selection.
Stuart‐Harris R, Caldas C, Pinder SE, Pharoah P. Proliferation markers and survival in early breast cancer: a systematic review and meta‐analysis of 85 studies in 32,825 patients. Breast. 2008;17:323‐334.
Cotogno PM, Ranasinghe LK, Ledet EM, Lewis BE, Sartor O. Laboratory‐based biomarkers and liver metastases in metastatic castration‐resistant prostate cancer. Oncologist. 2018;23:791‐797.
Zhang LX, Lv Y, Xu AM, Wang HZ. The prognostic significance of serum gamma‐glutamyltransferase levels and AST/ALT in primary hepatic carcinoma. BMC Cancer. 2019;19(1):841.
Casadei‐Gardini A, Rimini M, Kudo M, et al. Real life study of lenvatinib therapy for hepatocellular carcinoma: RELEVANT study. Liver Cancer. 2022;11(6):527‐539.
Kaewdech A, Sripongpun P, Assawasuwannakit S, et al. FAIL‐T (AFP, AST, tumor sIze, ALT, and tumor number): a model to predict intermediate‐stage HCC patients who are not good candidates for TACE. Front Med. 2023;10:1077842.
Mosca M, Nigro MC, Pagani R, De Giglio A, Di Federico A. Neutrophil‐to‐lymphocyte ratio (NLR) in NSCLC, gastrointestinal, and other solid tumors: immunotherapy and beyond. Biomolecules. 2023;13(12):1803.
Kiely M, Lord B, Ambs S. Immune response and inflammation in cancer health disparities. Trends Cancer. 2022;8:316‐327.
Dolan RD, McSorley ST, Horgan PG, Laird B, McMillan DC. The role of the systemic inflammatory response in predicting outcomes in patients with advanced inoperable cancer: systematic review and meta‐analysis. Crit Rev Oncol Hematol. 2017;116:134‐146.
Chen L, Kong X, Yan C, Fang Y, Wang J. The research progress on the prognostic value of the common hematological parameters in peripheral venous blood in breast cancer. Onco Targets Ther. 2020;13:1397‐1412.
Yamanouchi K, Maeda S, Takei D, et al. Pretreatment absolute lymphocyte count and neutrophil‐to‐lymphocyte ratio are prognostic factors for stage III breast cancer. Anticancer Res. 2021;41(7):3625‐3634.
Hua X, Long ZQ, Zhang YL, et al. Prognostic value of preoperative systemic immune‐inflammation index in breast cancer: a propensity score‐matching study. Front Oncol. 2020;10:580.
Nicolò C, Périer C, Prague M, et al. Machine learning and mechanistic modeling for prediction of metastatic relapse in early‐stage breast cancer. JCO Clin Cancer Inform. 2020;4:259‐274.
Yu Y, Tan Y, Xie C, et al. Development and validation of a preoperative magnetic resonance imaging radiomics‐based signature to predict axillary lymph node metastasis and disease‐free survival in patients with early‐stage breast cancer. JAMA Netw Open. 2020;3:e2028086.
Liu C, Zhao Z, Gu X, et al. Establishment and verification of a bagged‐trees‐based model for prediction of sentinel lymph node metastasis for early breast cancer patients. Front Oncol. 2019;9:282.
Holsbø E, Perduca V, Bongo LA, Lund E, Birmelé E. Predicting breast cancer metastasis from whole‐blood transcriptomic measurements. BMC Res Notes. 2020;13:248.
Biganzoli L, Battisti NML, Wildiers H, et al. Updated recommendations regarding the management of older patients with breast cancer: a joint paper from the European Society of Breast Cancer Specialists (EUSOMA) and the International Society of Geriatric Oncology (SIOG). Lancet Oncol. 2021;22(7):e327‐e340.
de Glas N, Bastiaannet E, de Boer A, Siesling S, Liefers GJ, Portielje J. Improved survival of older patients with advanced breast cancer due to an increase in systemic treatments: a population‐based study. Breast Cancer Res Treat. 2019;178(1):141‐149.
Dillon J, Thomas SM, Rosenberger LH, et al. Mortality in older patients with breast cancer undergoing breast surgery: how low is “low risk”? Ann Surg Oncol. 2021;28(10):5758‐5767.
Rossi L, McCartney A, De Santo I, et al. The optimal duration of adjuvant endocrine therapy in early luminal breast cancer: a concise review. Cancer Treat Rev. 2019;74:29‐34.
© 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License").
Abstract
Background
Metastasis is the main cause of the high mortality of breast cancer (BC). Conventional prognostic criteria cannot accurately predict the risk of BC metastasis. Machine learning technologies can help overcome this limitation of conventional models.
Aim
We developed a model to predict BC metastasis using the random survival forest (RSF) method.
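The abstract does not say which RSF implementation or splitting rule was used. As background, RSF as proposed by Ishwaran and colleagues grows survival trees on bootstrap samples and, by default, splits each node by maximizing a log-rank statistic between the two daughter nodes. The sketch below assumes that log-rank splitting rule and scores candidate thresholds of a single covariate. The function names `logrank_split_statistic` and `best_split` are hypothetical, and a full RSF would additionally sample a random subset of covariates at each node and average cumulative hazard estimates across trees.

```python
from typing import Optional, Sequence, Tuple


def logrank_split_statistic(
    times: Sequence[float], events: Sequence[int], left: Sequence[bool]
) -> float:
    """Absolute standardized two-sample log-rank statistic |L| for a split
    that sends observation i to the left daughter when left[i] is True."""
    num, var = 0.0, 0.0
    for u in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for t in times if t >= u)  # at risk in the node at time u
        n1 = sum(1 for t, is_left in zip(times, left) if is_left and t >= u)  # at risk, left
        d = sum(1 for t, e in zip(times, events) if e and t == u)  # events at u
        d1 = sum(
            1 for t, e, is_left in zip(times, events, left) if is_left and e and t == u
        )  # events at u in the left daughter
        num += d1 - d * n1 / n  # observed minus expected events (left)
        if n > 1:
            # hypergeometric variance of d1 at time u
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return abs(num) / var ** 0.5 if var > 0 else 0.0


def best_split(
    x: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Scan thresholds of one covariate and return (threshold, |L|) for the
    split x <= threshold that maximizes the log-rank statistic."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    for c in sorted(set(x))[:-1]:  # skip the maximum so both daughters are non-empty
        stat = logrank_split_statistic(times, events, [xi <= c for xi in x])
        if stat > best[1]:
            best = (c, stat)
    return best
```

Counting the at-risk sets with list scans keeps the sketch readable but is quadratic in node size; production implementations sort the node once and accumulate counts.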
Methods
Using demographic and routine clinical data, we applied RSF‐recursive feature elimination to identify the predictive variables and then developed a metastasis prediction model with the RSF method. Predictive performance was validated with the area under the receiver operating characteristic curve (AUROC) and Kaplan–Meier (KM) survival analyses; discrimination was assessed with the C‐index, and calibration was assessed with Brier scores.
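The abstract names the C‐index and Brier scores without giving formulas. The following is a minimal sketch, assuming Harrell's C‐index (pairs with tied follow-up times are ignored) and the inverse-probability-of-censoring-weighted (IPCW) Brier score at a single horizon `t`, with the censoring distribution estimated by Kaplan–Meier. The inputs `risk` (higher means higher predicted metastasis risk) and `surv_at_t` (each patient's predicted metastasis-free probability at `t`, for example from an RSF) are hypothetical; the exact estimators and time horizons used in the study are not stated here.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C: among comparable pairs (i had the event before j's time),
    the fraction in which i also has the higher predicted risk; ties count 0.5."""
    concordant, comparable = 0.0, 0
    for ti, ei, ri in zip(times, events, risk):
        if not ei:
            continue  # a censored subject cannot anchor a comparable pair
        for tj, rj in zip(times, risk):
            if ti < tj:
                comparable += 1
                if ri > rj:
                    concordant += 1.0
                elif ri == rj:
                    concordant += 0.5
    return concordant / comparable


def _km(times: Sequence[float], flags: Sequence[bool], t: float, left_limit: bool = False) -> float:
    """Kaplan-Meier survival at t for the 'event' marked by flags;
    with left_limit=True, returns the value just before t."""
    s = 1.0
    for u in sorted({ti for ti, f in zip(times, flags) if f}):
        if u > t or (left_limit and u == t):
            break
        n_risk = sum(1 for ti in times if ti >= u)
        d = sum(1 for ti, f in zip(times, flags) if f and ti == u)
        s *= 1.0 - d / n_risk
    return s


def ipcw_brier_score(
    times: Sequence[float], events: Sequence[int], surv_at_t: Sequence[float], t: float
) -> float:
    """Brier score at horizon t, weighting each term by the inverse of the
    Kaplan-Meier estimate G of the censoring survival function."""
    censored = [not e for e in events]
    g_t = _km(times, censored, t)
    total = 0.0
    for ti, ei, si in zip(times, events, surv_at_t):
        if ti <= t and ei:  # metastasis observed by t: ideal prediction is 0
            total += si ** 2 / _km(times, censored, ti, left_limit=True)
        elif ti > t:  # still metastasis-free at t: ideal prediction is 1
            total += (1.0 - si) ** 2 / g_t
        # subjects censored before t add 0; they enter only through the weights G
    return total / len(times)
```

With no censoring every weight equals 1, so the Brier score reduces to the mean squared difference between observed status at `t` and predicted survival, which is a convenient sanity check.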
Results
We developed a metastasis prediction model comprising three variables (pathological stage, aspartate aminotransferase, and neutrophil count) selected by RSF‐recursive feature elimination. The model was reliable and stable when assessed by the AUROC (0.932 in the training set and 0.905 in the validation set) and KM survival analyses (
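KM survival analyses of this kind typically compare metastasis-free survival curves between groups of patients. Below is a minimal sketch of the Kaplan–Meier product-limit estimator, together with a hypothetical helper `km_by_risk_group` that splits patients at an assumed risk `threshold`; the grouping and cutoff actually used in the study are not given in this abstract.

```python
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as (event_time, S(t)) steps: S drops by the
    factor (1 - d/n) at each distinct event time."""
    curve: List[Tuple[float, float]] = []
    s = 1.0
    for u in sorted({t for t, e in zip(times, events) if e}):
        n_risk = sum(1 for t in times if t >= u)  # still under follow-up at u
        d = sum(1 for t, e in zip(times, events) if e and t == u)  # metastases at u
        s *= 1.0 - d / n_risk
        curve.append((u, s))
    return curve


def km_by_risk_group(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float], threshold: float
) -> Dict[str, List[Tuple[float, float]]]:
    """Hypothetical helper: KM curves for patients below vs. at-or-above a risk threshold."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for name, keep in (("low", lambda r: r < threshold), ("high", lambda r: r >= threshold)):
        idx = [i for i, r in enumerate(risk) if keep(r)]
        groups[name] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return groups
```

A patient censored at the same time as an event is still counted in that event's risk set, which is the usual Kaplan–Meier convention.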
Conclusions
This model relies on routine demographic data and examination indicators available in real‐time clinical practice, and it achieves accurate prediction without increasing costs for patients. Using this model, clinicians can facilitate risk communication and provide precise, efficient, individualized therapy to patients with breast cancer.
Details

1 Department of Thyroid and Breast Surgery, Third Affiliated Hospital of Sun Yat‐sen University, Guangzhou, China
2 Department of Breast Surgery, Liuzhou Women and Children's Medical Center, Liuzhou, China
3 Department of Breast Surgery, Guangzhou Women and Children's Medical Center, Guangzhou, China