Purpose
Objectives were to develop a machine learning (ML) model based on electronic health record (EHR) data to predict the risk of vomiting within a 96-hour window after admission to the pediatric oncology and hematopoietic cell transplant (HCT) services using retrospective data and to evaluate the model prospectively in a silent trial.
Patients and methods
Admissions to the oncology or HCT services from 2018-06-02 to 2024-02-17 (retrospective) and from 2024-05-09 to 2024-08-05 (prospective) were included. The data source was SEDAR, a curated and validated approach to delivering EHR data for ML. Prediction time was 08:30 on the morning following admission. The outcome was any vomiting within 96 h following prediction time. We trained models using L2-regularized logistic regression, LightGBM and XGBoost. Training cohorts included the target cohort and all inpatient admissions.
Results
There were 7,408 admissions in the retrospective phase and 340 admissions in the prospective silent trial phase. The best-performing model in the retrospective phase was the LightGBM model trained on all inpatients. The number of features in the final model was 2,859. The area-under-the-receiver-operating-characteristic curve (AUROC) was 0.730 (95% confidence interval (CI) 0.694–0.765) for the retrospective phase and 0.716 (95% CI 0.649–0.784) for the prospective silent trial phase.
Conclusions
We found that data in the EHR could be used to develop a retrospective ML model to predict vomiting among pediatric oncology and HCT inpatients. This model retained satisfactory performance in a prospective silent trial. Future plans will include deployment into clinical workflows and determining if the model improves vomiting control.
Background
Vomiting is one of the most common symptoms experienced by pediatric cancer and hematopoietic cell transplant (HCT) patients, with substantial negative impacts on quality of life [1, 2]. Vomiting can reduce oral intake, worsen nutritional status, lead to hospitalization and increase healthcare costs [3]. Thus, efforts to minimize vomiting are crucial to improve the health of pediatric patients with cancer. Yet, vomiting control in pediatric patients receiving cancer treatments remains poor [4, 5].
Our ability to predict which patients are most likely to vomit is limited. Previously identified risk factors for vomiting in pediatric cancer patients have included older age, non-white race, cancer type, motion sickness, previous uncontrolled vomiting, emetogenicity of chemotherapy and specific anti-emetic medication administrations [6,7,8]. These efforts have focused on evaluation of a small number of potential predictors and traditional statistical approaches such as linear or logistic regression. Recently, the ability to create predictive models in healthcare has grown dramatically based on the increasing capacity to store and compute large amounts of data, and the availability of large repositories of clinical data such as those in electronic health records (EHR).
Machine learning (ML) is a promising approach to predict vomiting. ML models among adult patients can predict post-operative nausea and vomiting [9,10,11,12,13]. A small number of studies have described the ability of ML to predict chemotherapy-induced nausea and vomiting. For example, a study focused on adults receiving highly emetogenic chemotherapy developed a model with area-under-the-receiver-operating characteristic curve (AUROC) of 0.85 (95% confidence interval (CI) 0.78–0.92) and specificity of 0.86 [14]. Another study using a naïve Bayes classifier achieved AUROC of 0.72 (95% CI 0.69–0.75) [15]. However, almost no efforts have focused on predicting vomiting among pediatric oncology or HCT patients admitted to the hospital, a group with a high risk of vomiting, but with little previous ML-directed research. Additionally, to our knowledge, no studies have included a prospective evaluation (silent trial) [16] of such a model in pediatric patients. A silent trial is crucial for integrating a model into clinical workflows [17], as it provides a safe assessment of the model in the intended deployment setting without impacting patient care, and identifies issues such as temporal dataset shift [18, 19], bias and system readiness.
Consequently, our objectives were to develop a ML model based on EHR data to predict the risk of vomiting within a 96-hour window after admission to the pediatric oncology and HCT services using retrospective data and to evaluate the model prospectively in a silent trial.
Methods
This work falls under the Pediatric Real-world Evaluative Data sciences for Clinical Transformation (PREDICT) program that launched at The Hospital for Sick Children (SickKids) in September 2023. PREDICT’s goal is to develop, evaluate, deploy and maintain clinical ML models to improve pediatric patient outcomes using EHR data. The study was conducted at SickKids in Toronto, a large tertiary care pediatric center that sees approximately 300 new cancer diagnoses and performs approximately 100 HCT procedures per year.
This study included a retrospective and a prospective phase. Both phases were approved by the Research Ethics Board at SickKids. The requirement for informed consent and assent was waived given the nature of the studies.
Subjects
All patients admitted to the oncology or HCT services at SickKids were included, with each admission serving as the unit of analysis. All admissions were eligible, irrespective of reason for admission, including those for chemotherapy, HCT procedures, or supportive care such as fever management. We excluded admissions where patients were discharged prior to midnight on the day of admission.
For retrospective model development and evaluation, admissions between June 2, 2018 (date that SickKids adopted Epic as its EHR) and February 17, 2024 were included. For the prospective silent trial, admissions between May 9, 2024 and August 5, 2024 were included.
Data source
The data source was the SickKids Enterprise-wide Data in Azure Repository (SEDAR) [20]. SEDAR is a modular and robust approach to deliver foundational EHR data that is re-usable across multiple ML projects. It offers EHR data in a standardized “Curated Schema” derived from SickKids’ Epic Clarity database (over 19,000 tables). This schema consolidates the EHR data into a unified structure of 20 tables organized by clinically-relevant domains such as patients, encounters and laboratory results, making it easier to query and reuse data for needs across the institution, including ML.
Label, prediction time and prediction window
The outcome (label) was a binary variable indicating any vomiting in the 96-hour window post prediction time (yes/no). Prediction time was 08:30 h on the morning following admission, as clinical decisions are typically made during morning rounds, which occur on weekdays, weekends and holidays. Predictions were made once per admission. The prediction window (period in which vomiting was observed) spanned 0–96 h post prediction time. This 96-hour window was chosen because many patients are expected to be admitted for chemotherapy administration and it encompasses the acute chemotherapy-induced vomiting (CIV) phase for patients receiving chemotherapy on days 1 to 4, captures a portion of the delayed CIV phase for patients receiving chemotherapy on days 1 to 3, and aligns with the typical 3- to 5-day admission duration [21]. The acute CIV phase begins with the administration of the first chemotherapy dose of the block and ends 24 h after administration of the last chemotherapy dose of the chemotherapy block. The delayed CIV phase begins immediately after the acute phase and continues for 96 h. A chemotherapy block is a period of consecutive days when chemotherapy is administered daily [22].
In the SickKids’ EHR, vomiting is described in the flowsheets using descriptors such as emesis volume, count, amount, color/appearance, and any vomiting/retching/gagging. Multiple descriptors can be used at a specific time stamp but no one descriptor is used consistently. Thus, a composite binary variable (yes/no) was computed where yes represents any vomiting entry to a vomiting-related flowsheet row within the 96-hour window. Vomiting determination using this approach was previously validated retrospectively by identifying patients who received etoposide, ifosfamide or treosulfan, and randomly selecting 60 patients stratified by age and HCT status (personal communication, Dr. Lee Dupuis, January 2023). There was complete agreement in vomiting occurrence between this approach and manual chart review. We chose to use any vomiting as the primary outcome as complete control of vomiting is considered the gold standard objective in antiemetic clinical trials [23, 24].
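For illustration only, a minimal sketch of this label logic, assuming a hypothetical flowsheet extract with columns row_name and recorded_time and a pandas timestamp for the admission time (all names are ours, not SEDAR's):

```python
import pandas as pd

def vomiting_label(admit_time: pd.Timestamp, flowsheet: pd.DataFrame) -> int:
    """Return 1 if any vomiting-related flowsheet entry falls in the 96-hour prediction window."""
    # Prediction time: 08:30 on the morning following the day of admission.
    prediction_time = admit_time.normalize() + pd.Timedelta(days=1, hours=8, minutes=30)
    window_end = prediction_time + pd.Timedelta(hours=96)

    # Composite label: any entry to a vomiting-related flowsheet row (volume, count,
    # amount, color/appearance, vomiting/retching/gagging) counts as "yes".
    is_vomit_row = flowsheet["row_name"].str.contains(
        "emesis|vomit|retch|gag", case=False, na=False)
    in_window = flowsheet["recorded_time"].between(prediction_time, window_end)
    return int((is_vomit_row & in_window).any())
```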
ML model development
Model development followed a standardized process used across the PREDICT program as previously described [25]. The following summarizes the details for this specific project.
Retrospective phase
We implemented a pipeline on a static snapshot of the identified cohort, labels and SEDAR data to enable reproducibility.
Featurization
We used a uniform approach to extract features from SEDAR Curated Schema tables. Since Clarity is updated once daily, our pipeline included features as of midnight on the day of admission (index time) to match the retrospective modeling setting with the intended deployment setting. Extracted features included demographic information (sex and age) and clinical observations over predefined time intervals prior to prediction time (0–1 days, 1–7 days and prior to 7 days). Clinical observations included categorical data (such as diagnosis codes) and continuous data (such as blood glucose levels). For categorical elements, we counted occurrences within each interval. For continuous elements, we computed the mean, and for frequently observed measurements (those recorded more than twice on average for all patients), we also calculated the minimum and maximum values within each interval. We excluded elements with fewer than 25 observations or those not observed in the last 90 days of the dataset.
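The aggregation logic described above could be sketched as follows, assuming a hypothetical long-format table of observations; the column names, interval labels and helper function are illustrative, not the study's pipeline:

```python
import pandas as pd

# Hypothetical long-format observations (names are ours): one row per recorded element,
# with columns admission_id, element (e.g. a diagnosis code or lab name),
# value (NaN for categorical elements) and days_before_index (>= 0).
INTERVALS = {"0_1d": (0, 1), "1_7d": (1, 7), "over_7d": (7, float("inf"))}

def featurize(obs: pd.DataFrame, frequent_elements: set) -> pd.DataFrame:
    feats = {}
    for name, (lo, hi) in INTERVALS.items():
        window = obs[(obs["days_before_index"] >= lo) & (obs["days_before_index"] < hi)]
        # Categorical elements: occurrence counts within the interval.
        feats[f"count_{name}"] = window.groupby(["admission_id", "element"]).size()
        # Continuous elements: mean always; min/max only for frequently observed elements.
        numeric = window.dropna(subset=["value"])
        grouped = numeric.groupby(["admission_id", "element"])["value"]
        feats[f"mean_{name}"] = grouped.mean()
        frequent = numeric[numeric["element"].isin(frequent_elements)]
        freq_grouped = frequent.groupby(["admission_id", "element"])["value"]
        feats[f"min_{name}"] = freq_grouped.min()
        feats[f"max_{name}"] = freq_grouped.max()
    # Pivot to one wide, sparse row per admission; missing entries remain NaN.
    return pd.concat(feats, axis=1).unstack("element")
```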
Model training and selection
The dataset was temporally split into training, validation and test sets in a 70:15:15 ratio. Preprocessing steps included standardizing age (and count values for logistic regression models) and encoding measurement features into quintiles before one-hot encoding. This process resulted in a large, sparse feature matrix for each feature set. All preprocessing steps were fit using the training set and applied to the validation and test sets.
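A minimal sketch of the temporal split and preprocessing, assuming an admission-level data frame with an admit_time column and a list of continuous measurement columns (names are ours); as in the study, transformers are fit on the training set only:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer

def temporal_split(df: pd.DataFrame, time_col: str = "admit_time"):
    """70:15:15 split ordered by admission time, so evaluation data is strictly later."""
    df = df.sort_values(time_col)
    n = len(df)
    return (df.iloc[: int(0.70 * n)],
            df.iloc[int(0.70 * n): int(0.85 * n)],
            df.iloc[int(0.85 * n):])

def fit_preprocessors(train: pd.DataFrame, measurement_cols: list[str]):
    """Fit on the training set only; apply the returned transformers to validation/test.
    Assumes missing values have already been handled upstream."""
    scaler = StandardScaler().fit(train[["age"]])          # standardize age (and counts for LR)
    binner = KBinsDiscretizer(n_bins=5, encode="onehot",   # quintiles, then one-hot (sparse output)
                              strategy="quantile").fit(train[measurement_cols])
    return scaler, binner
```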
We trained models using L2-regularized logistic regression (via scikit-learn [26]) and gradient boosting machines (GBM) using LightGBM [27] and XGBoost [28]. Training cohorts included both the target cohort (oncology and HCT services) and the broader cohort of all inpatient admissions. Hyperparameter selection was performed using 5-fold cross-validation, optimizing for AUROC. We used recursive feature elimination for feature reduction as previously described [25]. In brief, we removed feature groups by ranking them from lowest to highest by the maximum absolute value of coefficients for logistic regression, absolute gain for XGBoost and split count for LightGBM. We then removed them in steps ranging from 50% to 95%. At each step, we retrained and scored the model five times, averaging the cross-validation AUROC. We continued this process until the final step or until the cross-validation score decreased by more than 2% compared to the base model trained on all features. The validation set AUROC was used to select the best model across all training experiments.
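Below is a simplified sketch of the elimination loop for the LightGBM case, using split counts as the importance measure and dropping individual features rather than feature groups; the step fractions and 2% stopping rule mirror the description above, but the code is illustrative rather than the study's implementation:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

def rfe_lightgbm(X, y, steps=(0.5, 0.75, 0.9, 0.95), params=None):
    """Simplified recursive feature elimination keyed on LightGBM split counts.
    X is a NumPy array or SciPy sparse matrix; y is a binary label vector."""
    params = params or {"objective": "binary"}

    def cv_auroc(cols):
        model = lgb.LGBMClassifier(**params)
        return cross_val_score(model, X[:, cols], y, cv=5, scoring="roc_auc").mean()

    all_cols = np.arange(X.shape[1])
    base_score = cv_auroc(all_cols)
    best_cols = all_cols
    # Rank features once, lowest split count (importance) first.
    importances = lgb.LGBMClassifier(**params).fit(X, y).feature_importances_
    order = np.argsort(importances)
    for frac in steps:
        keep = order[int(frac * len(order)):]   # drop the lowest-ranked fraction
        score = cv_auroc(keep)
        if score < base_score * 0.98:           # stop if CV AUROC falls by more than 2%
            break
        best_cols = keep
    return best_cols
```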
Model evaluation
We reported aggregate and subgroup-specific threshold-independent and threshold-dependent metrics. Threshold-independent metrics were AUROC, area-under-the-precision-recall curve and expected calibration error. Threshold-dependent metrics were sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). We reported a 95% confidence interval (CI) for each metric using the percentile bootstrap with 2000 resamples.
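A percentile-bootstrap CI for a threshold-independent metric such as AUROC can be computed as sketched below (illustrative helper, not the study's code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_ci(y_true, y_score, metric=roc_auc_score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for a metric of (y_true, y_score)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample admissions with replacement
        if len(np.unique(y_true[idx])) < 2:               # skip resamples with a single class
            continue
        stats.append(metric(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_score), (lo, hi)
```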
The prediction threshold is the cut point that defines a positive prediction. To identify thresholds, the intervention must be well-delineated since it will determine the implications of false positives. For this scenario, the action plan for patients identified to be at high risk of vomiting by the model will be to charge pharmacists with bringing information about patients’ vomiting risk to the attention of the medical team and optimizing care pathway-consistent care. These care pathways were adapted from clinical practice guideline recommendations in consideration of SickKids’ resources and values. This project was led by two clinical champions; they are clinical pharmacists who are also clinician scientists (PP and LLD). To select thresholds, the champions were presented with multiple thresholds and asked to choose a threshold value based on predictive performance and operational considerations as previously described [25]. In short, they were shown number of alerts and threshold-dependent metrics for the following: maximizing true predictions (Youden’s index), and number needed to alert (NNA) values of 2 and 3. The NNA reflects the number of alerts the champions were willing to tolerate for one true prediction. The champions could also request presentation of additional NNA values such as 4 and 5. The thresholds were based upon data in the retrospective validation set.
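Since the NNA is the number of alerts generated per true prediction, it is approximately the reciprocal of the PPV; the sketch below illustrates how candidate thresholds for Youden's index and a target NNA could be derived on the validation set (function and variable names are ours):

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve

def candidate_thresholds(y_true, y_score, target_nna=3):
    """Thresholds for Youden's index and for a target number needed to alert (NNA ~ 1/PPV)."""
    fpr, tpr, roc_thr = roc_curve(y_true, y_score)
    youden = roc_thr[np.argmax(tpr - fpr)]          # maximizes sensitivity + specificity - 1

    prec, _, pr_thr = precision_recall_curve(y_true, y_score)
    # precision_recall_curve returns len(thresholds) + 1 precision values; drop the last.
    nna = 1.0 / np.clip(prec[:-1], 1e-12, None)
    closest = np.argmin(np.abs(nna - target_nna))   # threshold whose NNA is nearest the target
    return {"youden": youden, f"nna_{target_nna}": pr_thr[closest]}
```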
To enable fairness evaluations, metrics were further stratified across several demographic and socioeconomic factors, including sex, age group [29], non-English language flag, neighborhood income and the four dimensions of the Canadian Index of Multiple Deprivation based on the 2021 census [30]: residential instability, economic dependency, situational vulnerability, and ethno-cultural composition.
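Subgroup-stratified metrics of this kind can be computed with a simple grouping step, as in the illustrative sketch below (column names are assumed, not the study's):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def stratified_auroc(results: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    """AUROC by subgroup; `results` needs y_true, y_score and the grouping columns."""
    rows = []
    for col in group_cols:
        for level, grp in results.groupby(col):
            if grp["y_true"].nunique() < 2:      # AUROC is undefined for a single class
                continue
            rows.append({"factor": col, "level": level, "n": len(grp),
                         "auroc": roc_auc_score(grp["y_true"], grp["y_score"])})
    return pd.DataFrame(rows)
```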
Based on the retrospective test set evaluations, the champions were responsible for assessing model performance and evaluating the fairness metrics to determine whether the model was adequate to proceed to the prospective silent trial. For this assessment of the retrospective model, an in-person meeting was held where a ML specialist (LLG) presented the results to the champions and facilitated interpretation.
Patients could be included multiple times if they had multiple admissions, reflecting the intended deployment setting and real-world practice, where patients may experience multiple hospitalizations. During model deployment, we would not exclude patients from predictions if they had previously been admitted, a common occurrence in pediatric oncology patients. To address the potential for biased performance related to repeated observations from the same patient, we conducted a sensitivity analysis including each patient only once such that patients in the evaluation set did not overlap with those in the training set. We did not limit patient age as these services can admit patients over 18 years.
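One plausible reading of this sensitivity analysis, sketched with hypothetical patient_id and admit_time columns, is to drop evaluation-set admissions from patients seen in training and keep a single admission per remaining patient:

```python
import pandas as pd

def dedup_for_sensitivity(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Evaluation cohort with no training-set patients and one admission per patient."""
    no_overlap = test[~test["patient_id"].isin(train["patient_id"])]
    return (no_overlap.sort_values("admit_time")
                      .drop_duplicates(subset="patient_id", keep="first"))
```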
Prospective silent trial phase
The prospective silent trial phase involved integrating the model selected in the retrospective phase into an environment with data flow and infrastructure mirroring those planned for deployment. To best mirror deployment, patients included in the retrospective phase could be included in the silent trial phase. Predictions were not provided to clinical teams during this phase. The silent trial phase was planned for three months, at the end of which model performance, fairness, and adequacy in the intended deployment setting were evaluated using the same metrics and process as in the retrospective phase.
Results
Figure 1 shows the flow diagram of admission identification and selection, stratified by retrospective and prospective phases. Table 1 presents the demographic characteristics of the cohorts. There were 7,408 admissions included in the retrospective phase and 340 admissions included in the prospective silent trial phase.
[Figure 1 omitted: flow diagram of admission identification and selection]
[Table 1 omitted: demographic characteristics of the cohorts]
The best-performing model in the retrospective phase was the LightGBM model trained on all inpatients. Appendix 1 lists the model hyperparameters. The number of features before and after recursive feature selection was 27,595 and 2,859, respectively. Table 2 shows the threshold-independent performance metrics in the retrospective test set. The prevalence of vomiting overall was 25.2%. When stratified by service, the prevalence of vomiting was 22.2% in the oncology cohort and 39.3% in the HCT cohort. The AUROC was 0.730 (95% CI 0.694–0.765). The clinical champions chose an NNA threshold of 3 in the oncology cohort and an NNA of 2 in the HCT cohort. Table 3 shows the threshold-dependent metrics in both cohorts. The PPV and NPV were 0.306 and 0.893 in the oncology cohort and were 0.479 and 0.817 in the HCT cohort. Sensitivity analysis including patients only once showed AUROC of 0.700 (95% CI 0.656–0.740), PPV of 0.331 and NPV of 0.866 using the same thresholds among all patients.
[Table 2 omitted: threshold-independent performance metrics]
[Table 3 omitted: threshold-dependent performance metrics]
The clinical champions decided these metrics were satisfactory to proceed to the silent trial phase. In the silent trial, the prevalence of vomiting was similar to that in the retrospective test set and the AUROC was 0.716 (95% CI 0.649–0.784) (Table 2). Table 3 shows the threshold-dependent metrics in the prospective silent trial. The anticipated numbers of alerts in the oncology and HCT cohorts were 12–13 and 3–4 per week, respectively. The PPV and NPV were 0.259 and 0.900 in the oncology cohort and 0.512 and 0.842 in the HCT cohort. Appendices 2 and 3 provide results of fairness evaluations stratified by sociodemographic characteristics; no concerns were identified. Appendix 4 shows the importance of the top 20 features in the selected model.
Discussion
Using data in the EHR, we found that the prevalence of vomiting within a 96-hour window proximal to admission was approximately 20% in the oncology cohort and 40% in the HCT cohort. We successfully created a retrospective model predicting vomiting among inpatients admitted to the oncology and HCT services and confirmed that its performance was maintained in a prospective silent trial.
Publications of ML silent trials are rare in pediatric patients. We found that model performance remained robust during the 3-month prospective evaluation period. Continued monitoring and evaluation are important to ensure model reliability [31]. However, monitoring model performance alone following clinical integration may not be sufficient, as a correct positive prediction might appear as a false positive due to successful intervention (e.g., clinical practice guideline-consistent anti-emetic administration) informed by the model’s prediction. Therefore, it is also important to measure patient outcomes post-integration and evaluate them against pre-integration baseline data to assess the model’s actual clinical impact.
An important step in the process was to elicit the champions’ choice of thresholds, which reflects their trade-off between sensitivity and specificity. This threshold must be paired with the intervention since this choice determines the implications of a false positive. In this example, the “cost” of a false positive is additional, potentially unnecessary pharmacist review when evaluating patients for care pathway-consistent care. Thus, the champions’ perspectives were key in choosing the threshold and approving the metrics obtained using it. The champions must also evaluate the consequences of a false negative. In this example, a false negative does not have additional consequences as these patients will receive usual care. We used different thresholds for the oncology and HCT populations. The champions prioritized maintaining sensitivity at approximately 80%, which explains the rationale behind the different thresholds. This provides an example of the nuanced decision making required of clinical champions for ML implementation.
During the ML development process, there is a need for early and deep engagement with clinical operational leaders. Due to their direct accountability for budget, staffing, quality and workload, they must be involved in decision making early in the process. They need to understand the model’s impact on clinical operations and must have the opportunity to ask questions as the model is being developed. This need aligns with recommended change management strategies [32]. Early, consistent, and in-depth discussions with clinical leaders are crucial for successful ML implementation. Involvement of clinical informatics early in the process is also important to define ideal workflows to allow successful implementation for the clinical team.
Across the PREDICT program, we do not provide model interpretability metrics to target users, such as the clinical pharmacists in this case. The rationale for this decision includes the instability of feature importance estimates with high-dimensional data. Further, others have discouraged their dissemination as they tend to be unreliable and may be misused [33].
A strength of this study was the inclusion of prospective model results, in contrast to most ML publications that only evaluate retrospective data. Another strength is the involvement of two clinical champions throughout this program, an important consideration as the champions will be key to its successful deployment. However, our results should be considered in light of this study’s limitations. The model does not incorporate data collected between midnight on the day of admission and prediction time (08:30 h the following morning), which could potentially improve model performance. In addition, as we included multiple admissions per patient, this could favorably bias results. Another limitation is that the bootstrapped 95% CI can provide misleading results when the sample size is small. Thus, these CIs should be viewed cautiously. Finally, we did not collect detailed cancer or treatment characteristics such as type of cancer or administered chemotherapy. These variables will be important to collect and report on for the evaluation of the deployed model.
Our future plans include integrating real-time data and exploring opportunities for implementing a foundation model at our own institution and others. These models are initially trained on large, broad datasets using a self-supervised learning objective and subsequently adapted for downstream tasks such as vomiting prediction. We have previously shown that such models are portable across hospitals and improve predictive performance at lower costs [34].
Conclusions
In conclusion, we found that data in the EHR could be used to develop a retrospective ML model to predict vomiting among pediatric oncology and HCT inpatients. This model retained satisfactory performance in a prospective silent trial. Future plans will include deployment into clinical workflows and determining if the model improves vomiting control.
Data availability
The data used in this study cannot be made publicly available because of the potential risk to patient privacy. However, relevant data is available from the corresponding author upon reasonable request.
References
Sommariva S, Pongiglione B, Tarricone R. Impact of chemotherapy-induced nausea and vomiting on health-related quality of life and resource utilization: a systematic review. Crit Rev Oncol Hematol. 2016;99:13–36.
Dupuis LL, Johnston DL, Baggott C, Hyslop S, Tomlinson D, Gibson P, et al. Validation of the symptom screening in pediatrics tool in children receiving cancer treatments. J Natl Cancer Inst. 2018;110(6):661–8.
Craver C, Gayle J, Balu S, Buchner D. Clinical and economic burden of chemotherapy-induced nausea and vomiting among patients with cancer in a hospital outpatient setting in the United States. J Med Econ. 2011;14(1):87–98.
Flank J, Sparavalo J, Vol H, Hagen L, Stuhler R, Chong D, et al. The burden of chemotherapy-induced nausea and vomiting in children receiving hematopoietic stem cell transplantation conditioning: a prospective study. Bone Marrow Transplant. 2017;52(9):1294–9.
Vol H, Flank J, Lavoratore SR, Nathan PC, Taylor T, Zelunka E, et al. Poor chemotherapy-induced nausea and vomiting control in children receiving intermediate or high dose methotrexate. Support Care Cancer. 2016;24(3):1365–71.
Eliasen A, Kornholt J, Mathiasen R, Brok J, Rechnitzer C, Schmiegelow K, et al. Risk factors associated with nausea and vomiting in children with cancer receiving chemotherapy. J Oncol Pharm Pract. 2023;29(6):1361–8.
Dupuis LL, Tamura RN, Kelly KM, Krischer JP, Langevin AM, Chen L, et al. Risk factors for chemotherapy-induced nausea in pediatric patients receiving highly emetogenic chemotherapy. Pediatr Blood Cancer. 2019;66(4):e27584.
Dupuis LL, Tomlinson GA, Pong A, Sung L, Bickham K. Factors associated with chemotherapy-induced vomiting control in pediatric patients receiving moderately or highly emetogenic chemotherapy: a pooled analysis. J Clin Oncol. 2020;38(22):2499–509.
Kim JH, Cheon BR, Kim MG, Hwang SM, Lim SY, Lee JJ, et al. Postoperative nausea and vomiting prediction: machine learning insights from a comprehensive analysis of perioperative data. Bioengineering (Basel). 2023;10(10):1152.
Zhou CM, Wang Y, Xue Q, Yang JJ, Zhu Y. Predicting early postoperative PONV using multiple machine-learning- and deep-learning-algorithms. BMC Med Res Methodol. 2023;23(1):133.
Xie M, Deng Y, Wang Z, He Y, Wu X, Zhang M, et al. Development and assessment of novel machine learning models to predict the probability of postoperative nausea and vomiting for patient-controlled analgesia. Sci Rep. 2023;13(1):6439.
Shim JG, Ryu KH, Cho EA, Ahn JH, Cha YB, Lim G, et al. Machine learning for prediction of postoperative nausea and vomiting in patients with intravenous patient-controlled analgesia. PLoS ONE. 2022;17(12):e0277957.
Wu HY, Gong CA, Lin SP, Chang KY, Tsou MY, Ting CK. Predicting postoperative vomiting among orthopedic patients receiving patient-controlled epidural analgesia using SVM and LR. Sci Rep. 2016;6:27041.
Zhang J, Cui X, Yang C, Zhong D, Sun Y, Yue X, et al. A deep learning-based interpretable decision tool for predicting high risk of chemotherapy-induced nausea and vomiting in cancer patients prescribed highly emetogenic chemotherapy. Cancer Med. 2023;12(17):18306–16.
Cao Z, Xiong X, Yang Q. [Establishment of Naive Bayes classifier-based risk prediction model for chemotherapy-induced nausea and vomiting]. Nan Fang Yi Ke Da Xue Xue Bao. 2021;41(4):607–12.
Kwong JCC, Erdman L, Khondker A, Skreta M, Goldenberg A, McCradden MD, et al. The silent trial - the bridge between bench-to-bedside clinical AI applications. Front Digit Health. 2022;4:929508.
Bedoya AD, Economou-Zavlanos NJ, Goldstein BA, Young A, Jelovsek JE, O’Brien C, et al. A framework for the oversight and local deployment of safe and high-quality prediction models. J Am Med Inform Assoc. 2022;29(9):1631–6.
Guo LL, Pfohl SR, Fries J, Posada J, Fleming SL, Aftandilian C, et al. Systematic review of approaches to preserve machine learning performance in the presence of temporal dataset shift in clinical medicine. Appl Clin Inform. 2021;12(4):808–15.
Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021;385(3):283–6.
Guo LL, Calligan M, Vettese E, Cook S, Gagnidze G, Han O, et al. Development and validation of the SickKids Enterprise-wide Data in Azure Repository (SEDAR). Heliyon. 2023;9(11):e21586.
Guo LL, Morse KE, Aftandilian C, Steinberg E, Fries J, Posada J, et al. Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare. BMC Med Inform Decis Mak. 2024;24(1):51.
Patel P, Robinson PD, Cohen M, Devine K, Gibson P, Holdsworth MT, et al. Prevention of acute and delayed chemotherapy-induced nausea and vomiting in pediatric cancer patients: a clinical practice guideline. Pediatr Blood Cancer. 2022;69(12):e30001.
Gabby ME, Bugin K, Lyons E. Review article: the evolution of endpoint assessments for chemotherapy-induced nausea and vomiting and post-operative nausea and vomiting-a perspective from the US food and drug administration. Aliment Pharmacol Ther. 2021;54(1):7–13.
Hesketh PJ, Gralla RJ, du Bois A, Tonato M. Methodology of antiemetic trials: response assessment, evaluation of new agents and definition of chemotherapy emetogenicity. Support Care Cancer. 1998;6(3):221–7.
Yan AP, Guo LL, Inoue J, Arciniegas SE, Vettese E, Wolochacz A, et al. A roadmap to implementing machine learning in healthcare: from concept to practice. Front Digit Health. 2025;20(7):1462751.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. JMLR. 2011;12(85):2825–30.
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W et al. LightGBM: a highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems; Long Beach, California, USA: Curran Associates Inc.; 2017. pp. 3149–57.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, California, USA: ACM; 2016. pp. 785–94.
Williams K, Thomson D, Seto I, Contopoulos-Ioannidis DG, Ioannidis JP, Curtis S, et al. Standard 6: age groups for pediatric trials. Pediatrics. 2012;129(Suppl 3):S153–60.
Statistics Canada. The Canadian Index of Multiple Deprivation. Statistics Canada Catalogue no. 45-20-0001; 2019.
Youssef A, Pencina M, Thakur A, Zhu T, Clifton D, Shah NH. External validation of AI models in health should be replaced with recurring local validation. Nat Med. 2023;29(11):2686–7.
Smith TG, Norasi H, Herbst KM, Kendrick ML, Curry TB, Grantcharov TP, et al. Creating a practical transformational change management model for novel artificial intelligence-enabled technology implementation in the operating room. Mayo Clin Proc Innov Qual Outcomes. 2022;6(6):584–96.
Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health. 2021;3(11):e745-50.
Guo LL, Fries J, Steinberg E, Fleming SL, Morse K, Aftandilian C, et al. A multi-center study on the adaptability of a shared foundation model for electronic health records. NPJ Digit Med. 2024;7(1):171.