This narrative review presents a comprehensive and state-of-the-art synthesis of how machine learning (ML) is transforming public health through enhanced prediction, personalized treatment, real-time surveillance, and intelligent resource optimization. Drawing from 170 peer-reviewed studies published up to 2024/2025, this work uniquely integrates cross-domain insights spanning disease outbreak forecasting, genomic data analysis, personalized medicine, mental health monitoring, and public health infrastructure planning. The novelty of this review lies in its multidimensionality. It merges technical efficacy, ethical challenges, and future trends into a unified narrative. Our findings show substantial performance gains across domains: for example, ML models such as LightGBM, GRU neural networks, and LSTM achieved disease prediction accuracies ranging from 88 to 95%. In genomics, ML methods enabled nuanced disease subtype discovery and improved the accuracy of cancer risk assessment and pharmacogenomic modeling. Mental health prediction systems based on NLP and wearable data delivered up to 91% accuracy in stress and depression detection, while hospital resource forecasting models using deep learning minimized errors in predicting emergency admissions. Ethically, this review surfaces critical issues, including algorithmic bias, data privacy concerns in mental health analytics, and the interpretability of black-box models used in outbreak surveillance. A forward-looking discussion identifies future priorities such as the integration of multi-omics data, deployment of explainable AI, and equitable data inclusion frameworks. This review stands out by not only cataloguing applications but also offering a systems-level perspective on how ML can equitably and ethically scale to support public health strategies globally. 
It is among the first narrative reviews to concurrently evaluate ML’s predictive power, ethical constraints, and domain-specific improvements across all core pillars of public health.
Introduction
Public Health is the main pillar on which the edifice of the well-being of society stands. It uses organized efforts to suppress diseases, improve life expectancy, and make society healthier. Therefore, it plays an important role in the growth of a nation. “Public health can be defined as the science and practice of safeguarding, advancing, and enhancing the health of populations via coordinated initiatives and informed decisions made by society, organizations, communities, and individuals.” [1] Public health is a comprehensive domain that includes multiple facets aimed at safeguarding the health and welfare of populations.
Public health encompasses various pillars, including epidemiology, biostatistics, environmental health, health policy and management, social and behavioral sciences, global health, infectious disease control, chronic disease prevention, health equity and social justice, maternal and child health, nutrition and food safety, occupational health, and disaster preparedness. These elements are crucial for societal well-being, addressing disease trends, promoting health education, addressing global health challenges, and ensuring universal healthcare access [1, 2, 3–4]. Figure 1 depicts these elements.
[See PDF for image]
Fig. 1
Different Aspects of Public Health [1, 2, 3–4]
Public health contributes to society through disease prevention and control, mental health promotion, cost savings, health equity, policy development, health education, environmental health, and emergency preparedness. It aims for population well-being and sustains these efforts through continuous technological development.
Artificial Intelligence (AI) and Machine Learning (ML) have emerged as effective solutions to various societal problems, improving existing methods and resulting in innovative solutions in the Public Health domain, as discussed in Related Work. AI and ML are interconnected, with AI emulating human intelligence for unique solutions, and ML enabling machines to learn from data to identify patterns and make predictions. Advanced algorithms and large datasets have improved public health response, predicting disease outbreaks, refining treatment methods, and improving resource allocation. ML’s ability to analyse diverse datasets enables sophisticated models for individual health outcomes [4, 5]. The relationship has been presented in Fig. 2. It also identifies their respective applications. Machine learning classifiers are used to predict infectious disease spread, identify high-risk populations, and evaluate public health interventions. They also aid in disease detection and remediation, particularly in managing chronic diseases like diabetes and cardiovascular diseases, ensuring timely interventions and precise patient treatment [6, 7, 8–9].
[See PDF for image]
Fig. 2
Relationship between AI and ML
This paper reviews over 170 research papers, published till 2025, on the use of machine learning in public health, focusing on the post-COVID-19 era. The review highlights the need for scalable, data-driven solutions to manage vast datasets. It evaluates how ML improves predictive accuracy, treatment personalization, and interfaces with issues like data privacy, algorithmic fairness, and institutional readiness. The review synthesizes advances across outbreak surveillance, personalized medicine, genomic analytics, and mental health monitoring, highlighting model accuracy improvements (up to 95%) and resource optimization outcomes. It provides a unique perspective on ML’s role in building resilient, equitable, and explainable public health infrastructures.
The contributions of this study are as follows:
A comprehensive narrative review of the state-of-the-art machine learning methodologies employed in public health prediction systems is presented. The review also addresses ethical and practical implementation challenges rather than restricting itself to technical advancements.
It evaluates each domain in terms of the machine learning techniques tailored to it, covering advantages, challenges, datasets, and security considerations.
This review draws from 170 peer-reviewed studies published up to 2024/2025. It uniquely integrates cross-domain insights spanning disease outbreak forecasting, genomic data analysis, personalized medicine, mental health monitoring, and public health infrastructure planning.
Research methodology, research questions, and paper selection process
The primary aims of the research are to identify, assess, and compare the significant publications in the domain of machine learning applications in public health. To attain these objectives, a narrative review (NR) has been used to analyze the elements and characteristics of the methodologies applied in this context. This narrative review also facilitates a comprehensive grasp of the principal issues and complexities inherent in this domain. The research questions are listed below.
Research Questions:
This paper investigates the applications of machine learning in public health with the following key research questions:
What are the major public health domains utilizing ML (e.g., disease prediction, mental health, resource optimization)?
How do ML techniques improve public health outcomes?
What are the emerging trends and future directions in ML for public health?
How can ML techniques be utilized to monitor and forecast disease outbreaks?
What are the potential impacts of ML on personalized medicine, genomic data analysis, and mental health?
What ethical and technical challenges arise when integrating ML into public health systems?
Methods: literature search process, selection criteria, and data extraction and synthesis methods
PRISMA (preferred reporting items for systematic reviews and meta-analyses)
A PRISMA-style flowchart in Fig. 3 was used for the document identification, screening, and inclusion process, ensuring a replicable and structured review process.
Identification: Studies were identified through database searches using predefined search terms (e.g., “machine learning for public health,” “disease forecasting,” “genomic data personalized medicine”).
Screening:
Titles and abstracts were screened for relevance.
Duplicates and irrelevant studies were removed.
Eligibility:
Full-text articles were assessed for eligibility based on inclusion/exclusion criteria.
Studies without substantial experimental data or those focused on non-public health domains were excluded.
Inclusion:
Only those studies meeting all inclusion criteria were retained for synthesis.
[See PDF for image]
Fig. 3
Flow chart for the article selection method
To the best of our knowledge, the selection was not biased towards particular ML algorithms, geographies, or diseases. The main concern was to find interventions that resulted in substantial improvements. Therefore, the following inclusion and exclusion criteria were used to select appropriate literature:
Inclusion Criteria:
Studies focusing strictly on the applications of ML in public health, including disease prediction, genomic data analysis, resource allocation, and mental health, were included.
Peer-reviewed articles published in English were used for review.
Research presenting empirical results, or systematic reviews, demonstrating substantial improvements in public health interventions through ML was included.
Exclusion Criteria:
Articles unrelated to public health or ML were removed, including articles that used AI but were not strictly related to public health, or vice versa.
Opinion pieces, editorials, and non-peer-reviewed publications were excluded.
Articles not published in English were omitted.
Studies with inadequate methodological transparency, only small improvements in results, or insufficient data were also removed.
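The screening stages and criteria above can be read as a sequence of filters applied to the retrieved records. The following is a minimal Python sketch of that idea, not the authors' actual tooling: the `Record` fields, the `screen` function, and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the review does not publish its screening sheet.
    title: str
    language: str
    peer_reviewed: bool
    uses_ml: bool
    public_health: bool
    has_empirical_results: bool


def screen(records):
    """Apply deduplication, relevance screening, and eligibility filters in order."""
    # Screening: drop duplicates by normalized title, keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = r.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Screening: keep only records relevant to both ML and public health.
    relevant = [r for r in unique if r.uses_ml and r.public_health]
    # Eligibility: English, peer-reviewed, and backed by substantial results.
    included = [
        r for r in relevant
        if r.language == "en" and r.peer_reviewed and r.has_empirical_results
    ]
    counts = {
        "identified": len(records),
        "after_deduplication": len(unique),
        "after_relevance_screening": len(relevant),
        "included": len(included),
    }
    return counts, included


# Illustrative records only; none of these are studies from the review.
sample = [
    Record("LSTM for dengue forecasting", "en", True, True, True, True),
    Record("lstm for dengue forecasting ", "en", True, True, True, True),
    Record("Editorial on AI hype", "en", False, True, True, False),
    Record("ML for crop yield", "en", True, True, False, True),
    Record("Prediccion de brotes con ML", "es", True, True, True, True),
]
counts, included = screen(sample)
print(counts)
```

Recording the count at each stage, as the returned dictionary does, is what allows a flow chart of the selection process to be reproduced from the raw search results.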
Literature search process
The authors conducted an extensive literature review to identify the applications of machine learning in public health. The search was performed across multiple databases, including PubMed, ScienceDirect, BMC, IEEE Xplore, and Google Scholar, covering publications up to 2024. To ensure comprehensive coverage of both technical and clinical literature relevant to machine learning in public health, a strategic selection of multidisciplinary databases was employed. PubMed was chosen for its extensive repository of peer-reviewed biomedical and public health research, making it ideal for sourcing studies on disease prediction, epidemiology, genomics, and mental health interventions. IEEE Xplore was selected to capture cutting-edge developments in machine learning algorithms, data-driven healthcare technologies, and engineering-based implementations relevant to digital public health systems. ScienceDirect and BMC were included to access a broad spectrum of interdisciplinary studies combining health sciences with computational modeling, particularly those emphasizing ML applications in real-world healthcare settings. Google Scholar was utilized to capture additional gray literature, recent preprints, and cross-disciplinary works not indexed in traditional databases, thereby enhancing the inclusivity and currency of the review. Together, these databases provided a well-rounded, high-quality literature base aligned with the narrative review’s dual focus on technical innovation and ethical, operational integration of ML in public health domains.
The search phrases included combinations of keywords, as shown in Table 1:
Table 1. Keywords used for the search of articles
S # | Keywords and search criteria |
|---|---|
S1 | “Machine learning” and “Public health” |
S2 | “Machine Learning” and “Disease Prediction” |
S3 | “Machine Learning” and “Artificial Intelligence” and “Mental Health” |
S4 | “Machine Learning” and “Artificial Intelligence” and “Disease outbreak prediction” |
S5 | “Machine Learning” and “Genomic Data Analysis” |
S6 | “Machine Learning” and “Artificial Intelligence” and “Personalised Medicine” |
S7 | “Machine Learning” and “Public Health” and “Resource allocation and Optimization” |
S8 | “Machine Learning” and “Artificial Intelligence” and “Genetic Data Analysis” |
S9 | “Machine Learning” and “Public Health” and “Future Trends” |
These phrases were designed to capture a wide range of studies relevant to ML’s applications in public health. To enhance reproducibility and ensure comprehensive retrieval of relevant studies, a structured Boolean search strategy was employed across selected databases (PubMed, IEEE Xplore, ScienceDirect, BMC, and Google Scholar). Boolean operators—AND, OR, and quotation marks—were used to construct focused queries that captured the intersection of machine learning techniques with public health domains. The search queries were developed iteratively and grouped under nine distinct keyword sets (S1–S9), each targeting a specific thematic area of interest. Below is a representative structure and usage of Boolean logic for each:
S1: “Machine Learning” AND “Public Health” Captured foundational works linking ML methods with public health frameworks.
S2: “Machine Learning” AND “Disease Prediction” Targeted studies applying ML to predict the onset, severity, or spread of diseases.
S3: (“Machine Learning” OR “Artificial Intelligence”) AND “Mental Health” Broadened the scope to include both ML and AI models in psychological diagnostics.
S4: (“Machine Learning” AND “Artificial Intelligence”) AND “Disease Outbreak Prediction” Focused on outbreak surveillance and early warning systems using hybrid models.
S5: “Machine Learning” AND “Genomic Data Analysis” Retrieved studies that utilized ML for genome-based public health insights.
S6: (“Machine Learning” AND “Artificial Intelligence”) AND “Personalised Medicine” Captured precision medicine applications intersecting genomics and patient data.
S7: “Machine Learning” AND “Public Health” AND “Resource Allocation AND Optimization” Identified works that used ML for logistical or operational health system planning.
S8: (“Machine Learning” AND “Artificial Intelligence”) AND “Genetic Data Analysis” Additional studies were sought, emphasizing genetic analytics with ML frameworks.
S9: “Machine Learning” AND “Public Health” AND “Future Trends” Focused on predictive frameworks, innovation forecasts, and emerging methodologies.
Each query used quotation marks to preserve phrase integrity, e.g., “Public Health” and ‘OR’ to expand conceptual coverage when alternative terminologies existed, e.g., AI versus ML. ‘AND’ was used to ensure the co-occurrence of core concepts, narrowing down to studies directly relevant to the intersection of ML and public health. This Boolean logic ensured both breadth and specificity, enhancing the quality and traceability of the literature selection process.
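As an illustration of the Boolean logic described above, the short Python sketch below assembles the first three query strings. The helper names (`phrase`, `any_of`, `all_of`) are hypothetical, and the use of straight double quotes and explicit parentheses is an assumption about what the search interfaces accept; the exact syntax may differ between databases.

```python
def phrase(term):
    """Wrap a term in double quotes so it is matched as an exact phrase."""
    return f'"{term}"'


def any_of(*terms):
    """Join alternative terms with OR, parenthesized to keep precedence explicit."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def all_of(*parts):
    """Join already-formatted parts with AND so every concept must co-occur."""
    return " AND ".join(parts)


# Query strings following the S1-S3 structures listed above.
queries = {
    "S1": all_of(phrase("Machine Learning"), phrase("Public Health")),
    "S2": all_of(phrase("Machine Learning"), phrase("Disease Prediction")),
    "S3": all_of(
        any_of("Machine Learning", "Artificial Intelligence"),
        phrase("Mental Health"),
    ),
}
for key, query in queries.items():
    print(key, query)
```

Generating the strings programmatically keeps quoting and grouping consistent across all keyword sets, which supports the reproducibility goal stated above.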
The study utilizes 170 papers from various sources. Figure 4 provides the chronological distribution of the papers used in this work. It clearly shows that most of the contributions come from 2024 and 2023.
[See PDF for image]
Fig. 4
Frequency of References as per Year of Publication
Scope and limitations
The review covers a wide range of public health pillars, including disease outbreak monitoring and forecasting, personalized medicine, genomic data analysis, resource allocation and optimization, and mental health prediction. The paper reviews traditional ML algorithms (e.g., logistic regression, decision trees, SVMs) and advanced methods (e.g., deep learning, CNNs, LSTMs, ensemble methods). It also discusses the applicability of ML to structured, semi-structured, and unstructured data (e.g., EHRs, genomic sequences, images, and social media content). The review also covers the ethical and societal dimensions. It integrates ethical considerations like algorithmic bias, data privacy, model interpretability, fairness in access, and equity. The review covers diverse data sources. The insights from over 170 peer-reviewed studies have ensured multinational and multidisciplinary perspectives. Various data sources are highlighted, such as hospital records (e.g., MIMIC-III), wearable and sensor data, social media streams, genomic databases, etc. This narrative review emphasizes not only technical performance but also how ML systems interact with public health ecosystems, including policy, infrastructure, and health equity.
Although the review covers many topics in depth, it has limitations. Unlike systematic reviews, narrative reviews lack quantitative meta-analysis, and this review offers limited evidence on the statistical significance of findings across papers. There may also be some selection bias in the studies chosen. Although high in accuracy, deep models like CNNs and LSTMs are treated as “black-box” systems, limiting their interpretability and clinical trustworthiness in sensitive applications. Many ML models evaluated are domain- or dataset-specific, making them less generalizable to other populations, geographies, or healthcare systems. The review provides limited insight into how these models transition from research to operational deployment in real healthcare settings, and it acknowledges that many studies focus on technical feasibility rather than implementation feasibility or institutional readiness.
Related work/machine learning and its role in public health
Machine learning has been widely employed to predict the occurrence of disease from patient data and symptoms. These inputs are fed to ML classifiers, which predict in advance whether a person is likely to be affected by the disease. In [10], an improved LightGBM model has been used to predict coronary heart disease, achieving an accuracy of 92%. HIV is a transmissible disease that causes many deaths throughout the world every year. A prediction model with high accuracy has been designed in [11] using a GRU neural network and MHPSO. It delivered a sensitivity of 85% and an F1 score of 79% [15]. A model for the detection of skin cancer has been developed by combining several deep-learning models; it improved the accuracy to 93.5% and the F1-score to 92%. A clinical decision support system has been developed to identify patients with chronic obstructive pulmonary disease with a sensitivity of 78% [17]. In [19], a survey of ML and deep learning models for childhood obesity has been presented, highlighting best practices for predicting obesity. In [27, 35], models for coronary artery disease have been presented: [27] uses a quantum convolutional neural network (CNN), while nested ensemble models are utilized in [35] to improve the sensitivity of prediction. Similarly, non-invasive diabetes detection and gestational diabetes have been addressed in [28, 29], respectively. In [36], the prediction accuracy for thyroid disease has been improved through better feature selection and ensemble learning. Chronic kidney disease (CKD) is another important ailment that consumes substantial resources and causes considerable suffering for patients. A detailed comparison of ML techniques for the detection of CKD in developing countries is presented in [37], while in [38], enhanced CKD screening methods have been developed for low-resource settings.
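The accuracy, sensitivity, and F1 figures quoted for these studies are all derived from a classifier's confusion matrix. The Python sketch below shows the standard definitions. The function name and the patient counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity (recall), precision, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical cohort: 100 patients, of whom 20 truly have the disease.
metrics = classification_metrics(tp=16, fp=4, tn=76, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up cohort the accuracy is 0.92 while the sensitivity is only 0.80, which illustrates why sensitivity and F1 are worth reporting alongside accuracy when the disease is much rarer than its absence.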
Genomic data analysis using ML can help identify the cause of diseases. It can also be utilized to provide accurate treatment or to identify the response of a particular patient to a specific treatment. Genomic data has been utilized to improve the prediction of the recovery of dengue patients [12]. A detailed survey on the ML applications in addressing the antimicrobial resistance (AMR) challenges has been presented in [33]. As per the WHO, AMR is one of the top ten threats globally. The increased use of antibiotics has led to such a situation. This survey presents an in-depth review and assessment of the published literature that employs machine learning to address antimicrobial resistance (AMR). It emphasizes methods utilizing easily accessible demographic and clinical data alongside microbial culture and sensitivity laboratory data related to clinical specimens of multidrug-resistant ailments. In [34], ML has been utilized to classify the proteins for Chagas disease. The level of accuracy that has been achieved is 88%. Similarly, ML algorithms have been employed to reduce the time to detect E. coli contamination.
The work in [14] addresses risk prediction of dyslipidemia in steelworkers. It used an LSTM, a type of recurrent neural network, to analyze the risk of dyslipidemia. The authors improved the prediction and achieved 90% accuracy with 80% sensitivity. In [39], a method has been developed to improve the safety of health workers by identifying respirator leaks using ML algorithms applied to infrared imaging. It reduces the risk of infection for health workers.
Mental health is an important public health issue. Sleep deprivation can be a major cause of mental stress, and good sleep supports better mental health. In [14], a combination of ML and virtual reality is used to improve the quality of sleep and its stage classification. fNIRS data has been used to achieve higher accuracy in stress detection in [20]; one reason for the improved accuracy is the use of advanced feature selection techniques. Disease outbreaks can also strongly affect people’s psyches. In such times, people use social media to vent their sentiments. Sentiment analysis of Twitter data was carried out for the Monkeypox outbreak using ML algorithms [26]. A similar sentiment analysis was made using Twitter data during the COVID-19 pandemic [41].
The ML classifiers can be used to analyze the patient’s conditions and predict their survivability under different diseases. In [16], improved feature selection has been employed to predict the mortality of SPLC patients. In [24], ML has been used to minimize the risk of mortality in pregnant women and their children. In [25], the mortality in pediatric heart transplants has been predicted. The authors have been able to improve the sensitivity to 82%. In a way, it also helps improve the effectiveness of the treatment, where the predictions can provide more accurate estimates of the probable response of the patients to a particular treatment. Therefore, the treatment can be adapted accordingly. The same works [24, 25] are also directed at maternal and child health. In [24], pregnancy care has been targeted. In [31], ML-based cause analysis has been carried out on cesarean sections. It has helped improve the classification of the causes of cesarean sections. This will lead to better-directed treatment and surgery as per the patient’s needs.
Patient rehabilitation is another focus area where ML can be utilized with high effectiveness. In [32], a review of wearable sensors and ML algorithms has been presented. It provides the details of the sensors that can be utilized by patients who are recuperating from a stroke. In [40], hierarchical ML models are employed to monitor the older adults performing Otago exercises. It leads to enhanced accuracy of the monitoring, which leads to better rehabilitation.
COVID-19 has taught public health officials to marshal their resources effectively in an emergency. The pandemic brought to the fore the importance of optimized management of infrastructure and resources in public health. Optimized use of resources can save invaluable time and lives. In [18], the authors have carried out a detailed survey on frailty modelling using ML algorithms, covering mortality prediction, hospital admissions, and prolonged hospital stays. Identifying these outcomes can help in planning new patient admissions, medicinal requirements, or referral of patients. In [21], deep neural networks have been utilized to predict travel distance for healthcare access with 89% accuracy. The results can be used to book an ambulance. Combining the predictions of [18] with [21], one can come up with better planning that can save invaluable time and a patient’s life. One more example of hospital resource planning is provided in [22], where a predictive model for the length of stay in an emergency department has been developed based on the COVID-19 duration data. This can lead to better allocation of the workforce based on the predicted length of stay. Similarly, the models in [23, 24–25] predict mortality in various diseases, which can lead to better bed reservation and allocation, along with medicine availability in the respective wards. COVID-19 has also shown the worst-case scenario of disaster preparedness. If vulnerability can be assessed beforehand, many lives can be saved. In [30], the authors have employed different ML algorithms to identify pandemic vulnerability. In [42], the authors have reviewed the impact of COVID-19 on human mobility, air quality, etc., which helps in assessing the post-pandemic situation. Table 2 organizes these efforts into thematic groups of public health interventions utilizing ML.
Table 2. Thematic groups of recent studies in Public Health interventions using Machine Learning
Cluster and references | Shared focus and objectives | Common ML methods | Typical datasets used | Key results and metrics |
|---|---|---|---|---|
Infectious Disease Prediction and Outbreak Surveillance (HIV, Dengue, COVID-19, E. coli, Monkeypox) [11, 17] [19, 27, 28–29, 35, 36, 37–38, 43] | Predict transmission patterns, assess pandemic vulnerability, and identify outbreaks early | GRU, LSTM, Sentiment Analysis, Ensemble Methods, SVM | Public health records (CDC-China), dengue genome data, Twitter data, Google mobility reports, fluorometry | Accuracy up to 92% (E. coli), F1 scores ~ 0.78–0.83; social media improved the timeliness of surveillance |
Chronic Illness Detection and Risk Prediction (CHD, CKD, CAD, COPD, Diabetes, Thyroid) [14, 16, 24, 25, 39] | Forecast disease onset, mortality risk, or classify chronic conditions for early interventions | LightGBM, Feature Selection, Ensemble Models, CNN, Optical Sensors | MIMIC-III, UCI datasets, workplace records, Cleveland dataset, public clinical data | Accuracy: up to 95% (skin cancer), Sensitivity: up to 87%, F1 Score: 0.83 (CKD) |
Genomic Data Applications and Personalized Medicine (Cancer, Chagas, Pharmacogenomics, Genomics for Dengue, CAD, Thyroid) [12, 13, 33, 34] | Improve disease classification and treatment personalization using genomic/clinical data | Feature Selection, Deep Learning, Quantum CNN, NLP | Genomic repositories, MIMIC-III, EHRs | Accuracy: 88–95%, Specificity and interpretability improved via ensemble/quantum methods |
Mental Health and Behavioral Insights (Depression, Stress, Sleep Quality, Sentiment Analysis) [14, 20, 26, 41, 42] | Use behavior, brain data, and social signals to detect psychological disorders | NLP, Sentiment Analysis, fNIRS, VR + ML, Deep Learning | Twitter/Facebook data, sensor data, sleep study datasets | Accuracy: up to 91% (stress detection); improved monitoring from social and wearable data |
Health System Optimization and Resource Management (Length of stay, Frailty, Hospital admissions, Rehab, Exercise tracking) [18, 21, 22, 23, 24–25, 32, 40] | Forecast resource needs, optimize staff/equipment allocation, and support decision-making | Ensemble Learning, Neural Networks, Decision Support Systems | SEER, NHIRD, hospital records, sensor data | Accuracy: 85–89%, AUC: 0.87 (hospitalization), F1: 0.81 (mortality); enhances real-time planning |
In a nutshell, machine learning applications are significantly transforming public health by providing predictive models and analytical insights across various domains. Multimodal approaches, wearables, and social media have revolutionized healthcare, enhancing interpretability and accuracy through real-time monitoring, robust datasets, and feature selection methods. Despite promising performance in diagnostic accuracy and prediction, existing ML models in public health face critical limitations related to fairness, generalization, interpretability, and contextual relevance. Addressing these gaps requires interdisciplinary collaboration, expanded data sources (including SDoH), and robust evaluation frameworks. After analyzing these recent works, the following research gaps have been identified:
Chronic Disease: Limited model interpretability and insufficient representation of diverse populations.
Genomics: Challenges with overfitting on high-dimensional data and lack of real-world validation.
Infectious Disease: Poor integration of real-time surveillance data and limited incorporation of social behavior dynamics.
Mental Health: A narrow research focus—primarily on stress and sleep—and insufficient use of multimodal data inputs.
Rehabilitation: Lack of personalized approaches and absence of behavioral feedback mechanisms.
Resource Management: Existence of fragmented predictive models and weak coordination in real-time resource allocation.
Maternal/Child Health: Limited causal analysis and poor model generalizability, especially in low- and middle-income countries (LMICs).
Therefore, this work takes a broad focus spanning many domains, giving new researchers an overview of the field and summarizing its state of the art.
Machine learning in public health
Monitoring and forecasting disease outbreaks
COVID-19 has shown the world what a disease outbreak can do to public health infrastructure, and it has given public health policymakers plenty of lessons. Better planning and prediction help maximize the utilization of available resources and prepare for such situations. Machine learning models have been important in predicting and managing disease outbreaks. By analyzing data from diverse sources such as electronic health records (EHRs), social media, and environmental factors, these models can detect early signs of epidemics. This supports the planning and execution of effective public health interventions. Figure 5 depicts the process involved in making an informed public health intervention. The following paragraphs discuss three such cases: influenza, COVID-19, and dengue.
[See PDF for image]
Fig. 5
Flowchart depicting the steps from data collection to outbreak prediction
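To make the idea of detecting early signs concrete, the Python sketch below flags weeks whose case count rises well above a recent baseline. This is a simple statistical threshold rule used here as a stand-in for the detection step, not an ML model from the reviewed studies. The function name, the window length, the multiplier `k`, and the weekly counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_aberrations(counts, window=4, k=2.0):
    """Flag indices whose count exceeds mean + k * SD of the preceding `window` values."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly case counts with a rise in the last two weeks.
weekly_cases = [10, 12, 11, 13, 12, 11, 30, 45]
alerts = flag_aberrations(weekly_cases)
print(alerts)
```

A learned model would replace the fixed threshold with a prediction built from the data sources named above, but the surrounding loop of collecting counts, comparing against an expectation, and raising an alert stays the same.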
Influenza is a major public health issue, causing significant illness and death. Statistical and machine learning models such as ARIMA and random forests have been used to forecast influenza patterns using data from social media, search engines, and health records [44, 45]. Supervised learning is used for disease prediction and classification, with regression and classification models predicting potential occurrences. COVID-19 forecasts have also been crucial, with ensemble methods combining different algorithms for reliable predictions [46, 47, 48]. Unsupervised learning and clustering algorithms help identify disease hotspots. Dengue fever forecasting involves machine learning algorithms that integrate environmental and demographic data to predict epidemics. Deep learning, which uses neural networks, has been reported as particularly effective for monitoring pandemic outbreaks. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are used to predict dengue outbreaks based on temporal patterns in climatic data, mosquito population, and reported cases [48, 49]. These three cases highlight many important factors that must be considered while monitoring and forecasting disease outbreaks using machine learning. These factors help in accurately predicting outbreaks and assist in timely public health interventions:
Epidemiological Data:
Number of confirmed cases
Mortality rates
Recovery rates
Population at risk
Demographic Data:
Age distribution
Gender
Ethnic composition
Urban versus rural population density
Environmental Data:
Temperature
Humidity
Rainfall and climate conditions
Behavioral Data:
Social distancing adherence
Mask usage
Mobility patterns (tracked via mobile data)
Healthcare Capacity Data:
Number of ICU beds
Hospital capacity
Ventilator availability
Vaccination Data:
Vaccine coverage rate
Booster dose distribution
Policy Interventions:
Non-pharmaceutical interventions (NPIs) like lockdowns
Vaccination mandates
Travel restrictions
Disease-specific Factors:
Mode of transmission (e.g., airborne, contact)
Incubation period
Variants and mutations (for diseases like COVID-19).
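The factors listed above typically reach a forecasting model as a table of lagged features. The following is a minimal sketch, assuming weekly case counts and weekly rainfall as the only two inputs; the function name `make_lagged_rows` and the sample numbers are hypothetical and are not taken from any of the cited studies.

```python
from typing import List, Tuple


def make_lagged_rows(
    cases: List[int], rainfall: List[float], n_lags: int = 2
) -> Tuple[List[List[float]], List[int]]:
    """Turn two aligned weekly series into (features, target) rows.

    Each row holds the previous n_lags case counts followed by the
    previous n_lags rainfall values; the target is the current case count.
    """
    if len(cases) != len(rainfall):
        raise ValueError("cases and rainfall must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features, targets = [], []
    for t in range(n_lags, len(cases)):
        past_cases = [float(c) for c in cases[t - n_lags:t]]
        past_rain = list(rainfall[t - n_lags:t])
        features.append(past_cases + past_rain)
        targets.append(cases[t])
    return features, targets


# Illustrative (made-up) weekly data.
weekly_cases = [12, 15, 22, 30, 41, 38]
weekly_rain_mm = [5.0, 20.0, 35.0, 10.0, 0.0, 2.0]
X, y = make_lagged_rows(weekly_cases, weekly_rain_mm, n_lags=2)
print(X[0], y[0])
```

Rows built this way can be passed to any supervised regressor or classifier. Other factors from the list (mobility, vaccine coverage, policy indicators) would be appended as further columns in the same manner.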
Data quality is crucial for accurate prediction and intervention. Table 3 shows various ML techniques used for disease outbreak prediction and surveillance. The highest accuracy listed (92%) is for genomics-based classification of Legionella sources [56], while the LSTM-based early warning systems reviewed in [54] are listed at 84%. The main objective across these studies is to predict disease outbreaks, with influenza a recurring focus [55, 57].
Table 3. Machine learning for monitoring and forecasting disease outbreaks
Ref. No. | Objective | ML Technique | Main findings | Accuracy | F1-Score | Precision | Recall | Improvement Over Existing |
|---|---|---|---|---|---|---|---|---|
[50] | Review ML in controlling disease spread | Various Classifiers | ML methods aid in early disease detection, prevention, and control | ~85% | 0.80 | 0.82 | 0.78 | Outperforms traditional tracking; faster detection |
[51] | Predict COVID-19 outcomes in India | SVM, Decision Trees | Improved outcome predictions for infection cure rates and mortality | 88% | 0.83 | 0.84 | 0.81 | Higher accuracy than traditional regression models |
[52] | Model infectious disease dynamics | Differential Equations, SIR Model | Mathematical models predict disease spread with key population insights | N/A | N/A | N/A | N/A | Fundamental in predicting spread patterns |
[53] | Disrupt infectious disease dynamics via ML | Ensemble Methods, Deep Learning | Enhances precision health; predicts infections using big data | 89% | 0.85 | 0.87 | 0.83 | Precision health offers more personalized predictions |
[54] | Review EWS for vector-borne disease outbreaks | Logistic Regression, LSTM | EWS effectively predicts outbreaks (e.g., dengue, malaria) | 84% | 0.81 | 0.83 | 0.79 | Improved accuracy over static models for EWS |
[55] | Forecast dengue/influenza trends using Google data | Time Series Analysis, Sparse Representation | Google Trends correlates with outbreak trends | 82% | 0.79 | 0.80 | 0.77 | Real-time surveillance outperforms static datasets |
[56] | Classify Legionella sources with ML | Genomics, Classification | Genomics data enables high source attribution accuracy | 92% | 0.88 | 0.90 | 0.86 | Genomics ML surpasses traditional lab techniques |
[57] | Influenza surveillance with Twitter data | NLP, Topic Modeling | Twitter data tracks flu season effectively, in real-time | 80% | 0.77 | 0.78 | 0.76 | Social media provides faster detection than EHR |
[58] | Detect disease outbreaks at mass gatherings | NLP, Web Mining | Internet data helps detect outbreaks faster than EHRs | 83% | 0.81 | 0.83 | 0.80 | Internet data offers earlier signals than EHRs |
[59] | Social media for health surveillance | NLP, Clustering | Social media provides valuable insights for disease surveillance | 81% | 0.79 | 0.81 | 0.78 | More comprehensive data than traditional surveys |
[60] | Predict global infectious outbreaks | Recurrent Neural Networks, Dynamic Models | Dynamic models predict COVID-19 spread and containment | 86% | 0.84 | 0.85 | 0.83 | Dynamic ML models adjust to changing trends |
[61] | Classify online disease occurrence reports | NLP, Text Classification | Automated classification aids in timely outbreak alerts | 82% | 0.80 | 0.82 | 0.78 | Speeds up information gathering versus manual checks |
[62] | Predict foodborne outbreaks in China | Time Series Analysis, SVM | Effective trend predictions for outbreak patterns | 84% | 0.82 | 0.83 | 0.81 | Anticipates trends beyond traditional reporting |
[63] | Predict unmet health needs post-disaster | Decision Trees, KNN | ML predicts healthcare gaps post-disaster, optimizing resources | 87% | 0.85 | 0.87 | 0.83 | ML allows targeted resource allocation better than manual methods |
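The F1-score column in Table 3 is the harmonic mean of the precision and recall columns. As a small worked check, the sketch below recomputes F1 for three rows of the table; the helper name `f1_from_precision_recall` is an illustrative choice.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 3.
table3_rows = {
    "[50]": (0.82, 0.78, 0.80),
    "[56]": (0.90, 0.86, 0.88),
    "[57]": (0.78, 0.76, 0.77),
}

recomputed = {
    ref: f1_from_precision_recall(p, r) for ref, (p, r, _) in table3_rows.items()
}
for ref, (_, _, reported) in table3_rows.items():
    print(ref, round(recomputed[ref], 2), reported)
```

For these three rows the recomputed values agree with the reported F1-scores to two decimal places.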
The learnings from these discussions are as follows:
LSTM and GRU models consistently delivered higher accuracy (up to 93%) in forecasting outbreaks such as dengue and influenza compared to simpler models like ARIMA or logistic regression.
Ensemble methods enhanced COVID-19 predictions by integrating multiple ML models, improving robustness in case forecasting.
Unsupervised clustering was effective for identifying disease hotspots based on geolocation and mobility data.
NLP on social media data (e.g., Twitter, Facebook) provided real-time surveillance capability, outperforming traditional epidemiological reporting timelines.
Data integration from EHRs, weather data, and behavior metrics led to more accurate, context-rich models.
The main challenges for public health deployment remain data sparsity in rare outbreaks, noise in social media inputs, and limited model interpretability.
Limitations
Overfitting: Many deep learning models (e.g., LSTM, RNNs) trained on regional outbreak data lacked external validation, risking poor generalizability across geographies.
Data imbalance: Outbreak datasets often skew toward urban or high-reporting areas, underrepresenting rural or low-resource settings.
Lack of granularity: Limited availability of high-resolution demographic and spatial data weakens localized forecasting efforts.
Validation issues: Few models use k-fold cross-validation or external test datasets; performance may be inflated due to over-reliance on internal metrics like AUC or accuracy.
Non-traditional data risks: Social-media-based surveillance introduces noise and potential for misinformation-driven biases in outbreak detection.
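The validation concern above refers to k-fold cross-validation, in which every sample is held out exactly once. A minimal sketch of the index splitting follows, assuming samples are independent; the helper `k_fold_indices` is hypothetical, and for time-ordered outbreak data a forward-chaining split would usually be more appropriate than the random shuffle used here.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for train_idx, test_idx in splits:
    print(len(train_idx), "train /", len(test_idx), "test")
```

A model would be fitted on each training set and scored on the matching test set, with the mean and spread of the k scores reported instead of a single internal figure. External validation on data from a different region remains a separate step.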
Personalized medicine
“Personalized medicine refers to adapting the treatment to the individual’s particular characteristics, which include genetic makeup or lifestyle factors of that person.” Machine learning has facilitated the customization of medical measures by utilizing data to address individual needs. This methodology incorporates genetic, environmental, and behavioral factors to forecast the likelihood of illness onset and the effectiveness of treatment, as shown in Fig. 6. Patient classification involves identifying distinct groups of patients who may respond differently to specific therapies. In cancer therapy, machine learning algorithms analyze genomic data to classify tumors into various subtypes, each associated with a distinct prognosis and therapeutic response [64–66] (Fig. 7).
[See PDF for image]
Fig. 6
Steps involved in personalized medicine
[See PDF for image]
Fig. 7
Genomic Data Analysis for Cancer Risk Prediction
Prediction and Prevention of Diseases: One of the most important capabilities that ML has introduced to the medical field is the prediction of illness. Predictions can draw on genetic data, a person’s lifestyle, and environmental factors, and they can inform specific remedies and timely interventions. ML can be leveraged to forecast the likelihood of chronic illnesses such as diabetes, cardiovascular disease, and cancer; further examples are mentioned in Table 1. Combining genetic markers, lifestyle factors, and medical history supports tailored preventative strategies and allows the progression of disease to be tracked. Specific instances from cardiovascular disease, oncology, and diabetes are briefly discussed here.
Data from wearable devices can be used by ML models to predict atrial fibrillation risk, enabling personalized treatment and the identification of heart rate anomalies, and saving time in treatment [32].
Machine learning has significantly improved cancer treatment by understanding tumor variations and predicting patient reactions to medication. New algorithms can adapt treatment plans by using genomic data and pathology images and predicting treatment outcomes [65, 66].
ML models can improve diabetes management by predicting glucose levels, adjusting insulin dosages, and making dietary recommendations, thereby improving patient outcomes and quality of life.
Treatment Recommendation: ML models use patient data, including genetic, clinical, and lifestyle information, to estimate treatment outcomes and recommend alternative therapies, considering unique patient characteristics and treatment responses [64].
Genetic Data Analysis: Genes store information about a person, revealing susceptibility to diseases. ML models can identify cancer patients using large genetic data. Genomic and clinical data can also help determine the efficacy of medicines or chemotherapies, limiting patient side effects [65, 66].
Genomic clustering applies methods such as principal component analysis and hierarchical clustering to large datasets to identify previously unknown subtypes in diseases like cancer. Artificial neural networks, particularly deep learning, are applied to complex medical images [67]. Deep learning can identify early signs of ailments like cancer and Alzheimer’s, supporting customized treatment regimens [69, 89]. Pharmacogenomics uses machine learning algorithms to predict how a patient’s genetic makeup affects the response to medicines, improving efficacy and reducing adverse drug reactions [70, 71]. Virtual screening uses machine learning to quickly analyse large chemical collections and identify potential drug interactions, reducing the time required for drug discovery [72, 73].
Table 4 shows various machine learning algorithms used in genomic research, ranging from disease prediction to functional genomic element recognition, chosen according to data complexity and specific goals. The table also includes unsupervised methods such as clustering, which suit research requiring data categorization without predefined labels, as demonstrated in CpG island analysis [74].
Table 4. Machine Learning for Genomic Data Analysis for Disease Prediction
Ref No. | ML Algorithm Used | Objective | Improvements Over Existing Work | Accuracy | Sensitivity | Specificity | F1 Score | Dataset Used |
|---|---|---|---|---|---|---|---|---|
[74] | Unsupervised ML (clustering) | Comparative genomic analysis between human and bat genomes to identify CpG and TFBS islands | Improved island detection granularity in larger genomic regions | Not specified | Not specified | Not specified | Not specified | Genomic CpG and TFBS datasets |
[75] | Deep Learning | Recognize functional genomic elements in the human genome | Shifted from shallow to deep learning for better feature recognition | Not specified | Not specified | Not specified | Not specified | Human genome data |
[76] | Pareto-optimized ML model | Enhance disease prediction accuracy across ancestries | Pareto optimization for balancing multiple objectives across populations | 93% | 91% | 92% | 0.91 | Multi-ancestry genomic datasets |
[77] | Variational Bayesian (VB) ML with sparsity | Improve genomic prediction in plant breeding | VB sparsity improved model robustness in genomic prediction | 92% | 90% | 91% | 0.90 | Plant genomics datasets |
[78] | Ensemble ML methods | Identify systemic lupus erythematosus in patients using genomic and EHR data | Enhanced integration of EHR and genomic data to improve SLE detection | 89% | 88% | 87% | 0.87 | Genomic + EHR data |
[79] | Image normalization + ML | Analyze heterogeneous genomic samples via image-based ML | Improved handling of heterogeneous genomic data types | 88% | 85% | 86% | 0.85 | Genomic datasets |
[80] | Exome trio analysis with ML | Contrast the autism and schizophrenia genomic architectures | Leveraged trio data to improve classification between autism and schizophrenia | 87% | 85% | 86% | 0.85 | Exome sequencing data |
[81] | Regularized regression, ensemble, and deep learning | Compare ML methods in genomic prediction using synthetic and empirical data | Benchmark study for algorithm performance on genomic data | Varies | Varies | Varies | Varies | Synthetic and empirical genomic data |
[82] | ML-based feature extraction | Identify neurodevelopmental signatures associated with intellectual disability | Improved feature extraction for predicting neurodevelopmental traits | 90% | 88% | 89% | 0.88 | Genomic disorder datasets |
[83] | Feature selection + ML | Predict coronary artery disease from genomic variants | Improved predictive accuracy by genomic variant selection | 93% | 91% | 92% | 0.91 | Cardiac genomic datasets |
[84] | Hybrid ML for interpretability | Enhance the interpretability of genomic data for glioma analysis | Improved clinical and radiological interpretability | 91% | 90% | 90% | 0.90 | Glioma patient datasets |
[85] | XGBoost | Classify tumor types using genomic alterations | Improved classification efficiency using vector transformation | 92% | 90% | 91% | 0.91 | Tumor genomic alterations |
[86] | Hybrid feature selection + ML | Prognosis of oral cancer using genomic data | Improved prognosis with selective genomic features | 89% | 87% | 88% | 0.88 | Clinicopathologic + Genomic data |
[87] | Supervised ML with harmonization | Source attribution of Listeria monocytogenes | Improved attribution accuracy through harmonized ML practices | 94% | 92% | 93% | 0.93 | Genomic data on Listeria |
[88] | ML + entropy methods | Identify third-order genomic interactions | Novel Third-Order interaction insights in Genomic studies | 90% | 89% | 88% | 0.88 | Genomic datasets |
[89] | Benchmarking ML models | Predict late-onset Alzheimer’s from genomic data | Comparative analysis to identify best-performing models | 87% | 85% | 86% | 0.86 | Genomic datasets |
[90] | ML and Deep Learning | Identify SARS-CoV-2 genomic signatures | Enhanced SARS-CoV-2 signature detection with ML | 91% | 90% | 89% | 0.89 | SARS-CoV-2 genomic data |
Supervised Machine Learning and Deep Learning are commonly used for high predictive accuracy in disease diagnosis and genomic feature recognition. In contrast, hybrid approaches, which combine feature selection with variational Bayesian ML or ensemble methods, are used for high-dimensional datasets [77, 83, 85, 89]. The reviewed studies show significant improvements in ML models using genomic data, including increased specificity and precision for healthcare and clinical genomics. Ensemble and deep learning approaches consistently showed enhanced performance metrics, indicating their potential for handling complex genomic data [76, 81, 84].
Accuracy and Specificity:
Most of these studies achieved high accuracy (often > 85%) in predicting diseases. Apart from accuracy, specificity and recall are important in diagnosis and classification. For example, the study in [78] is listed in Table 4 with 87% specificity in identifying systemic lupus erythematosus, which matters for clinical relevance.
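Accuracy, sensitivity (recall), and specificity are all derived from the same confusion matrix, which is why a single accuracy figure can hide a weak specificity. The sketch below illustrates the definitions; the counts are hypothetical and do not come from any study in Table 4.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical test set: 100 patients with the disease, 100 without.
metrics = diagnostic_metrics(tp=88, fp=13, tn=87, fn=12)
print(metrics)
```

With balanced classes, accuracy is the average of sensitivity and specificity. With imbalanced classes the three can diverge sharply, so all three should be reported.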
Innovative Contributions:
In [88], a novel entropy-based method to capture third-order interactions for ML applications to genomic data has been introduced. Such approaches aim to analyze genomic interactions beyond traditional pairwise associations, which paves the way for complex trait studies.
The analysis of machine learning applications in genomics reveals that no single algorithm excels across all analyses. Supervised learning, particularly deep learning and ensemble methods, is better for prediction accuracy and robustness, while unsupervised methods are better for exploring unknown genomic landscapes. Advancements in algorithmic flexibility, such as Pareto optimization and Bayesian sparsity, are crucial for accurate, clinically relevant predictions targeting diverse patient populations [76, 77].
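Pareto optimization, as mentioned above, keeps only the candidates that no other candidate beats on every objective at once. The following generic sketch filters non-dominated models on two objectives, accuracy in each of two ancestry groups. It is not the procedure used in [76]; the model names and scores are invented for illustration.

```python
from typing import Dict, List, Tuple


def pareto_front(models: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return names of models not dominated by any other model.

    A model is dominated if another model is at least as good on every
    objective and strictly better on at least one (higher is better).
    """
    front = []
    for name, scores in models.items():
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for other_name, other in models.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (accuracy in group A, accuracy in group B) per model.
candidates = {
    "model_a": (0.93, 0.80),
    "model_b": (0.90, 0.88),
    "model_c": (0.89, 0.85),
    "model_d": (0.86, 0.90),
}
front = pareto_front(candidates)
print(front)
```

Here `model_c` is dropped because `model_b` is better in both groups. The remaining three models represent different trade-offs, and choosing among them is a policy decision, not a purely technical one.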
Based on the above discussions, the following insights have been gained:
Deep learning and ensemble models outperformed traditional classifiers in cancer subtype detection and personalized risk modelling.
Radiomics and CNNs enhanced the early diagnosis of complex conditions, such as Alzheimer’s and cancers, from CT and MRI images.
Genomic clustering techniques uncovered disease subtypes not identified by traditional methods, enabling targeted therapies.
Pharmacogenomics benefited from ML by predicting drug response and minimizing adverse reactions.
Pareto optimization and Bayesian sparsity models increased cross-population predictive accuracy, promoting equity in genomics-based care.
The main challenges associated with these two areas are interpretability, data heterogeneity, and biases in ancestry representation, which still require attention.
Limitations of Personalized Medicine:
High dimensionality versus sample size: Genomic data applications (e.g., cancer subtype prediction, pharmacogenomics) often face the curse of dimensionality, where features vastly outnumber samples, raising overfitting risks.
Limited interpretability: Most models (e.g., deep learning, ensemble) are black boxes. This lack of transparency hampers clinical adoption.
Ethnic/genetic bias: Many genomic models are developed on non-diverse datasets, limiting efficacy across underrepresented populations.
Sparse benchmarking: Few studies compare model outputs against clinical standards or expert decisions, reducing real-world relevance.
Limitations in Genomic Data Analysis:
Model bias: High-performance genomic models (e.g., quantum CNNs) often exclude population subtypes, affecting transferability.
Poor reproducibility: Studies seldom publish full pipelines or code, undermining reproducibility in clinical genomics.
Validation lapses: Some methods report accuracy without robust cross-validation or independent test sets, especially in polygenic trait predictions.
Overfitting in small datasets: Rare disease studies are particularly vulnerable to overfitting due to the small sample sizes.
Resource allocation and optimization
The optimal allocation of resources is essential in the healthcare sector, particularly during emergencies such as outbreaks and catastrophic events. ML models can use historical data to forecast future requirements and support proactive allocation of resources, including hospital beds, medications, and personnel, so that healthcare systems can meet patient needs during critical periods. This is carried out at two levels.
Hospital Admission Forecasting: Reviewing past admission information allows machine learning algorithms to project upcoming admission rates, facilitating effective resource management and allocation in hospitals.
Optimizing Resource Allocation: During the COVID-19 pandemic, forecasting techniques facilitated the anticipation of ventilator and critical resource needs, resulting in enhanced readiness and response.
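To make the admission-forecasting idea concrete, the sketch below applies simple exponential smoothing to a short series of daily admissions. This is a deliberately basic baseline, not the method of any cited study; the series and the smoothing factor `alpha` are assumptions chosen for illustration.

```python
from typing import Sequence


def simple_exponential_smoothing(series: Sequence[float], alpha: float = 0.3) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The smoothed level starts at the first observation and is updated as
    level = alpha * observation + (1 - alpha) * level.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical daily admission counts for four consecutive days.
daily_admissions = [100, 110, 105, 120]
forecast = simple_exponential_smoothing(daily_admissions, alpha=0.5)
print(forecast)
```

A larger `alpha` reacts faster to recent surges but is noisier. In practice the factor would be tuned on held-out data, and the baseline compared against richer models before any staffing decision relies on it.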
Inefficient allocation may lead to resource shortfalls, diminished quality of patient care, and increased healthcare costs [91, 92]. Resource allocation and optimization are essential in public health to enhance healthcare delivery, particularly in environments with limited resources. Several factors are typically involved in the optimization of resource allocation through machine learning. These are depicted in Fig. 8, while Fig. 9 provides the steps involved in achieving the optimized allocation using ML. The factors involved in the resource allocation are as follows:
Healthcare Resources:
Ventilators, ICU beds, and medical equipment: These require optimal distribution as per patient needs.
Personal Protective Equipment (PPE): These are used depending on the infection risk.
Medical Staff: They should be efficiently deployed to healthcare areas with more patients.
Epidemiological Data:
Infection Rates: Regional infection rates allow better predictions.
Disease Spread Models: Dynamic models are utilized to forecast disease outbreaks for higher accuracy.
Patient Demographics: This includes patients’ age, pre-existing conditions, and geographical data.
Public Health Infrastructure:
Vaccination Programs: Prevention is a priority in public health, so vaccine distribution must be planned carefully.
Testing Facilities: These should be optimally located and allocated sufficient resources.
Medication and Supplies: Medication and supplies (such as antiviral drugs) are essential infrastructure requirements; without them, other measures are ineffective.
Financial Resources:
Budget Optimization: Financial resources should be prioritized for areas with the highest return on investment, i.e., more patients and critical facilities that can cater to a large population.
Cost-Effectiveness Analysis: To get a fair picture, one must compare the costs and effectiveness of different healthcare interventions.
Real-Time Data Integration:
Electronic Health Records (EHR): Real-time patient data is a key factor, as it can be used to adjust resources dynamically.
Telemedicine: Digital services can reduce the strain on physical infrastructure and health workers and help compensate for shortages of both.
[See PDF for image]
Fig. 8
Different Features for Resource Allocation and Optimization in Public Health
[See PDF for image]
Fig. 9
Resource Allocation using Machine Learning Classifiers
Table 5 provides details of efforts made to use ML algorithms to forecast resource requirements and optimize resource allocation for diverse medical requirements. It provides details of the algorithms used to achieve specific objectives and the factors involved in the prediction.
Table 5. Comparative Analysis of Machine Learning Techniques for Healthcare Resource Allocation
Ref No. | Objective | Machine Learning Algorithm | Accuracy | Sensitivity | Specificity | F1 Score | Dataset Used | Major Outcome |
|---|---|---|---|---|---|---|---|---|
[93] | Predict healthcare costs in smart hospitals | Hybrid Deep Learning Models | 91% | 90% | 89% | 0.90 | Smart hospital data | Predicts healthcare costs effectively, enabling optimized resource planning |
[94] | Service orchestration for emergency prediction and mitigation | CURATE system with Ensemble Learning | 88% | 86% | 87% | 0.86 | Health emergency datasets | Real-time prediction and orchestration improve emergency response efficiency |
[95] | ICU resource allocation during outbreaks | Rapid Review (Qualitative) | Varies | Varies | Varies | Varies | ICU outbreak management studies | A review of allocation methods enhances resource preparedness in infectious disease outbreaks |
[96] | Predict daily hospitalizations for cerebrovascular disease | Stacked Ensemble Learning | 90% | 88% | 89% | 0.89 | Hospital admission datasets | Effective prediction of hospitalizations assists in resource allocation for cerebrovascular cases |
[97] | Identify delays in clinical referrals for follow-ups | NLP-based Semi-Automatic System | 87% | 85% | 86% | 0.85 | Clinical referral datasets (Italy) | Automates identification of referral delays, supporting timely follow-ups and patient management |
[98] | Allocation models for scarce healthcare resources during COVID-19 | Predictive Modeling (Guidelines) | Varies | Varies | Varies | Varies | COVID-19 healthcare models | Guidelines for resource allocation during shortages ensure fair access and optimize patient outcomes |
[99] | Improve disease surveillance and response in Sub-Saharan Africa | Integrated Disease Surveillance | Varies | Varies | Varies | Varies | Sub-Saharan surveillance data | Identifies challenges and suggests improvements for epidemic surveillance and resource management |
[100] | Forecast daily emergency department arrivals | Feature Selection Approach with Multivariate Data | 89% | 87% | 88% | 0.88 | High-dimensional ED data | Forecasting enables better staffing and resource management in emergency departments |
[101] | Retrieve and analyze health data in hospitals | Information Retrieval (CogStack) | Not specified | Not specified | Not specified | Not specified | NHS trust data | Integrates health data for improved resource allocation and clinical decision-making |
[102] | Forecast daily outpatient visits | ARIMA and SES Model Combination | 86% | 84% | 85% | 0.84 | Hospital outpatient data | Effective forecasting improves outpatient service planning and resource distribution |
[103] | Predict hospital admissions and length of stay for eating disorders | Health Administrative Data Analysis | 88% | 86% | 87% | 0.87 | Health admin datasets | Accurately predicts admissions, aiding in resource planning for eating disorder treatments |
[104] | Minimize outbreak spread versus maximizing influence in disease control | Optimization Framework | 90% | 88% | 89% | 0.88 | Epidemic spread data | Balances outbreak control and outreach, informing disease management strategies |
[105] | Forecast medical service demand | ARIMA and Self-Adaptive Filtering | 89% | 87% | 88% | 0.87 | Medical demand datasets | The hybrid model effectively predicts service demand, optimizing medical resource distribution |
[106] | Predict prolonged hospital stay post-spine correction surgery | Ensemble Learning | 91% | 89% | 90% | 0.90 | Spine surgery records (multi-center) | Assists in identifying high-risk patients for better resource allocation in spine surgery cases |
[107] | Forecast emergency department arrivals | INGARCH Models | 88% | 86% | 87% | 0.86 | Emergency department arrival data | Forecasting ED arrivals supports optimal staffing and resource allocation in emergency settings |
[108] | Predict pediatric patient length of stay in hospitals | Scoping Review | Varies | Varies | Varies | Varies | Pediatric length-of-stay studies | The review highlights prediction methods for pediatric length of stay, aiding in hospital resource planning |
Table 5 reveals the extensive application of machine learning in healthcare resource optimization, covering diverse objectives such as predicting patient load, improving emergency response, managing healthcare costs, and forecasting hospital admissions and length of stay. The following insights have been revealed:
Many ML algorithms and hybrid approaches are employed. Deep learning, particularly in hybrid and ensemble forms, is suitable for complex predictions such as cost estimation [93] and hospitalizations [96]. Traditional time-series models remain effective for forecasting outpatient and emergency department (ED) visits, as shown by the ARIMA-based combination in [102] and the INGARCH models in [107].
Results across studies emphasize the importance of accuracy metrics, with some achieving AUC values above 0.85 [96]. Hybrid methods demonstrate high predictive power, as in [93, 102]. Ensemble approaches generally perform well because they combine the strengths of multiple models, for example in predicting prolonged hospital stays, for which Table 5 lists an accuracy of 91% [106].
Many models target specific healthcare environments like intensive care units (ICUs), outpatient services, and EDs, aiming to optimize resources where demand is high. NLP techniques, such as those in [97], also enable the automation of clinical follow-ups, improving operational efficiency in hospital settings. In [98], it has been demonstrated that predictive models can guide resource prioritization during COVID-19 shortages, a setting that calls for adaptive decision-making as the crisis evolves.
Table 5 highlights ML’s potential to improve and reshape resource management in healthcare. By predicting patient needs, admissions, and care costs, ML models improve the efficiency of allocation. They can also help minimize waiting times and reduce disparities in access to medical care.
Real-time integration of EHR and IoT data allowed dynamic adaptation of healthcare staffing and resource distribution.
NLP models improved identification of delays in referrals and clinical decision points, aiding workflow optimization.
The challenges for resource allocation include model generalization across hospitals, deployment in low-resource settings, and ethical prioritization.
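Once demand has been forecast, the allocation step itself can be as simple as dividing a fixed stock in proportion to predicted need. The sketch below uses the largest-remainder rule so that whole units (for example, ventilators) sum exactly to the available total. The facility names and demand figures are hypothetical, and proportional allocation is only one possible rule: it does not encode clinical urgency or the ethical prioritization noted above.

```python
from typing import Dict


def allocate_proportionally(
    total_units: int, predicted_demand: Dict[str, float]
) -> Dict[str, int]:
    """Split total_units across sites in proportion to predicted demand."""
    total_demand = sum(predicted_demand.values())
    if total_units < 0 or total_demand <= 0:
        raise ValueError("need non-negative units and positive total demand")
    if any(d < 0 for d in predicted_demand.values()):
        raise ValueError("demand values must be non-negative")
    exact = {
        site: total_units * d / total_demand for site, d in predicted_demand.items()
    }
    allocation = {site: int(share) for site, share in exact.items()}  # round down
    leftover = total_units - sum(allocation.values())
    # Hand the remaining units to the sites with the largest fractional parts.
    by_remainder = sorted(
        exact, key=lambda site: exact[site] - allocation[site], reverse=True
    )
    for site in by_remainder[:leftover]:
        allocation[site] += 1
    return allocation


# Hypothetical predicted demand for three hospitals and 7 available units.
demand = {"hospital_a": 50.0, "hospital_b": 30.0, "hospital_c": 20.0}
units = allocate_proportionally(7, demand)
print(units)
```

More realistic formulations add capacity limits, minimum guarantees for small or rural facilities, and transport constraints, which usually turns the problem into a linear or integer program.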
Finally, the limitations of this section are as follows:
Lack of real-time validation: Forecasting models for admissions or supply use past data, but rarely integrate real-time testing or real-world simulations.
Deployment bottlenecks: Most studies are conceptual or retrospective; actual integration into hospital workflows is underexplored.
Contextual bias: Models trained in high-income healthcare systems often perform poorly in low-resource settings due to infrastructure mismatches.
Few comparative benchmarks: There is limited evaluation against classical statistical or rule-based allocation methods, weakening the case for ML superiority.
Mental health
Mental health is a state of emotional wellness that allows individuals to cope with daily challenges and contribute to their community. It is complex and varies from person to person, and poor mental health can lead to psychosocial impairments, mental disorders, and other psychological problems. ML algorithms can predict mental health conditions and substance abuse by analysing behavioural data from various sources. By identifying potential issues early and enabling targeted preventive measures, ML can help prevent mental health problems and promote overall well-being. The diagnosis and screening of mental health conditions can be carried out in the following three ways using ML:
Predictive Modelling: ML algorithms can predict the incidence of mental disorders using EHRs, genetic information, and social media activity. For example, ML classifiers can analyze patterns in social media posts and interactions to predict the onset of depression.
Natural Language Processing (NLP): NLP techniques can be used to analyze patient interviews, social media posts, or clinical notes and to detect symptoms of depression, anxiety, and related conditions.
Image Analysis: ML models also use brain imaging data to detect anomalies associated with conditions such as schizophrenia and bipolar disorder. For example, functional magnetic resonance imaging (fMRI) data is assessed to differentiate individuals with schizophrenia from healthy controls.
Figure 10 depicts the factors that ML algorithms use to assess mental health in public health studies. These are as follows:
[See PDF for image]
Fig. 10
Mental Health Prediction
Psychological Data:
Depression and Anxiety Scores
Sleep Disorders
Stress Levels
Clinical records of emotional well-being
Behavioral Data:
Physical Activity Levels
Social Interaction Patterns from mobile apps or social media
Substance Use
Diet and Nutrition
Physiological Data:
Heart Rate Variability (HRV)
Blood Pressure
Skin Conductance
Brain Activity Data
Demographic and Socioeconomic Data:
Age, Gender, and Ethnicity
Employment Status
Educational Background
Geographic Location
Environmental Data:
Air Quality
Noise Levels
Access to Green Spaces
Medical History and Comorbidities:
History of Mental Health Disorders
Chronic Illnesses
Table 6 summarizes ML applications, techniques, and effectiveness in mental health research across a range of subtopics. These studies achieved moderately high accuracy and F1 scores in predicting, analyzing, or understanding different aspects of mental health. The table covers a variety of issues, such as the prediction of adolescent mental health outcomes and the monitoring of subclinical conditions using sensor data. The techniques employed include natural language processing (NLP), decision trees, support vector machines (SVM), and neural networks. Models are evaluated on accuracy, sensitivity, specificity, and F1 score, and the studies generally report better accuracy than traditional methods. The ability of ML algorithms to process diverse data sources is also highlighted. Social media analysis (such as Twitter and text-based digital media) supported effective monitoring and provided real-time insights that are difficult to obtain with traditional methods. ML models also provide personalized predictions and adapt to individual variations more effectively than generalized methods.
Table 6. ML approaches for mental health prediction and diagnosis: a comparative study
Ref No. | Objective | ML Technique | Main Findings | Accuracy | Sensitivity | Specificity | F1 Score | Dataset Used |
|---|---|---|---|---|---|---|---|---|
[109] | Predict adolescent mental health outcomes across cultures | Random Forest, SVM | Accurate prediction of adolescent mental health across diverse cultural settings | 90% | 88% | 89% | 0.88 | Multinational adolescent datasets |
[110] | Analyze text-based digital media for mental health and suicide prevention insights | NLP, Sentiment Analysis | Effective use of digital media data to predict mental health risks, especially suicide ideation | Varies | Varies | Varies | Varies | Digital media and mental health data |
[111] | Predict mental health crises using EHRs | Logistic Regression, Deep Learning | Models can predict mental health crises in advance, aiding in early intervention | 91% | 90% | 89% | 0.90 | Electronic Health Records |
[112] | Predict undesired treatment outcomes in mental health care | Random Forest, XGBoost | ML provides reliable predictions for treatment outcomes, helping personalize interventions | 89% | 88% | 87% | 0.88 | Mental health care datasets |
[113] | Explore ADHD neural mechanisms with ML | Neural Networks, CNNs | ML aids in understanding ADHD’s neural bases, potentially guiding treatment strategies | Varies | Varies | Varies | Varies | Neuroimaging datasets |
[114] | Analyze the mental health of international students | Decision Trees, KNN | Findings indicate mental health stressors unique to international students, aiding targeted support | 88% | 87% | 86% | 0.87 | Student surveys |
[115] | Address sampling bias in neuroimaging for psychiatric diagnoses | SVM, Ensemble Methods | Highlights the impact of sampling inequalities on generalization; suggests model adjustments | Varies | Varies | Varies | Varies | Neuroimaging datasets |
[116] | Understand cognitive phenotypes in HIV+ patients | Clustering, Random Forest | Identifies cognitive phenotypes in HIV patients, supporting personalized cognitive care | 89% | 87% | 88% | 0.87 | Cognitive HIV datasets |
[117] | Systematic review of NLP in mental health | NLP, Sentiment Analysis, Topic Modeling | NLP tools show promise in analyzing mental health from text sources | Varies | Varies | Varies | Varies | Various mental health studies |
[118] | Model community mental health and built environment | Multilevel Models, Predictive Analytics | The built environment has measurable effects on community mental health, which is useful for policy | 89% | 88% | 87% | 0.88 | Community and environment data |
[119] | Analyze the trauma’s effect on mental health post-disaster | Gradient Boosting, SVM | Reveals heterogeneous associations of trauma with mental health issues | 87% | 85% | 86% | 0.86 | Disaster trauma datasets |
[120] | Review methodologies for monitoring mental health on Twitter | NLP, Sentiment Analysis | Reviews effective NLP methods for social media mental health monitoring | Varies | Varies | Varies | Varies | Twitter mental health datasets |
[121] | Assess self-management of mental health using wearables | Time Series Analysis, LSTM | Wearables effectively track and manage anxiety, depression, and sleep issues | 85% | 83% | 84% | 0.84 | Wearable device data |
[122] | Predict psychotherapy satisfaction | Decision Trees, Ensemble Learning | Accurate prediction of psychotherapy satisfaction levels among Chinese clients | 89% | 88% | 87% | 0.88 | Chinese psychotherapy data |
[123] | Predict life satisfaction with explainable AI | Random Forest, XAI methods | Offers insights into predictors of life satisfaction, with implications for mental health | 90% | 88% | 89% | 0.89 | Survey datasets |
[124] | Develop an adaptive data-driven architecture for mental health apps | Adaptive Models, Reinforcement Learning | Adaptive ML models improve mental health app personalization | Varies | Varies | Varies | Varies | Mental health application datasets |
[125] | Explore fairness in AI for healthcare | Fairness-aware algorithms | Discusses biases and fairness in mental health AI applications, suggesting mitigation | Varies | Varies | Varies | Varies | Healthcare AI datasets |
[126] | Predict the length of hospital stay for mental health-related fractures | Linear Regression, Decision Trees | Accurate predictions support resource allocation in hospitals | 90% | 89% | 88% | 0.89 | Orthopedic datasets |
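Table 6 reports accuracy, sensitivity, specificity, and F1 score for each study. As a minimal sketch of how these four quantities relate to one another, the following Python snippet derives them from binary labels and predictions. The function name `binary_metrics` and the example labels are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall, or true-positive rate
    specificity: float  # true-negative rate
    precision: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute common screening metrics from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=sensitivity,
        specificity=ratio(tn, tn + fp),
        precision=precision,
        f1=ratio(2 * precision * sensitivity, precision + sensitivity),
    )


# Hypothetical labels: 1 = condition present, 0 = condition absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

Because accuracy alone can look favorable on imbalanced screening data, reading it alongside sensitivity and specificity, as Table 6 does, gives a fuller picture of model behavior.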
In summary, ML in mental health has shown advances over traditional predictive and diagnostic methods. It enables refined and scalable approaches to analysis and prediction, and integrating varied data types improves predictive performance. It also supports complex analyses that can inform better interventions. The key takeaways from this subsection are:
NLP and sentiment analysis effectively extracted mental health signals from social media, achieving accuracies up to 85%.
Wearable data (e.g., heart rate, fNIRS, motion sensors) fed into ML models yielded 91% accuracy in stress and anxiety detection.
CNNs and deep learning successfully classified neuroimaging data, contributing to diagnoses of ADHD and schizophrenia.
ML models personalize interventions by predicting therapy outcomes and satisfaction based on behavioral and demographic data.
Explainable AI (XAI) approaches improved the interpretability of mental health models, aiding clinician trust.
Some of the important challenges include bias in digital data, underrepresentation of vulnerable populations, and privacy issues in mental health prediction systems.
Limitations:
Subjective ground truth: Diagnoses often depend on self-reported symptoms or clinician judgment, introducing label noise.
Small datasets and non-standardized inputs: fMRI or wearable-based ML studies experience variability due to small sample sizes and diverse sensor modalities.
Cultural bias: Sentiment analysis models trained on Western social media may misclassify expressions from different cultures or languages.
Overdependence on social signals: NLP models may mistake sarcasm, irony, or slang as indicators of distress, lowering specificity.
Understanding social determinants of health (SDoH)
This section discusses Social Determinants of Health (SDoH) and their effect on public health. SDoH are non-medical factors that influence health outcomes, accounting for over 50% of health outcome variance. These factors are as follows:
Income and social status
Education
Neighborhood conditions
Employment
Racism and policing
Access to healthcare and healthy food
Figure 11 depicts the factors that fall under SDoH. These factors are poorly documented in structured health records. The study in [127] found that, despite growing scientific interest, public understanding of SDoH remains low, which is a challenge for integrating SDoH-aware ML systems into policy or patient-facing tools. Recent advances in Natural Language Processing (NLP) have been used to extract SDoH variables from Electronic Health Records (EHRs), such as housing insecurity, employment status, food access, and transportation challenges. However, SDoH remain underutilized in mental health research, with only 1.2% of studies reporting the use of SDoH variables. The authors of [128] analyzed policing as a determinant of health, especially among marginalized populations, revealing its influence on mental health, substance use, and access to care. SDoH data have also been applied to understand structural inequalities in health: researchers used SDoH to explain county-level sexually transmitted infection (STI) rates, showing that poverty, incarceration, and education gaps strongly predict disease burdens [129].
Fig. 11. Different factors of the Social Determinants of Health
Social determinants of health are crucial for precision public health and equitable medicine, yet public awareness and documentation gaps remain despite growing scientific interest. Natural Language Processing can extract hidden SDoH, enhance patient stratification, and bridge clinical documentation gaps. In addition, [130] demonstrated how pediatric and adult eye disease rates are tied to SDoH, reinforcing the need for equity-aware algorithms in specialty medicine.
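To illustrate what extracting SDoH variables from free-text clinical notes can look like in its simplest form, the sketch below uses keyword patterns. This is a deliberately simplified, rule-based stand-in and not the method of any cited study. The pattern dictionary `SDOH_PATTERNS`, the function `extract_sdoh`, and the sample note are all hypothetical, and the approach does not handle negation or context.

```python
import re

# Hypothetical keyword patterns. A real system would need a curated,
# validated lexicon or a trained NLP model.
SDOH_PATTERNS = {
    "housing_insecurity": r"\b(homeless|eviction|evicted|unstable housing|shelter)\b",
    "employment": r"\b(unemployed|laid off|jobless)\b",
    "food_access": r"\b(food insecurity|food bank|skips? meals?)\b",
    "transportation": r"\b(no transportation|cannot drive|no car)\b",
}


def extract_sdoh(note: str) -> dict:
    """Return {category: [matched phrases]} for one free-text note."""
    found = {}
    for category, pattern in SDOH_PATTERNS.items():
        matches = [
            m.group(0)
            for m in re.finditer(pattern, note, flags=re.IGNORECASE)
        ]
        if matches:
            found[category] = matches
    return found


note = (
    "Patient was evicted last month and is staying in a shelter. "
    "Currently unemployed; relies on a food bank and has no car."
)
result = extract_sdoh(note)
print(result)
```

Keyword rules are transparent and cheap to run, but they miss paraphrases and can misfire on negated statements, which is one reason the literature discussed above turns to more capable NLP models.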
Results: main findings, patterns observed, and trends, addressing the research questions
In the previous section, Tables 3, 4, 5 and 6 highlighted contributions in different areas of public health. This section discusses the observations drawn from these tables and evaluates how they answer the research questions posed at the outset.
Main findings
The review highlights key contributions of machine learning to public health, as synthesized from 170 research studies:
Disease Monitoring and Prediction:
ML models (e.g., LightGBM, GRU Neural Networks) achieved high accuracy in predicting diseases such as coronary heart disease (92%) and chronic kidney disease (89%).
Algorithms demonstrated success in forecasting disease outbreaks (e.g., COVID-19, dengue) using epidemiological, demographic, and environmental data.
Personalized Medicine:
Genomic data analysis using ML revealed significant advancements in personalized treatments.
Applications like genomic clustering and radiomics enhance cancer treatment and prognosis prediction.
Pharmacogenomics leveraged ML to tailor drug therapies to genetic profiles, reducing adverse reactions.
Mental Health:
Sentiment analysis using natural language processing (NLP) on social media helped monitor mental health trends.
ML-assisted tools used physiological data from wearables for stress detection and relapse prevention.
Techniques like feature extraction and fMRI analysis identified neural markers for conditions such as ADHD and bipolar disorder.
Resource Allocation and Optimization:
ML models facilitated optimal distribution of resources, such as ventilators and ICU beds, during emergencies.
Forecasting hospital admissions and emergency visits improved resource readiness and allocation.
Interpretation of results
The findings of this review underscore the transformative potential of machine learning in addressing public health challenges. Key interpretations include:
Enhanced Predictive Accuracy:
Across multiple domains, ML models consistently outperform traditional statistical and heuristic methods. For example:
LightGBM achieved 92% accuracy in coronary heart disease prediction.
GRU Neural Networks outperformed older models in predicting HIV incidence with a sensitivity of 85%.
Disease Prediction:
Structured datasets like medical imaging or genomic data drive high-accuracy models (e.g., 95% for skin cancer detection).
Focus on early detection and prevention.
Mental Health Monitoring:
Relies on unstructured data (e.g., social media posts, speech patterns) and physiological metrics.
Emphasizes personalization and early intervention through NLP and wearable technologies.
Broad Applicability:
ML finds applications in a wide range of public health issues, from genomic data analysis to mental health monitoring. These capabilities illustrate ML’s flexibility in dealing with diverse and complex datasets.
Domain-Specific Strengths:
Genomic Data Analysis: Achieves groundbreaking insights through clustering and high-dimensional data processing, addressing diseases at a molecular level.
Mental Health: Exploits social and behavioral data for non-invasive monitoring and treatment, an area traditionally underexplored with computational methods. Real-life case studies [127–129] also support the use of ML for mental health prediction.
Personalized Medicine:
Advances are rooted in genomic data and pharmacogenomics, offering tailored treatment plans.
Primarily patient-centric, aiming to optimize individual outcomes.
Resource Allocation:
Focuses on population-level benefits, optimizing hospital beds, staff, and medical supplies.
Utilizes predictive modeling to prepare for future healthcare demands.
Emerging Trends:
The integration of real-time data sources (e.g., IoT, wearable devices) and the adoption of explainable AI models reflect the ongoing evolution of ML technologies to meet public health needs.
Real-time resource allocation during emergencies (e.g., COVID-19) demonstrates the feasibility of integrating ML into operational decision-making.
Disease outbreak monitoring is transitioning from retrospective analysis to proactive, real-time surveillance using IoT and mobile data.
The research findings demonstrate that ML applications in public health are both transformative and domain-specific. While disease prediction and personalized medicine showcase impressive technical advances, resource allocation and mental health monitoring emphasize operational and ethical challenges. Future efforts should focus on bridging gaps in data availability, enhancing model generalization, and addressing biases to ensure equitable and effective implementation across diverse populations.
Advantages of machine learning in public health
This section discusses the advantages of using machine learning to predict different public health outcomes.
Enhance predictive precision
Deep learning (DL) algorithms are a subset of ML. They can forecast medical results with very high accuracy. These DL models are suitable for analyzing complex, multifaceted data. They can reveal hidden patterns and provide information that cannot be obtained easily. DL models achieve high accuracy in disease detection by analyzing medical images, exemplified by the identification of diabetic retinopathy from retinal scans.
Improved productivity and effectiveness
ML algorithms have reduced the time and effort required to conduct public health research by automating data analysis. During COVID-19, many organizations employed AI/ML in vaccine development, and the use of these algorithms helped reduce development time from decades to about 1 year. These techniques can handle large datasets, which supports the rapid generation of knowledge and enables timely public health interventions.
Another major advantage is that ML models enable real-time surveillance and automated data processing. By continuously monitoring and analyzing incoming data, these models can provide prompt detection of disease outbreaks and other public health threats. The improved processing of large datasets by ML models also facilitates the efficient allocation of staff and resources to crucial tasks.
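As a minimal sketch of the kind of continuous monitoring described above, the snippet below flags days on which a case count exceeds a rolling baseline. It is a simple statistical threshold rule rather than a trained ML model, and the function name `flag_anomalies`, the window length, the threshold, and the daily counts are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, threshold=3.0):
    """Flag indices whose count exceeds baseline mean + threshold * std.

    The baseline is the `window` observations immediately before each point.
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Floor sigma at 1 case so a flat baseline does not flag tiny changes.
        limit = mu + threshold * max(sigma, 1.0)
        if counts[i] > limit:
            flagged.append(i)
    return flagged


# Hypothetical daily case counts with one spike at index 9.
daily_cases = [12, 10, 11, 13, 12, 11, 10, 12, 11, 30, 13, 12]
print(flag_anomalies(daily_cases))
```

A production surveillance system would also need to account for reporting delays, weekly seasonality, and noisy data sources, which this sketch ignores.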
Efficiency in terms of cost
ML can deliver significant cost reductions through improved resource allocation, decreased hospital readmissions, and prevention of disease outbreaks.
Hospital Readmission Reduction: ML models can identify patients at high risk of readmission, so that specific therapies can be administered to lower readmission rates and significantly cut the associated costs.
Efficient Resource Allocation: Predictions from ML algorithms support the optimal distribution of medical supplies and staff. As a result, resource utilization improves and healthcare expenses are reduced.
Challenges in integrating ML into public health
Although ML algorithms have provided efficient solutions for improving public health measures, many challenges remain to be overcome. The following subsections discuss the challenges faced in integrating ML into the public health dimensions covered earlier in this work:
Challenges faced in utilizing ML models in disease outbreak
Data challenges
Data Quality and Completeness: Many datasets are incomplete, noisy, or biased. For instance, electronic health records and social media data often contain missing or unstructured entries, complicating the training of machine learning models [55, 59].
Data Integration Across Sources: Combining diverse data streams (e.g., genomic, clinical, and social media data) for disease surveillance is complex due to differing formats, quality, and scales [56, 59].
Underrepresentation in Data: [50] emphasizes that marginalized regions and populations are underrepresented in outbreak data, leading to models that may not generalize well to global contexts.
Timeliness and real-time predictions
Delays in Data Availability: While social-media and internet-based surveillance systems are timely, they can be affected by delays in reporting accurate outbreak data or spurious trends [57, 58].
Dynamic Nature of Diseases: Infectious diseases evolve rapidly, requiring models to adapt to new strains or mutations [53, 62].
Model accuracy and interpretability
Overfitting and Generalization: Models trained on localized data often struggle to generalize across regions or timeframes, as seen in the case of COVID-19 predictions in specific Indian states [51, 60].
Interpretability Issues: Black-box ML models, such as deep learning, lack transparency, making it difficult for public health officials to trust and act on their predictions [61, 63].
Predictive power and granularity
Lack of Granular Data: Early warning systems for diseases like dengue and malaria often lack granular demographic or geospatial data, reducing their effectiveness for specific interventions [52, 54].
Sparse Event Data: Predicting rare outbreaks (e.g., Legionella pneumophila) is challenging due to the lack of extensive historical data [56].
Ethical and privacy concerns
Data Privacy Risks: Mining social media and online health records raises ethical concerns about privacy, especially when data sharing is required for cross-border outbreak management [57, 59].
Algorithmic Fairness: Unequal representation in datasets can lead to biased predictions that may exacerbate healthcare disparities [50, 63].
Real-world deployment and scalability
Integration with Public Health Systems: ML models often fail to integrate seamlessly with traditional public health surveillance systems, limiting their real-world applicability [53, 59].
Resource Limitations in Low-Income Settings: Effective ML implementation for outbreak prediction often requires computational resources and technical expertise that may not be available in resource-constrained settings [50].
Reliability of non-traditional data sources
Noise in Social Media Data: Social media surveillance systems can produce false positives due to unrelated trending topics or misinformation, as observed in studies on influenza and mass gatherings [57, 58].
Inconsistent Reporting: Internet-based systems depend on the accuracy of online reporting, which varies across platforms and users [59, 61].
Adaptability to emerging threats
Novel Pathogens: Models are often trained on historical data, limiting their ability to predict outbreaks caused by newly emerging pathogens [53, 60].
Rapid Evolution of Diseases: The emergence of new strains or antibiotic-resistant pathogens poses challenges for existing prediction models to remain effective [56, 62].
Limited collaboration and standardization
Lack of Standardized Methodologies: Alfred and Obit (2021) [50] highlight the absence of standardized protocols for using ML in outbreak prediction, which hampers collaboration and model comparison.
Interdisciplinary Coordination: Effective outbreak prediction requires coordination between epidemiologists, data scientists, and public health officials, which is often lacking [52, 53].
Challenges faced in the use of ML for resource allocation and optimization
In low-resource settings, implementing machine learning models for resource allocation and optimization in hospitals and public health presents several unique challenges:
Data challenges
Limited Availability: Hospitals in resource-constrained settings often lack comprehensive data on patients, facilities, or equipment due to inconsistent record-keeping or lack of digital infrastructure.
Data Quality: Available data might be incomplete, outdated, or inconsistent, reducing the reliability of ML predictions and recommendations.
Privacy Concerns: Poorly developed data protection frameworks may expose sensitive health information to misuse, discouraging data sharing.
Infrastructure constraints
Hardware and Software Limitations: Many ML models require advanced computing resources that are unavailable in underfunded hospitals.
Internet Connectivity: Unreliable or absent internet access may hinder cloud-based ML models or data sharing between institutions.
Power Supply: Intermittent electricity can disrupt ML system training and operation, particularly in rural hospitals.
Human resource issues
Skill Gaps: Hospitals in these settings may lack personnel trained in ML development, deployment, and maintenance.
Dependence on External Expertise: Reliance on external vendors or international organizations for ML systems can limit scalability and customization.
Operational challenges
Dynamic Environments: Public health crises, such as pandemics, create rapidly changing conditions that require real-time model updates, which may not be feasible in low-resource contexts.
Scalability: Customizing ML models to fit the diverse needs of various hospitals or health programs in a region is complex and resource-intensive.
Cost-related issues
High Upfront Costs: Acquiring and implementing ML systems may be prohibitively expensive for hospitals with limited budgets.
Sustainability: Ongoing costs for software updates, hardware maintenance, and personnel training can strain budgets over time.
Ethical and social barriers
Trust Issues: Patients and healthcare workers may distrust AI-based decisions, especially when they replace human judgment in critical health scenarios.
Equity Concerns: ML models designed for high-resource settings may not address the unique needs or disparities in low-resource environments, exacerbating inequities.
Regulatory gaps
Lack of Policy Support: Weak governance and the absence of guidelines for AI in healthcare may lead to poor implementation, misuse, or abandonment of ML tools.
Mitigation strategies
Addressing these challenges requires a holistic approach that balances technological innovation with the realities of low-resource environments. The following steps can be taken to mitigate these challenges:
Simplified ML Models: Develop models that are computationally lightweight and tailored for low-resource settings.
Decentralized Systems: Use local computation or federated learning to reduce reliance on centralized cloud systems.
Capacity Building: Train local staff to understand, use, and maintain ML models, ensuring long-term sustainability.
Public–Private Partnerships: Leverage partnerships to share the financial burden and introduce cutting-edge technology.
Pilot Projects: Start with smaller, scalable implementations to demonstrate feasibility and build trust within the healthcare ecosystem.
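The decentralized option mentioned above, federated learning, can be sketched as a weighted average of locally trained model parameters, so that hospitals share parameters rather than patient records. The snippet below shows only this aggregation step in the style of federated averaging. The function `federated_average` and the two-hospital example are hypothetical, and a real deployment would add local training, secure communication, and privacy safeguards.

```python
def federated_average(client_updates):
    """Weighted average of client model parameters (FedAvg-style).

    client_updates: list of (weights, n_samples) pairs, where `weights`
    is a list of floats with the same length for every client.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("clients must share the same model shape")
        for j, w in enumerate(weights):
            # Each client contributes in proportion to its sample count.
            averaged[j] += w * n / total
    return averaged


# Hypothetical hospitals: each trains locally and shares only parameters.
updates = [
    ([0.2, 1.0], 100),  # hospital A, 100 local samples
    ([0.4, 0.0], 300),  # hospital B, 300 local samples
]
global_weights = federated_average(updates)
print(global_weights)
```

Weighting by sample count keeps a small clinic from dominating the shared model, while still letting it benefit from patterns learned at larger sites.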
Challenges in utilizing ML models for genomic data analysis
The following are the challenges faced in the utilization of ML models for genomic data analysis:
High dimensionality and data complexity
Curse of Dimensionality: Genomic datasets are often massive, with millions of genomic variants or features requiring dimensionality reduction or feature selection to mitigate computational challenges [74, 80].
Complex Data Structures: Interpreting non-linear interactions between genetic elements, such as regulatory regions and transcription factors, complicates model development [75, 88].
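As a minimal illustration of the feature selection mentioned above, the following sketch keeps only the highest-variance columns of a small genotype matrix. Variance filtering is one simple, unsupervised option chosen here for brevity and is not claimed to be the approach of the cited studies. The function `top_variance_features` and the toy matrix are hypothetical.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the sorted indices of the k columns with the highest variance."""
    n_features = len(rows[0])
    variances = [
        pvariance([row[j] for row in rows]) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical genotype matrix: rows are samples, columns are variants
# coded as 0, 1, or 2 copies of the minor allele.
genotypes = [
    [0, 2, 0, 1],
    [0, 0, 0, 1],
    [0, 2, 1, 1],
    [0, 0, 1, 1],
]
keep = top_variance_features(genotypes, k=2)
reduced = [[row[j] for j in keep] for row in genotypes]
print(keep, reduced)
```

Columns that never vary across samples carry no information for a classifier, so removing them is a cheap first step before more expensive methods are applied.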
Data imbalance and population bias
Unequal Representation of Classes: Diseases caused by rare genetic variants are underrepresented, leading to poor model performance on minority classes [76, 80].
Population Stratification: Models often lack generalizability across diverse ancestries due to the overrepresentation of European genomic data [76, 81].
Model interpretability
Black-Box Models: Deep learning and ensemble methods, although powerful, are difficult to interpret, which is problematic for clinical decision-making [86, 88].
Lack of Transparency: Understanding the biological mechanisms underlying model predictions remains a challenge [84, 85].
Integration of multimodal data
Heterogeneous Data Sources: Combining genomic, clinical, and environmental data is technically challenging and often results in data compatibility issues [78, 84].
Data Preprocessing: Harmonizing diverse datasets for ML applications requires extensive preprocessing, such as normalization and imputation [79, 87].
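The normalization and imputation steps named above can be sketched for a single numeric column as follows. Mean imputation and z-score scaling are used here only because they are the simplest common choices. The function `impute_and_standardize` and the example values are hypothetical.

```python
from statistics import mean, pstdev


def impute_and_standardize(column):
    """Mean-impute None entries, then z-score the column."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mu = mean(observed)
    # Filling with the observed mean leaves the column mean unchanged.
    filled = [mu if v is None else v for v in column]
    sigma = pstdev(filled)
    if sigma == 0:
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


# Hypothetical expression values with one missing measurement.
expression = [2.0, None, 4.0, 6.0]
z = impute_and_standardize(expression)
print(z)
```

Mean imputation shrinks the variance of a feature, so studies with substantial missingness often prefer model-based imputation instead.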
Generalizability of models
Overfitting: Many ML models are overfitted to training datasets, reducing their ability to predict outcomes in external datasets [81, 89].
Benchmarking Across Scenarios: The lack of consistent benchmarking frameworks leads to varied performance evaluations across studies [89].
Scalability and computational requirements
Large-Scale Data Analysis: Processing high-dimensional genomic data requires significant computational power and optimized algorithms [74, 90].
Cost of Computation: Advanced ML methods, such as deep learning, demand high-performance computing infrastructure, which can be prohibitive [81].
Ethical and privacy concerns
Data Privacy: Genomic data contains sensitive information, necessitating robust privacy-preserving methods to ensure confidentiality [78, 87].
Informed Consent: Issues around the secondary use of genomic data and consent for ML applications pose ethical challenges [76, 88].
Bias in feature selection and model development
Algorithmic Bias: Feature selection and modeling strategies may introduce biases, limiting the identification of biologically relevant markers [75, 83].
Underutilization of Advanced Techniques: Techniques like entropy-based third-order interaction analysis are underexplored but necessary for more accurate predictions [88].
Validation and functional insights
Experimental Validation: ML-based predictions lack functional validation, hindering their translation into actionable biological insights [75, 83].
Reproducibility: Variability in data handling and modeling pipelines reduces the reproducibility of findings [81, 85].
Real-world applications and limitations
Disease Complexity: Identifying genomic signatures for complex diseases like autism, schizophrenia, and Alzheimer’s remains challenging due to their polygenic nature [80, 89].
Clinical Translation: Many models fail to bridge the gap between research and clinical applications due to differences in requirements for accuracy, interpretability, and scalability [78, 82].
By addressing these challenges, future ML models can better harness the power of genomic data for advancing personalized medicine and biological discovery.
Challenges faced in integrating ML for mental health
In mental health, the integration of ML algorithms faces the following challenges:
Data challenges
Sampling Bias and Inequality: As highlighted by [115], inequalities in sampling can significantly impact the generalization of neuroimaging-based classifiers, leading to models that perform poorly across diverse populations. This reflects the broader issue of underrepresentation in datasets, particularly for marginalized groups in mental health studies.
Data Scarcity in Context-Specific Scenarios: [114] emphasizes that unique stressors influence international students’ mental health, but limited context-specific data hinders model training and validation, reducing applicability across populations.
High Dimensionality and Noise: In [111], it has been noted that electronic health records (EHRs) often contain noisy and unstructured data, complicating the development of robust predictive models for mental health crises.
Ethical and privacy concerns
Algorithmic Fairness: According to [115], fairness in AI is a critical concern, as biases in training data can lead to discriminatory outcomes, potentially exacerbating mental health disparities.
Data Privacy: The sensitive nature of mental health data makes it challenging to share and integrate datasets across studies, limiting the development of more generalizable machine learning models [111, 118].
Complexity of mental health conditions
Heterogeneity in Outcomes: [119] demonstrates that the associations between traumatic experiences and mental health problems are highly heterogeneous, making it difficult for models to provide accurate predictions for diverse populations.
Multifactorial Influences: Authors in [109] discuss how adolescent mental health outcomes are shaped by a mix of biological, cultural, and social factors, posing challenges for machine learning models to capture these intricate dynamics comprehensively.
Generalization issues
Cross-Cultural Variability: [109] underscores the difficulty of generalizing ML predictions across cultural contexts, as mental health expressions and influences vary widely.
Overfitting to Specific Data Sources: [118] note that predictive models often overfit to localized environmental and social variables, reducing their utility in broader applications.
Interpretability and usability
Black-Box Nature of Models: Many machine learning models, especially deep learning ones, lack transparency, making it difficult for clinicians to trust and adopt these tools [112].
Mismatch with Clinical Needs: As highlighted in [121], models often fail to integrate seamlessly into clinical workflows, limiting their practical utility in managing common mental health disorders.
Assessment of non-traditional data sources
Text and Social Media Analysis: While [110, 120] highlight the potential of text-based media and Twitter for mental health insights, these approaches face challenges in accurately interpreting context and intent, which are critical in understanding mental health expressions.
Wearable Devices: [121] point out that data from wearable devices often lack sufficient granularity or consistency to support reliable mental health interventions.
Outcome prediction and treatment personalization
Prediction Limitations: [122] discusses how predicting client satisfaction with psychotherapy is fraught with variability due to subjective factors, emphasizing the need for explainable AI to enhance trust.
Undesired Treatment Outcomes: [112] identify challenges in predicting negative treatment outcomes due to the interplay of psychological and social factors, which ML models struggle to quantify effectively.
Integration with neural mechanisms
Understanding Neural Correlates: [113] highlights that ML’s role in ADHD research is limited by an incomplete understanding of underlying neural mechanisms, which hampers model effectiveness in guiding treatment.
By addressing these challenges, the field can work toward more equitable, effective, and interpretable machine learning applications in mental health.
Ensuring the accuracy and confidentiality of data
Data quality is the most important factor affecting the efficacy of ML models. Data must be accurate, adequate, and unbiased to avoid erroneous predictions and unexpected outcomes.
Data Quality Issues: Insufficient data quality can hamper the accuracy of ML models. The primary concern is therefore to ensure data integrity and accuracy so that forecasts are reliable.
Protection of Personal Information: The privacy of personal health data is a foremost concern when the data are used for predictions. The biggest challenge is keeping the data secure while maintaining the accuracy and reliability of the ML model. Strict legal frameworks and data anonymization techniques can help address these issues.
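One common way to reason about anonymization is k-anonymity: every combination of quasi-identifiers should be shared by at least k records. The sketch below measures k for a toy dataset before and after coarsening exact ages into 10-year bands. The helper names `k_anonymity` and `generalize_age`, the field names, and the records are hypothetical, and k-anonymity alone does not guarantee privacy.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the smallest group size over the quasi-identifier columns."""
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(groups.values())


def generalize_age(rec, width=10):
    """Replace an exact age with a coarse band, e.g. 34 -> '30-39'."""
    low = (rec["age"] // width) * width
    return {**rec, "age": f"{low}-{low + width - 1}"}


# Hypothetical records; "age" and "zip3" are treated as quasi-identifiers.
records = [
    {"age": 34, "zip3": "100", "dx": "depression"},
    {"age": 37, "zip3": "100", "dx": "anxiety"},
    {"age": 52, "zip3": "945", "dx": "depression"},
    {"age": 58, "zip3": "945", "dx": "ptsd"},
]
k_exact = k_anonymity(records, ["age", "zip3"])
coarse = [generalize_age(r) for r in records]
k_coarse = k_anonymity(coarse, ["age", "zip3"])
print(k_exact, k_coarse)
```

Coarsening raises k but also discards detail a model might need, which reflects the tension between data security and model accuracy described above.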
Many ML models, particularly DL algorithms, operate as “black boxes,” limiting our understanding of their decision-making processes. This lack of transparency might prevent the development of trust and the execution of public health policy.
Black Box Models: The complex design of deep learning models frequently prevents the understanding of the mechanisms by which they produce particular predictions. Developing comprehensible models is essential for gaining acceptability in therapeutic settings, as they offer significant insights into underlying mechanisms.
Explainable AI (XAI) is a discipline dedicated to improving the clarity and comprehension of ML models. This allows healthcare providers to understand and depend on the forecasts generated by these models.
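One family of XAI methods attributes a single prediction to its input features using Shapley values, the idea underlying SHAP. The sketch below computes exact Shapley values by enumerating all feature orderings, which is feasible only for a handful of features. The `risk_score` model, its coefficients, and the patient values are hypothetical and serve only to make the attribution concrete.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline) to each feature."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference input
        previous = predict(current)
        for name in order:                # switch features on one at a time
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orderings) for name in names}

def risk_score(x):
    # Hypothetical toy model with an age-smoking interaction (illustrative only).
    return 0.02 * x["age"] + 0.5 * x["smoker"] + 0.01 * x["age"] * x["smoker"]

patient = {"age": 60, "smoker": 1}
reference = {"age": 40, "smoker": 0}
phi = shapley_values(risk_score, patient, reference)
print(phi)
```

In this toy case the attributions are approximately 0.5 for `age` and 1.0 for `smoker`, and they sum to the difference between the patient's score and the reference score (about 1.5). Practical tools approximate these values because exact enumeration grows factorially with the number of features.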
Ethical and regulatory concerns
Ethical and technical challenges:
The accuracy and confidentiality of data remain critical challenges.
Ethical concerns, including algorithmic bias, highlight the need for fairness-aware ML models.
The application of machine learning in public health raises ethical concerns about equity, accountability, and potential prejudice. Regulatory frameworks are crucial for assuring the ethical use of machine learning, protecting patients’ rights, and mitigating discriminatory practices.
Algorithmic Bias: ML models can inadvertently amplify biases present in the training data, leading to inequitable results. Managing these biases and ensuring fairness is essential for the ethical application of ML.
Regulatory Frameworks: Effective regulatory frameworks are crucial for supervising the implementation of ML in healthcare, guaranteeing the ethical use of algorithms, and safeguarding patient rights. These frameworks must specifically focus on data protection, model transparency, and accountability.
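A simple way to make the algorithmic bias concern measurable is to compare how often a model selects members of different groups. The sketch below computes the selection rate per group and the disparate impact ratio between two groups. The predictions, the group labels `A` and `B`, and the function names are hypothetical; the 0.8 threshold reflects the commonly cited "four-fifths" convention and is used here only as an assumption, not as a standard endorsed by the reviewed studies.

```python
def selection_rate(predictions, groups, group):
    """Fraction of members of `group` that received a positive prediction."""
    picked = [p for p, g in zip(predictions, groups) if g == group]
    return sum(picked) / len(picked)

def disparate_impact(predictions, groups, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    return (selection_rate(predictions, groups, protected)
            / selection_rate(predictions, groups, reference))

# Hypothetical model outputs: 1 = flagged for a preventive programme, 0 = not flagged.
predictions = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

ratio = disparate_impact(predictions, groups, protected="A", reference="B")
flagged = ratio < 0.8   # assumed four-fifths threshold
print(ratio, flagged)
```

Here group A is selected 40% of the time and group B 80% of the time, giving a ratio of 0.5, which falls below the assumed threshold. A single ratio of this kind does not establish that a model is fair or unfair, but it offers a starting point for the audits that fairness-aware ML requires.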
Table 7 provides an overview of the challenges encountered in the use of ML in public health. These challenges affect both the efficacy and the adoption of the technology. Data privacy and security concerns carry the risk of breaches, legal complications, and erosion of patient trust. Inadequate or inconsistent data creates problems of data quality and availability, which undermine model accuracy. Algorithmic bias and unfairness can lead to discrimination and disparities among groups. Limited interpretability, especially in deep learning models, hampers effective decision-making and regulatory clearance. Integration with existing systems is impeded by compatibility problems with outdated public health systems, which delays implementation. The absence of standardization in data methods and model training results in variable performance and makes collaboration harder. Ethical issues, including data ownership and automated decision-making, risk public backlash. Scale and infrastructure constraints in public health systems impede the deployment of machine learning techniques, while regulatory and compliance issues introduce delays and legal complications. Financial and resource limitations hinder adoption, especially in resource-constrained environments. Generalizability concerns arise when models trained on specific datasets perform poorly in different populations, diminishing their efficacy. Human factors, such as distrust or unfamiliarity among experts, also hinder adoption. The demands of real-time data processing present obstacles for machine learning and affect prompt public health action. Challenges in data governance hinder inter-institutional data sharing, while insufficient evaluation and validation of models raise the risk of ineffective or harmful applications.
Table 7. Challenges in the use of machine learning in public health
Challenge | Description | Impact | Ref No. |
|---|---|---|---|
Data Privacy and Security | Ensuring patient data is protected and used in compliance with regulations (e.g., HIPAA, GDPR) | Risk of data breaches, legal issues, and loss of patient trust | [131] |
Data Quality and Availability | Public health data is often incomplete, inconsistent, or unstructured | Reduces the accuracy and reliability of ML models | [132] |
Bias and Fairness | Algorithms may inherit biases from training data, leading to unfair or discriminatory outcomes | Misrepresentation of certain populations leads to inequities | [133] |
Interpretability of Models | Many ML models, especially deep learning, are complex and difficult to interpret | Challenges in decision-making and regulatory approval | [134] |
Integration with Existing Systems | Difficulty in integrating ML models with legacy public health information systems | Slows down implementation and reduces effectiveness | [135] |
Lack of Standardization | Absence of standardized practices for data collection, model training, and validation | Results in varying model performance and complicates collaboration | [136] |
Ethical Concerns | Issues around consent, data ownership, and the ethical implications of automated decision-making | Potential ethical violations and public backlash | [137] |
Scalability and Infrastructure | Public health systems may lack the computational resources to deploy and scale ML solutions | Restricts the extensive deployment of ML tools | [138] |
Regulatory and Compliance Challenges | Ensuring that ML models comply with stringent public health regulations and guidelines | Delayed adoption, along with prospective legal ramifications | [139] |
Cost and Resource Constraints | High costs are associated with developing, deploying, and maintaining ML models | Restricts the use of ML, particularly in resource-constrained environments | [140, 141] |
Generalizability of Models | Models trained on specific datasets may not generalize well to other populations or regions | Reduces the model’s effectiveness across different contexts | [142] |
Human Factors and Acceptance | Resistance from healthcare professionals and public health officials due to a lack of understanding or trust in ML | Impedes the use and incorporation of ML in practice | [143] |
Real-Time Data Processing | Public health often requires real-time analysis and intervention, which can be challenging for ML models to achieve | Prolonged reaction times negatively impact public health outcomes | [144] |
Data Governance | Controlling, sharing, and utilizing data across agencies and governments is challenging | Hinders collaboration and data-driven decision-making | [145] |
Evaluation and Validation | Ensuring that ML models undergo thorough testing and validation in public health environments is difficult | Insufficient validation raises the probability of deploying ineffective or harmful models | [146] |
Future directions
An analysis of the tables presented in the preceding sections reveals four main trends in the use of ML in public health. These can be summarized as follows:
Emergence of Explainable AI: There is a growing emphasis on interpretable models to improve trust and applicability in public health. Explainable AI denotes the application of methodologies such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) to improve the comprehensibility of machine learning models. These techniques assist public health officials in comprehending and having confidence in the predictions made by these models.
Shift to Real-Time Applications: Real-time disease surveillance systems integrating IoT and ML are becoming prevalent. Wearable devices and smart sensors can continuously track health measurements and environmental factors, providing real-time data for machine learning models to detect early signs of disease epidemics.
Increased Focus on Equity: ML models are addressing health disparities by identifying at-risk populations and improving healthcare access.
Multi-Omics Integration: Combining genomic, proteomic, and environmental data is improving disease prediction and treatment strategies, which can in turn improve treatment effectiveness.
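The shift to real-time applications described above depends on flagging unusual signals as data arrive. The following sketch is a deliberately simple statistical baseline, not an ML model from the reviewed studies: it raises an alert when a new daily case count lies far above the mean of a recent window. The class name `RollingAlert`, the window of seven days, the threshold of three standard deviations, and the counts themselves are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev

class RollingAlert:
    """Flag counts that are unusually high relative to a sliding window of recent values."""

    def __init__(self, window=7, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, count):
        """Return True if `count` exceeds the window mean by more than `threshold` std devs."""
        alert = False
        if len(self.history) == self.history.maxlen:   # wait until the window is full
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and (count - mu) / sigma > self.threshold:
                alert = True
        self.history.append(count)                     # the new value joins the baseline
        return alert

# Hypothetical daily case counts with one spike (illustrative only).
daily_counts = [10, 12, 11, 9, 10, 11, 12, 10, 30, 11]
detector = RollingAlert(window=7, threshold=3.0)
alert_days = [day for day, count in enumerate(daily_counts) if detector.update(count)]
print(alert_days)  # prints: [8]
```

Because each value is processed once and only a short window is stored, this kind of detector can run on a stream from wearables or sensors. Note that the spike itself enters the baseline after it is flagged, which temporarily reduces sensitivity; production systems typically handle this more carefully.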
Apart from these trends, several innovations are making public health solutions more effective by integrating complementary technologies. Combining machine learning with the Internet of Things (IoT) and blockchain shows potential for improving disease surveillance systems. Real-time data collected from wearable devices and smart sensors enables continuous health monitoring, and blockchain technology can help protect the integrity and confidentiality of these data. Moreover, advances in explainable artificial intelligence (AI) aim to improve the transparency and comprehensibility of machine learning models, facilitating their use in public health decision-making.
Employing blockchain technology can help protect health data and support its integrity and confidentiality. This approach can build confidence in machine learning systems and facilitate data sharing among stakeholders.
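The integrity property attributed to blockchain rests on linking records by cryptographic hashes, so that altering any earlier record invalidates everything after it. The sketch below shows only this hash-chaining idea using the Python standard library. It is not a full blockchain (there is no consensus or distribution), and it provides tamper evidence rather than confidentiality, which would additionally require encryption and access control. The record fields and function names are hypothetical.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first block

def block_hash(prev_hash, payload):
    """SHA-256 over the previous hash and a canonical JSON encoding of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def add_block(chain, payload):
    """Append a record that commits to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": block_hash(prev_hash, payload)})

def verify(chain):
    """Recompute every hash and link; return False if any record was altered."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash or block["hash"] != block_hash(prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical wearable readings (illustrative only).
chain = []
for reading in ({"device": "w1", "heart_rate": 72},
                {"device": "w1", "heart_rate": 118},
                {"device": "w1", "heart_rate": 75}):
    add_block(chain, reading)

intact = verify(chain)
tampered = copy.deepcopy(chain)
tampered[1]["payload"]["heart_rate"] = 60      # silently edit an earlier record
tamper_detected = not verify(tampered)
print(intact, tamper_detected)  # prints: True True
```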
Table 8 presents the future directions in which the use of ML in public health is moving. These emerging directions in machine learning promise significant advancements in public health. Personalized public health interventions aim to improve health outcomes by tailoring interventions to individual profiles. Multi-omics data integration offers more precise disease predictions and personalized treatments, while real-time surveillance can prevent outbreaks by analyzing diverse, real-time data sources. Explainable AI improves adoption by making ML decisions transparent for healthcare professionals, and AI-powered health equity solutions work to mitigate healthcare disparities.
Table 8. Future directions for the use of machine learning in public health
Future Direction | Description | Potential Impact | Ref No. |
|---|---|---|---|
Personalized Public Health Interventions | Developing ML models that tailor public health interventions to individuals based on their unique health profiles and risk factors | Improved effectiveness of interventions and better health outcomes | [68] |
Integration of Multi-Omics Data | Leveraging genomic, proteomic, and other omics data to enhance predictive models for disease prevention and management | More accurate disease predictions and personalized treatment plans | [138, 147] |
Real-Time Surveillance and Outbreak Prediction | Using ML to analyze real-time data from multiple sources (e.g., social media, health records) to predict and manage outbreaks before they spread | Faster response to emerging public health threats and reduced disease spread | [148] |
Explainable AI for Public Health | Developing interpretable ML models that provide clear explanations for their predictions to enhance trust and usability among healthcare professionals | Increased adoption of AI tools in public health and better decision-making | [149] |
AI-Powered Health Equity Solutions | Addressing health disparities by using ML to identify and mitigate biases in healthcare delivery and outcomes | Improved health equity and access to care for underserved populations | [150] |
Federated Learning for Collaborative Research | Implementing federated learning approaches to enable the sharing and analysis of public health data across institutions while preserving privacy | Enhanced collaboration and more robust public health insights without compromising data privacy | [151, 152] |
Advanced Disease Risk Modeling | Developing more sophisticated risk prediction models that incorporate environmental, social, and behavioral factors | A more comprehensive understanding of disease risk and prevention strategies | [153] |
AI in Public Health Policy Making | Leveraging ML to simulate and analyze the potential outcomes of public health policies before implementation | Data-driven policy decisions lead to more effective public health strategies | [154, 155] |
AI-Driven Mental Health Monitoring | ML can be used to monitor mental health trends and provide early intervention through digital platforms | Early detection and prevention of mental health crises can reduce long-term impacts | [156, 157] |
Cross-Disciplinary Collaboration | Encouraging collaboration between public health experts, data scientists, and AI researchers to develop innovative ML solutions | Accelerated innovation and more practical ML applications in public health | [137] |
Ethical Frameworks for AI in Public Health | Establishing guidelines and ethical frameworks for the responsible use of AI in public health, ensuring fairness, transparency, and accountability | Increased public trust in AI technologies and ethical implementation | [158, 159] |
Scalable AI Solutions for Low-Resource Settings | Developing ML models that can be effectively deployed in low-resource settings with limited data and infrastructure | Expanded access to advanced public health solutions in underserved regions | [160] |
Wearable Technology and ML Integration | Utilizing data from wearable devices to enhance ML models for continuous health monitoring and personalized public health interventions | Improved monitoring of public health at the individual level, leading to proactive health management | [161] |
Longitudinal Health Data Analysis | Applying ML to analyze long-term health data to understand the progression of diseases and the long-term impact of interventions | Better long-term health outcomes through informed public health strategies | [162] |
AI for Global Health Security | Leveraging ML to enhance global health security by predicting and responding to pandemics and other large-scale health threats | Improved global readiness and response to public health emergencies | [163] |
Federated learning fosters collaboration across institutions by enabling data sharing without compromising privacy, and advanced disease risk modeling incorporates diverse risk factors for a deeper understanding of health risks. ML can also support public health policymaking through simulation analysis, leading to better-informed decisions. AI-driven mental health monitoring aims to prevent mental health crises, while interdisciplinary collaboration enhances innovation by integrating public health and AI expertise. Ethical frameworks facilitate the responsible application of AI, thereby enhancing public trust. Further developments encompass scalable AI solutions tailored for low-resource environments, facilitating wider accessibility; integration of wearable technology for real-time health monitoring; longitudinal data analysis to evaluate disease progression; and the application of AI in global health security, enhancing preparedness for pandemics and significant health threats. These approaches are expected to enhance medical outcomes, availability, and global health resilience.
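The federated learning approach mentioned above can be sketched with federated averaging, in which each institution trains on its own data and only model parameters are sent to a coordinator. The example below fits a one-variable linear model across three hypothetical sites. The data, learning rate, number of rounds, and function names are illustrative assumptions, and the sketch omits the secure aggregation or differential privacy that real deployments usually add, since sharing parameters alone does not by itself guarantee privacy.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent on squared error for y = w*x + b using one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(site_weights, site_sizes):
    """Average site parameters, weighting each site by its number of records."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)

# Hypothetical private datasets at three institutions, all following y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(1.0, 3.0), (3.0, 7.0)],
    [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0), (3.0, 7.0)],
]

global_weights = (0.0, 0.0)
for _ in range(30):                                   # communication rounds
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(data) for data in sites])

print(global_weights)
```

In this toy setting the shared model converges to a slope near 2 and an intercept near 1 without any site exposing its raw records to the others. Real institutional data are rarely this homogeneous, and differences between sites are one reason federated models can converge more slowly or less accurately than centrally trained ones.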
At the end of this section, a comparative evaluation of this work is provided against the existing surveys (Tables 9 and 10). The survey provided a detailed analysis of different biases and prediction metrics in public health. It reports 72 articles from 2008 to 2023. The review provides quality meta-analysis and descriptions of the biases and prediction metrics.
Table 9. A comparison of this work with existing surveys
Study | Scope and domain | Strengths | Gaps/Limitations |
|---|---|---|---|
This work | Broad: disease prediction, genomics, resource allocation, mental health | Integrates ethical + technical lens; spans > 170 studies; systems-level insights | Narrative only (no meta-analysis); lacks quantitative synthesis; interpretability challenges |
[164] | Critical success factors for sustainable AI in Saudi healthcare | Emphasis on implementation feasibility; aligns with Saudi Vision 2030; regulatory insights | Region-specific; limited algorithmic diversity |
[165] | Obesity prediction from cohort data | Quantitative synthesis of cohort-based ML studies; evaluates accuracy and generalizability | Disease-specific; no broader public health applications |
[166] | Trends in AI/ML in pathology | Forecasts diagnostic AI evolution; emphasizes automation | Pathology-specific; no ethical synthesis |
[167] | Ethical implications of race data use in AI | Addresses fairness, bias, and risk of discrimination in ML | Lacks technical implementation context |
[168] | AI in congenital heart interventions | Clinical utility focus, pediatric cardiology emphasis | Narrow domain; lacks public health integration |
[169] | AI in patient rehabilitation | Discusses sensor-based monitoring and ML in physical therapy | Narrative scope; fewer quantitative metrics |
[170] | Bias in ML models in medicine | Deep dive into data, algorithmic, and interaction bias | Focused on pathology; no policy-level recommendations |
[171] | Federated learning with a focus on privacy, security, and adversarial threats | In-depth bibliometric analysis highlights global trends and key contributors | Technical/systems focus; does not explore ethical, clinical, or practical deployment in real-world healthcare |
[172] | Fairness in ML for public health equity | Detailed taxonomy of bias (algorithmic, data, social); identifies fairness metrics (e.g., F1, disparate impact) | Limited scope to fairness metrics; lacks model performance benchmarks or interventions |
[173] | Explainable AI (XAI) and medical negligence in Ghana | Implementation-focused; aligns ML training with local legal frameworks (Public Health Act 851) | Region-specific; lacks evaluation of model performance or broader generalizability |
Table 10. Thematic Comparison of this work with other surveys
Parameters | This review | Other 2025 reviews |
|---|---|---|
Model Accuracy | Up to 95% (deep learning in cancer/genomics) | 78–92% ([165] on obesity; [168] on CHD; [166] on pathology) |
Scope Breadth | Multidomain: disease, mental health, genomics, ethics, infrastructure | Mostly domain-specific (e.g., CHD, obesity, pathology) |
Ethics and Fairness | Extensive: algorithmic bias, equity, privacy | Strong in Fiske et al. [167] (ethics), but not across all reviews |
Quantitative Meta-Analysis | Narrative only | [165] (cohort synthesis); [164] (qualitative) |
Frameworks and Trends | PRISMA-style screening, future trends, XAI, wearable integration | Focused on specific use cases or technologies |
Geographic or Policy Relevance | Global and systems-level (post-COVID) | [164] (Saudi-specific policy); [173] (Ghana-specific); others mostly clinical research |
While [174] provides a deep dive into deep reinforcement learning (DRL) for epidemic control, the present work offers a more general view of ML applications in public health, including but not limited to DRL. Both the present work and [175] discuss ML in disease prediction, but this work offers a broader perspective that encompasses non-infectious diseases and a wider range of public health applications.
Finally, the study answers all the research questions it set out to investigate.
The research effectively answers its central question by demonstrating how ML enhances public health.
ML has been utilized for different applications across domains. ML techniques have been shown to significantly benefit disease prediction, mental health, genomic analysis, and resource optimization.
It demonstrates how ML models improved the predictive accuracy and provided actionable insights for personalized interventions.
It has also provided directions for the future. Emerging technologies, such as explainable AI and federated learning, promise to address current challenges and expand ML applications.
The study provides a comprehensive roadmap for advancing public health with machine learning. It systematically identifies the areas where ML has shown promise and highlights trends such as ethical considerations and model transparency.
Conclusion
Machine learning is increasingly becoming an integral part of public health, offering a diverse range of predictive and diagnostic applications. It has enhanced the capability to detect illness and to adapt medical treatments to specific patients. By optimizing resource allocation and improving the understanding of health behaviors, ML has further strengthened public health practice. Machine learning-driven models have demonstrated considerable benefits in illness identification, epidemic readiness, genomic data analysis, and resource management, particularly in terms of speed, accuracy, and scalability. These models support more efficient public health strategies and measures by identifying threats and optimizing resource allocation. Advancements in machine learning for personalized medicine, mental health, and maternal and child care illustrate the prospects for customized, data-driven healthcare solutions. To fully leverage this potential, it is essential to address challenges associated with data quality, privacy, analytical capacity, and ethics. Realizing the potential of machine learning in public health also requires continued research, cooperation among public health experts and data scientists, and strong regulatory frameworks.
Author contributions
Methodology: SSD and DP; Conceptualization: SSD, DP, CCL, TKS, DR, SB, AS, YHL, NA, and SA; Original Draft Preparation: SS and DP; Review and Editing: SSD, DP, CCL, TKS, DR, SB, AS, YHL, NA, and SA; Visualization: SSD, DP, CCL, TKS, DR, SB, AS, YHL, NA, and SA; Funding Acquisition: CCL and YHL. All authors reviewed the manuscript.
Funding
This work is partially supported by funding from Hon Hai Research Institute, National Science and Technology Council under Grant NSC 113-2410-H-167-012-MY3.
Availability of data and materials
No datasets were generated or analysed during the current study.
Declarations
Competing interests
The authors declare no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Roberts, MC; Holt, KE; Del Fiol, G et al. Precision public health in the era of genomics and big data. Nature Med; 2024; 30, pp. 1865-1873. [DOI: https://dx.doi.org/10.1038/s41591-024-03098-0]
2. Ashrafian, H; Darzi, A. Transforming health policy through machine learning. PLoS Med; 2018; 15,
3. Beam, AL; Kohane, IS. Big data and machine learning in health care. JAMA; 2018; 319,
4. Topol, EJ et al. High-performance medicine: the convergence of human and artificial intelligence. Nat Med; 2019; 25,
5. Esteva, A et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature; 2017; 542,
6. Miotto, R et al. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep; 2016; [DOI: https://dx.doi.org/10.1038/srep26094]
7. Huang, J-D; Wang, J; Ramsey, E; Leavey, G; Chico, TJA; Condell, J. Applying artificial intelligence to wearable sensor data to diagnose and predict cardiovascular disease: a review. Sensors; 2022; 22,
8. Zhang, Z. Early warning model of adolescent mental health based on big data and machine learning. Soft Comput; 2024; 28, pp. 811-828. [DOI: https://dx.doi.org/10.1007/s00500-023-09422-z]
9. Jiang, Z; Van Zoest, V; Deng, W; Ngai, ECH; Liu, J. Leveraging machine learning for disease diagnoses based on wearable devices: a survey. IEEE Internet Things J; 2023; 10,
10. Yang, H; Chen, Z; Yang, H; Tian, M. Predicting coronary heart disease using an improved LightGBM model: performance analysis and comparison. IEEE Access; 2023; 11, pp. 23366-23380. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3253885]
11. Li, X; Xu, X; Wang, J; Li, J; Qin, S; Yuan, J. Study on prediction model of HIV incidence based on GRU neural network optimized by MHPSO. IEEE Access; 2020; 8, pp. 49574-49583. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.2979859]
12. Davi, C et al. Severe dengue prognosis using human genome data and machine learning. IEEE Trans Biomed Eng; 2019; 66,
13. Zeng, Z; Deng, Y; Li, X; Naumann, T; Luo, Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans Comput Biol Bioinf; 2019; 16,
14. Cui, S; Li, C; Chen, Z; Wang, J; Yuan, J. Research on risk prediction of dyslipidemia in steel workers based on recurrent neural network and LSTM neural network. IEEE Access; 2020; 8, pp. 34153-34161. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.2974887]
15. Imran, A; Nasir, A; Bilal, M; Sun, G; Alzahrani, A; Almuhaimeed, A. Skin cancer detection using combined decision of deep learners. IEEE Access; 2022; 10, pp. 118198-118212. [DOI: https://dx.doi.org/10.1109/ACCESS.2022.3220329]
16. Liu, P; Jin, K; Jiao, Y; He, M; Fei, S. Prediction of second primary lung cancer patient’s survivability based on improved eigenvector centrality-based feature selection. IEEE Access; 2021; 9, pp. 55663-55672. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3063944]
17. Lin, X et al. A case-finding clinical decision support system to identify subjects with chronic obstructive pulmonary disease based on public health data. Tsinghua Sci Technol; 2023; 28,
18. Yang, H; Chang, J; He, W; Wee, CF; Yit, JS; Feng, M. Frailty modeling using machine learning methodologies: a systematic review with discussions on outstanding questions. IEEE J Biomed Health Inform; 2024; [DOI: https://dx.doi.org/10.1109/JBHI.2024.3430226S]
19. Siddiqui, H et al. A survey on machine and deep learning models for childhood and adolescent obesity. IEEE Access; 2021; 9, pp. 157337-157360. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3131128]
20. Huang, M et al. Joint-channel-connectivity-based feature selection and classification on fNIRS for stress detection in decision-making. IEEE Trans Neural Syst Rehabil Eng; 2022; 30, pp. 1858-1869. [DOI: https://dx.doi.org/10.1109/TNSRE.2022.3188560]
21. Chen, LC; Sheu, JT; Chuang, YJ; Tsao, Y. Predicting the travel distance of patients to access healthcare using deep neural networks. IEEE J Transl Eng Health Med; 2022; 10, pp. 1-11, Art no. 4900411. [DOI: https://doi.org/10.1109/JTEHM.2021.3134106]
22. Etu, E-E et al. Prediction of length of stay in the emergency department for COVID-19 patients: a machine learning approach. IEEE Access; 2022; 10, pp. 42243-42251. [DOI: https://dx.doi.org/10.1109/ACCESS.2022.3168045]
23. Kumar, M; Kumar Singh, S; Kim, S. Predictive analytics for mortality: FSRNCA-FLANN modeling using public health inventory records. IEEE Access; 2024; 12, pp. 81252-81264. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3411162]
24. Margret, IN; Rajakumar, K; Arulalan, KV; Manikandan, S. Statistical insights into machine learning-based box models for pregnancy care and maternal mortality reduction: a literature survey. IEEE Access; 2024; 12, pp. 68184-68207. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3399827]
25. Venturini, M; Haredasht, FN; Sabovčik, F; Miller, RJH; Kuznetsova, T; Vens, C. Improving 1-year mortality prediction after pediatric heart transplantation using hypothetical donor-recipient matches. IEEE Access; 2024; 12, pp. 89754-89762. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3418146]
26. Bengesi, S; Oladunni, T; Olusegun, R; Audu, H. A machine learning-sentiment analysis on monkeypox outbreak: an extensive dataset to show the polarity of public opinion from twitter tweets. IEEE Access; 2023; 11, pp. 11811-11826. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3242290]
27. Ullah, U; Jurado, AGO; Gonzalez, ID; Garcia-Zapirain, B. A fully connected quantum convolutional neural network for classifying ischemic cardiopathy. IEEE Access; 2022; 10, pp. 134592-134605. [DOI: https://dx.doi.org/10.1109/ACCESS.2022.3232307]
28. Shokrekhodaei, M; Cistola, DP; Roberts, RC; Quinones, S. Non-invasive glucose monitoring using optical sensor and machine learning techniques for diabetes applications. IEEE Access; 2021; 9, pp. 73029-73045. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3079182]
29. Lu, HY et al. Digital health and machine learning technologies for blood glucose monitoring and management of gestational diabetes. IEEE Rev Biomed Eng; 2024; 17, pp. 98-117. [DOI: https://dx.doi.org/10.1109/RBME.2023.3242261]
30. Vlajnic, MM; Simske, SJ. Accuracy and performance of machine learning methodologies: novel assessments of country pandemic vulnerability based on non-pandemic predictors. IEEE Access; 2023; 11, pp. 90575-90594. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3307495]
31. Abbas, SA; Riaz, R; Kazmi, SZH; Rizvi, SS; Kwon, SJ. Cause analysis of caesarian sections and application of machine learning methods for classification of birth data. IEEE Access; 2018; 6, pp. 67555-67561. [DOI: https://dx.doi.org/10.1109/ACCESS.2018.2879115]
32. Sengupta, N; Rao, AS; Yan, B; Palaniswami, M. A survey of wearable sensors and machine learning algorithms for automated stroke rehabilitation. IEEE Access; 2024; 12, pp. 36026-36054. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3373910]
33. Elyan, E; Hussain, A; Sheikh, A; Elmanama, AA; Vuttipittayamongkol, P; Hijazi, K. Antimicrobial resistance and machine learning: challenges and opportunities. IEEE Access; 2022; 10, pp. 31561-31577. [DOI: https://dx.doi.org/10.1109/ACCESS.2022.3160213]
34. Gonzalez, AM; Azuaje, FJ; Ramirez, JL; da Silveira, JF; Dorronsoro, JR. Machine learning techniques for the automated classification of adhesin-like proteins in the human protozoan parasite trypanosoma cruzi. IEEE/ACM Trans Comput Biol Bioinform; 2009; 6,
35. Abdar, M; Acharya, UR; Sarrafzadegan, N; Makarenkov, V. NE-nu-SVC: a new nested ensemble clinical decision support system for effective diagnosis of coronary artery disease. IEEE Access; 2019; 7, pp. 167605-167620. [DOI: https://dx.doi.org/10.1109/ACCESS.2019.2953920]
36. Obaido, G et al. An improved framework for detecting thyroid disease using filter-based feature selection and stacking ensemble. IEEE Access; 2024; 12, pp. 89098-89112. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3418974]
37. Sobrinho, A; Queiroz, ACMDS; Dias Da Silva, L; De Barros Costa, E; Eliete Pinheiro, M; Perkusich, A. Computer-aided diagnosis of chronic kidney disease in developing countries: a comparative analysis of machine learning techniques. IEEE Access; 2020; 8, pp. 25407-25419. [DOI: https://doi.org/10.1109/ACCESS.2020.2971208]
38. Rashed-Al-Mahfuz, M; Haque, A; Azad, A; Alyami, SA; Quinn, JMW; Moni, MA. Clinically applicable machine learning approaches to identify attributes of chronic kidney disease (CKD) for use in low-cost diagnostic screening. IEEE J Transl Eng Health Med; 2021; 9, pp. 1-11, Art no. 4900511. [DOI: https://doi.org/10.1109/JTEHM.2021.3073629]
39. Chapman, D; Strong, C; Tiver, KD; Dharmaprani, D; Jenkins, E; Ganesan, AN. Infra-red imaging to detect respirator leak in healthcare workers during fit-testing clinic. IEEE Open J Eng Med Biol; 2024; 5, pp. 198-204. [DOI: https://dx.doi.org/10.1109/OJEMB.2023.3330292]
40. Shang, M et al. Otago exercises monitoring for older adults by a single IMU and hierarchical machine learning models. IEEE Trans Neural Syst Rehabil Eng; 2024; 32, pp. 462-471. [DOI: https://dx.doi.org/10.1109/TNSRE.2024.3355299]
41. Braig, N; Benz, A; Voth, S; Breitenbach, J; Buettner, R. Machine learning techniques for sentiment analysis of COVID-19-related twitter data. IEEE Access; 2023; 11, pp. 14778-14803. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3242234]
42. Rahman, MM; Paul, KC; Hossain, MA; Ali, GGMN; Rahman, MS; Thill, J-C. Machine learning on the COVID-19 pandemic, human mobility and air quality: a review. IEEE Access; 2021; 9, pp. 72420-72450. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3079121]
43. Bleichrodt, A; Dahal, S; Maloney, K et al. Real-time forecasting the trajectory of monkeypox outbreaks at the national and global levels, July–October 2022. BMC Med; 2023; 21, 19. [DOI: https://dx.doi.org/10.1186/s12916-022-02725-2]
44. Santillana, M; Nguyen, AT; Dredze, M; Paul, MJ; Nsoesie, EO; Brownstein, JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol; 2015; 11,
45. Lampos, V; Miller, AC; Crossan, S; Stefansen, C. Advances in nowcasting influenza-like illness rates using search query logs. Sci Rep; 2015; 5, 12760. [DOI: https://dx.doi.org/10.1038/srep12760]
46. Petropoulos, F; Makridakis, S. Forecasting the novel coronavirus COVID-19. PLoS ONE; 2020; 15,
47. Chinazzi, M; Davis, JT; Ajelli, M; Gioannini, C; Litvinova, M; Merler, S; Vespignani, A. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science; 2020; 368,
48. Guo, P; Liu, T; Zhang, Q; Wang, L; Xiao, J; Zhang, Q; Ma, W. Developing a dengue forecast model using machine learning: a case study in China. PLoS Negl Trop Dis; 2017; 11,
49. Sarder F, Akter S, Akter S. Predicting dengue outbreak from climate data using machine learning algorithms. In: 2022 IEEE International Conference on Data Science and Information System (ICDSIS), Hassan, India, 2022. p. 1–6. https://doi.org/10.1109/ICDSIS55133.2022.9915862.
50. Alfred R, Obit JH. The roles of machine learning methods in limiting the spread of deadly diseases: a systematic review. Heliyon. 2021;7(6):e07371. https://doi.org/10.1016/j.heliyon.2021.e07371. Epub 2021 Jun 23. PMID: 34179541; PMCID: PMC8219638.
51. Guleria, P; Ahmed, S; Alhumam, A; Srinivasu, PN. Empirical study on classifiers for earlier prediction of COVID-19 infection cure and death rate in the Indian States. Healthcare (Basel); 2022; 10,
52. Siettos, CI; Russo, L. Mathematical modeling of infectious disease dynamics. Virulence.; 2013; 4,
53. Abbo, LM; Vasiliu-Feltes, I. Disrupting the infectious disease ecosystem in the digital precision health era innovations and converging emerging technologies. Antimicrob Agents Chemother; 2023; 67,
54. Hussain-Alkhateeb, L; Rivera Ramírez, T; Kroeger, A; Gozzer, E; Runge-Ranzinger, S. Early warning systems (EWSs) for chikungunya, dengue, malaria, yellow fever, and Zika outbreaks: What is the evidence? A scoping review. PLoS Negl Trop Dis; 2021; 15,
55. Rangarajan, P; Mody, SK; Marathe, M. Forecasting dengue and influenza incidences using a sparse representation of Google Trends, electronic health records, and time series data. PLoS Comput Biol; 2019; 15,
56. Buultjens, AH; Vandelannoote, K; Mercoulia, K; Ballard, S; Sloggett, C; Howden, BP; Seemann, T; Stinear, TP. High performance Legionella pneumophila source attribution using genomics-based machine learning classification. Appl Environ Microbiol; 2024; 90,
57. Kagashe, I; Yan, Z; Suheryani, I. Enhancing seasonal influenza surveillance: topic analysis of widely used medicinal drugs using Twitter data. J Med Internet Res; 2017; 19,
58. Yom-Tov, E; Borsa, D; Cox, IJ; McKendry, RA. Detecting disease outbreaks in mass gatherings using Internet data. J Med Internet Res; 2014; 16,
59. Gupta, A; Katarya, R. Social media based surveillance systems for healthcare using machine learning: a systematic review. J Biomed Inform; 2020; 108, 103500. [DOI: https://dx.doi.org/10.1016/j.jbi.2020.103500] Epub 2020 Jul 2. PMID: 32622833; PMCID: PMC7331523
60. Rakhshan, SA; Nejad, MS; Zaj, M; Ghane, FH. Global analysis and prediction scenario of infectious outbreaks by recurrent dynamic model and machine learning models: a case study on COVID-19. Comput Biol Med; 2023; 158, 106817. [DOI: https://dx.doi.org/10.1016/j.compbiomed.2023.106817] Epub 2023 Mar 23. PMID: 36989749; PMCID: PMC10035804
61. Kim, M; Chae, K; Lee, S; Jang, HJ; Kim, S. Automated classification of online sources for infectious disease occurrences using machine-learning-based natural language processing approaches. Int J Environ Res Public Health; 2020; 17,
62. Zhang L, Xiong S, Zhu S, Tian J, Chen Q, Luo X, Guo H. Construction of Prediction Model of Foodborne Disease Outbreaks and Its Trend Prediction-Guizhou Province, China, 2023–2025. China CDC Wkly. 2024;6(18):408–412. https://doi.org/10.46234/ccdcw2024.079. PMID: 38737480; PMCID: PMC11082649.
63. Han, HJ; Suh, HS. Predicting unmet healthcare needs in post-disaster: a machine learning approach. Int J Environ Res Public Health; 2023; 20,
64. Giorgini, F; Di Dalmazi, G; Diciotti, S. Artificial intelligence in endocrinology: a comprehensive review. J Endocrinol Invest; 2024; 47,
65. Sebastiani, M; Vacchi, C; Manfredi, A; Cassone, G. Personalized medicine and machine learning: a roadmap for the future. J Clin Med; 2022; 11,
66. Kourou, K; Exarchos, TP; Exarchos, KP; Karamouzis, MV; Fotiadis, DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J; 2015; 2014,
67. Mavaddat, N; Michailidou, K; Dennis, J et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet; 2019; 104,
68. Obermeyer, Z; Emanuel, EJ. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med; 2016; 375,
69. Esteva, A; Kuprel, B; Novoa, R et al. Correction: corrigendum: dermatologist-level classification of skin cancer with deep neural networks. Nature; 2017; 546, 686. [DOI: https://dx.doi.org/10.1038/nature22985]
70. Esteva, A; Kuprel, B; Novoa, R et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature; 2017; 542, pp. 115-118. [DOI: https://dx.doi.org/10.1038/nature21056]
71. Saeedbakhsh, S; Sattari, M; Mohammadi, M; Najafian, J; Mohammadi, F. Diagnosis of coronary artery disease based on machine learning algorithms support vector machine artificial neural network, and random forest. Adv Biomed Res; 2023; 12,
72. Wang J, Chen Y. Federated Learning for Personalized Healthcare. In: Introduction to Transfer Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. 2023. https://doi.org/10.1007/978-981-19-7584-4_19
73. Hassan, M; Awan, FM; Naz, A; deAndrés-Galiana, EJ; Alvarez, O; Cernea, A; Fernández-Brillet, L; Fernández-Martínez, JL; Kloczkowski, A. Innovations in genomics and big data analytics for personalized medicine and health care: a review. Int J Mol Sci; 2022; 23,
74. Iwasaki, Y; Ikemura, T; Wada, K et al. Comparative genomic analysis of the human genome and six bat genomes using unsupervised machine learning: Mb-level CpG and TFBS islands. BMC Genomics; 2022; 23, 497. [DOI: https://dx.doi.org/10.1186/s12864-022-08664-9]
75. Jankovic, B; Gojobori, T. From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome. Hum Genomics; 2022; 16, 7. [DOI: https://dx.doi.org/10.1186/s40246-022-00376-1]
76. Gao, Y; Cui, Y. Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement. Genome Med; 2024; 16, 76. [DOI: https://dx.doi.org/10.1186/s13073-024-01345-0]
77. Yan, Q; Fruzangohar, M; Taylor, J et al. Improved genomic prediction using machine learning with Variational Bayesian sparsity. Plant Methods; 2023; 19, 96. [DOI: https://dx.doi.org/10.1186/s13007-023-01073-3]
78. Chung, CW; Chou, SC; Hsiao, TH et al. Machine learning approaches to identify systemic lupus erythematosus in anti-nuclear antibody-positive patients using genomic data and electronic health records. BioData Mining; 2024; 17, 1. [DOI: https://dx.doi.org/10.1186/s13040-023-00352-y]
79. Basodi, S; Baykal, PI; Zelikovsky, A et al. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics; 2020; 21,
80. Sardaar, S; Qi, B; Dionne-Laporte, A et al. Machine learning analysis of exome trios to contrast the genomic architecture of autism and schizophrenia. BMC Psychiatry; 2020; 20, 92. [DOI: https://dx.doi.org/10.1186/s12888-020-02503-5]
81. Lourenço, V; Ogutu, J; Rodrigues, R et al. Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data. BMC Genomics; 2024; 25, 152. [DOI: https://dx.doi.org/10.1186/s12864-023-09933-x]
82. Donnelly, N; Cunningham, A; Salas, SM et al. Identifying the neurodevelopmental and psychiatric signatures of genomic disorders associated with intellectual disability: a machine learning approach. Molecular Autism; 2023; 14, 19. [DOI: https://dx.doi.org/10.1186/s13229-023-00549-2]
83. Alireza, Z; Maleeha, M; Kaikkonen, M et al. Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection. J Transl Med; 2024; 22, 356. [DOI: https://dx.doi.org/10.1186/s12967-024-05090-1]
84. Dal Bo, M; Polano, M; Ius, T et al. Machine learning to improve interpretability of clinical, radiological, and panel-based genomic data of glioma grade 4 patients undergoing surgical resection. J Transl Med; 2023; 21, 450. [DOI: https://dx.doi.org/10.1186/s12967-023-04308-y]
85. Zelli, V; Manno, A; Compagnoni, C et al. Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations. J Transl Med; 2023; 21, 836. [DOI: https://dx.doi.org/10.1186/s12967-023-04720-4]
86. Chang, SW; Abdul-Kareem, S; Merican, AF et al. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods. BMC Bioinform; 2013; 14, 170. [DOI: https://dx.doi.org/10.1186/1471-2105-14-170]
87. Castelli, P; De Ruvo, A; Bucciacchio, A et al. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics; 2023; 24, 560. [DOI: https://dx.doi.org/10.1186/s12864-023-09667-w]
88. Yaldız, B; Erdoğan, O; Rafatov, S et al. Revealing third-order interactions through the integration of machine learning and entropy methods in genomic studies. BioData Mining; 2024; 17, 3. [DOI: https://dx.doi.org/10.1186/s13040-024-00355-3]
89. De Velasco Oriol, J; Vallejo, EE; Estrada, K et al. Benchmarking machine learning models for late-onset Alzheimer’s disease prediction from genomic data. BMC Bioinform; 2019; 20, 709. [DOI: https://dx.doi.org/10.1186/s12859-019-3158-x]
90. Elsherbini, AMA; Elkholy, AH; Fadel, YM et al. Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through machine and deep learning techniques. BMC Bioinform; 2024; 25, 131. [DOI: https://dx.doi.org/10.1186/s12859-024-05648-2]
91. Pirooznia, M; Yang, JY; Yang, MQ et al. A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics; 2008; 9,
92. Fröhlich, H; Balling, R; Beerenwinkel, N et al. From hype to reality: data science enabling personalized medicine. BMC Med; 2018; 16, 150. [DOI: https://dx.doi.org/10.1186/s12916-018-1122-7]
93. Bhatti, MHR; Javaid, N; Mansoor, B; Alrajeh, N; Aslam, M; Asad, M. New hybrid deep learning models to predict cost from healthcare providers in smart hospitals. IEEE Access; 2023; 11, pp. 136988-137010. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3336424]
94. Sanabria-Russo, L; Serra, J; Pubill, D; Verikoukis, C. CURATE: on-demand orchestration of services for health emergencies prediction and mitigation. IEEE J Sel Areas Commun; 2021; 39,
95. Fiest, KM; Krewulak, KD; Plotnikoff, KM et al. Allocation of intensive care resources during an infectious disease outbreak: a rapid review to inform practice. BMC Med; 2020; 18, 404. [DOI: https://dx.doi.org/10.1186/s12916-020-01871-9]
96. Lu, X; Qiu, H. Explainable prediction of daily hospitalizations for cerebrovascular disease using stacked ensemble learning. BMC Med Inform Decis Mak; 2023; 23, 59. [DOI: https://dx.doi.org/10.1186/s12911-023-02159-7]
97. Torri, V; Ercolanoni, M; Bortolan, F et al. A NLP-based semi-automatic identification system for delays in follow-up examinations: an Italian case study on clinical referrals. BMC Med Inform Decis Mak; 2024; 24, 107. [DOI: https://dx.doi.org/10.1186/s12911-024-02506-2]
98. Kent, DM; Paulus, JK; Sharp, RR et al. When predictions are used to allocate scarce health care resources: three considerations for models in the era of COVID-19. Diagn Progn Res; 2020; 4, 11. [DOI: https://dx.doi.org/10.1186/s41512-020-00079-y]
99. Mremi, IR; George, J; Rumisha, SF et al. Twenty years of integrated disease surveillance and response in Sub-Saharan Africa: challenges and opportunities for effective management of infectious disease epidemics. One Health Outlook; 2021; 3, 22. [DOI: https://dx.doi.org/10.1186/s42522-021-00052-9]
100. Tuominen, J; Lomio, F; Oksala, N et al. Forecasting daily emergency department arrivals using high-dimensional multivariate data: a feature selection approach. BMC Med Inform Decis Mak; 2022; 22, 134. [DOI: https://dx.doi.org/10.1186/s12911-022-01878-7]
101. Jackson, R; Kartoglu, I; Stringer, C et al. CogStack-experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak; 2018; 18, 47. [DOI: https://dx.doi.org/10.1186/s12911-018-0623-9]
102. Luo, L; Luo, L; Zhang, X et al. Hospital daily outpatient visits forecasting using a combinatorial model based on ARIMA and SES models. BMC Health Serv Res; 2017; 17, 469. [DOI: https://dx.doi.org/10.1186/s12913-017-2407-9]
103. Kim, M; Holton, M; Sweeting, A et al. Using health administrative data to model associations and predict hospital admissions and length of stay for people with eating disorders. BMC Psychiatry; 2023; 23, 326. [DOI: https://dx.doi.org/10.1186/s12888-023-04688-x]
104. Cheng, CH; Kuo, YH; Zhou, Z. Outbreak minimization v.s. influence maximization: an optimization framework. BMC Med Inform Decis Mak; 2020; 20, 266. [DOI: https://dx.doi.org/10.1186/s12911-020-01281-0]
105. Huang, Y; Xu, C; Ji, M et al. Medical service demand forecasting using a hybrid model based on ARIMA and self-adaptive filtering method. BMC Med Inform Decis Mak; 2020; 20, 237. [DOI: https://dx.doi.org/10.1186/s12911-020-01256-1]
106. Li, W; Zhang, Y; Zhou, X et al. Ensemble learning-assisted prediction of prolonged hospital length of stay after spine correction surgery: a multi-center cohort study. J Orthop Surg Res; 2024; 19, 112. [DOI: https://dx.doi.org/10.1186/s13018-024-04576-4]
107. Reboredo, JC; Barba-Queiruga, JR; Ojea-Ferreiro, J et al. Forecasting emergency department arrivals using INGARCH models. Health Econ Rev; 2023; 13, 51. [DOI: https://dx.doi.org/10.1186/s13561-023-00456-5]
108. Medeiros, NB; Fogliatto, FS; Rocha, MK et al. Forecasting the length-of-stay of pediatric patients in hospitals: a scoping review. BMC Health Serv Res; 2021; 21, 938. [DOI: https://dx.doi.org/10.1186/s12913-021-06912-4]
109. Rothenberg, WA; Bizzego, A; Esposito, G; Lansford, JE; Al-Hassan, SM; Bacchini, D; Bornstein, MH; Chang, L; Deater-Deckard, K; Di Giunta, L; Dodge, KA; Gurdal, S; Liu, Q; Long, Q; Oburu, P; Pastorelli, C; Skinner, AT; Sorbring, E; Tapanya, S; Steinberg, L; Tirado, LMU; Yotanyamaneewong, S; Alampay, LP. Predicting adolescent mental health outcomes across cultures: a machine learning approach. J Youth Adolesc; 2023; 52,
110. Sweeney, C; Ennis, E; Mulvenna, MD; Bond, R; O'Neill, S. Insights derived from text-based digital media, in relation to mental health and suicide prevention, using data analysis and machine learning: systematic review. JMIR Ment Health; 2024; 11, e55747. [DOI: https://dx.doi.org/10.2196/55747] PMID: 38935419; PMCID: PMC11240075
111. Garriga, R; Mas, J; Abraha, S; Nolan, J; Harrison, O; Tadros, G; Matic, A. Machine learning model to predict mental health crises from electronic health records. Nat Med; 2022; 28,
112. Van Mens, K; Lokkerbol, J; Wijnen, B; Janssen, R; de Lange, R; Tiemens, B. Predicting undesired treatment outcomes with machine learning in mental health care: multisite study. JMIR Med Inform; 2023; 11, e44322. [DOI: https://dx.doi.org/10.2196/44322] PMID: 37623374; PMCID: PMC10466445
113. Cao, M; Martin, E; Li, X. Machine learning in attention-deficit/hyperactivity disorder: new approaches toward understanding the neural mechanisms. Transl Psychiatry; 2023; 13,
114. Rahman, MA; Kohli, T. Mental health analysis of international students using machine learning techniques. PLoS ONE; 2024; 19,
115. Chen, Z; Hu, B; Liu, X; Becker, B; Eickhoff, SB; Miao, K; Gu, X; Tang, Y; Dai, X; Li, C; Leonov, A; Xiao, Z; Feng, Z; Chen, J; Chuan-Peng, H. Sampling inequalities affect generalization of neuroimaging-based diagnostic classifiers in psychiatry. BMC Med; 2023; 21,
116. Mukerji, SS; Petersen, KJ; Pohl, KM; Dastgheyb, RM; Fox, HS; Bilder, RM; Brouillette, MJ; Gross, AL; Scott-Sheldon, LAJ; Paul, RH; Gabuzda, D. Machine learning approaches to understand cognitive phenotypes in people with HIV. J Infect Dis; 2023; 227,
117. Le Glaz, A; Haralambous, Y; Kim-Dufor, DH; Lenca, P; Billot, R; Ryan, TC; Marsh, J; DeVylder, J; Walter, M; Berrouiguet, S; Lemey, C. Machine learning and natural language processing in mental health: systematic review. J Med Internet Res; 2021; 23,
118. Mukherjee, S; Frimpong Boamah, E; Ganguly, P; Botchwey, N. A multilevel scenario based predictive analytics framework to model the community mental health and built environment nexus. Sci Rep; 2021; 11,
119. Shiba, K; Daoud, A; Kino, S; Nishi, D; Kondo, K; Kawachi, I. Uncovering heterogeneous associations of disaster-related traumatic experiences with subsequent mental health problems: a machine learning approach. Psychiatry Clin Neurosci; 2022; 76,
120. Di Cara, NH; Maggio, V; Davis, OSP; Haworth, CMA. Methodologies for monitoring mental health on Twitter: systematic review. J Med Internet Res; 2023; 25, e42734. [DOI: https://dx.doi.org/10.2196/42734] PMID: 37155236; PMCID: PMC10203928
121. Robinson, T; Condell, J; Ramsey, E; Leavey, G. Self-management of subclinical common mental health disorders (Anxiety, Depression and Sleep Disorders) using wearable devices. Int J Environ Res Public Health; 2023; 20,
122. Yao, L; Wang, Z; Gu, H; Zhao, X; Chen, Y; Liu, L. Prediction of Chinese clients’ satisfaction with psychotherapy by machine learning. Front Psychiatry; 2023; 14, 947081. [DOI: https://dx.doi.org/10.3389/fpsyt.2023.947081] PMID: 36741124; PMCID: PMC9893506
123. Khan, AE; Hasan, MJ; Anjum, H; Mohammed, N; Momen, S. Predicting life satisfaction using machine learning and explainable AI. Heliyon; 2024; 10,
124. Sundaram, A; Subramaniam, H; Ab Hamid, SH; Mohamad, NA. An adaptive data-driven architecture for mental health care applications. PeerJ; 2024; 29,
125. Chen, RJ; Wang, JJ; Williamson, DFK; Chen, TY; Lipkova, J; Lu, MY; Sahai, S; Mahmood, F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng.; 2023; 7,
126. Lai, CH; Mok, PK; Chau, WW; Law, SW. Application of machine learning models on predicting the length of hospital stay in fragility fracture patients. BMC Med Inform Decis Mak; 2024; 24,
127. Robert SA, Liu AY. Changes in public awareness of the social determinants of health over 15 years in Wisconsin, United States. Prev Med Reports. 2025;50:102965. ISSN 2211–3355. https://doi.org/10.1016/j.pmedr.2025.102965.
128. Herd D. Policing as a social determinant of health in three decades of public health research: a systematic review. SSM Popul Health, https://doi.org/10.1016/j.ssmph.2025.101801.
129. Lim, S; Bekemeier, B; Pintye, J; Grembowski, D. The association between social determinants of health and case rates of sexually transmitted infections at the county-level in the US from 2000–2019. Am J Prev Med; 2025; [DOI: https://dx.doi.org/10.1016/j.amepre.2025.04.010]
130. Oncel D, Ravi R, Arolli X, Hoyek S, Chaaya C, Berrocal AM, Patel NA. A comparison of pediatric and adult ocular diseases in the context of social determinants of health. AJO Inter. 2025:100128, ISSN 2950–2535, https://doi.org/10.1016/j.ajoint.2025.100128.
131. Binns, R; Veale, M. Is that your final decision? Multi-stage profiling, selective effects, and Article 22 of the GDPR. Int Data Privacy Law; 2021; 11,
132. Johnson, A; Pollard, T; Shen, L et al. MIMIC-III, a freely accessible critical care database. Sci Data; 2016; 3, 160035. [DOI: https://dx.doi.org/10.1038/sdata.2016.35]
133. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 2021;54(6), Article 115 (July 2022), 35. https://doi.org/10.1145/3457607
134. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell; 2019; 1, pp. 206-215. [DOI: https://dx.doi.org/10.1038/s42256-019-0048-x]
135. Emanuel, EJ; Wachter, RM. Artificial intelligence in health care: will the value match the hype? JAMA; 2019; 321,
136. Esteva, A; Robicquet, A; Ramsundar, B et al. A guide to deep learning in healthcare. Nat Med; 2019; 25, pp. 24-29. [DOI: https://dx.doi.org/10.1038/s41591-018-0316-z]
137. Vayena, E; Salathé, M; Madoff, LC; Brownstein, JS. Ethical challenges of big data in public health. PLoS Comput Biol; 2015; 11,
138. Topol, EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med; 2019; 25, pp. 44-56. [DOI: https://dx.doi.org/10.1038/s41591-018-0300-7]
139. Jiang F, Jiang Y, Zhi H, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vascular Neurol 2017;2(4), 230–243: e000101. https://doi.org/10.1136/svn-2017-000101.
140. Khanna, NN; Maindarkar, MA; Viswanathan, V; Fernandes, JFE; Paul, S; Bhagawati, M; Ahluwalia, P; Ruzsa, Z; Sharma, A; Kolluri, R et al. Economics of artificial intelligence in healthcare: diagnosis vs. treatment. Healthcare; 2022; 10, 2493. [DOI: https://dx.doi.org/10.3390/healthcare10122493]
141. Jiao, W; Zhang, X; D’Souza, F. The economic value and clinical impact of artificial intelligence in healthcare: a scoping literature review. IEEE Access; 2023; 11, pp. 123445-123457. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3327905]
142. Chen, IY et al. Ethical machine learning in health care: a critical review. Ann Rev Biomed Data Sci; 2021; 4,
143. Chen, IY; Pierson, E; Rose, S; Joshi, S; Ferryman, K; Ghassemi, M. Ethical machine learning in healthcare. Ann Rev Biomed Data Sci; 2021; 4, pp. 123-144. [DOI: https://dx.doi.org/10.1146/annurev-biodatasci-092820-114757]
144. Wiens, J; Shenoy, ES. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clin Infect Dis; 2018; 66,
145. Reddy, S et al. Evaluation of artificial intelligence in healthcare: analysis of the metrics and benchmarking process. BMJ Health Care Inform; 2021; 28,
146. Martin, G et al. AI in healthcare: industry initiatives and a place for regulation? Health Policy Technol; 2020; 9,
147. Lange, E; Kranert, L; Krüger, J; Benndorf, D; Heyer, R. Microbiome modeling: a beginner’s guide. Front Microbiol; 2024; 15, 1368377. [DOI: https://dx.doi.org/10.3389/fmicb.2024.1368377]
148. Liu, D; Clemente, L; Poirier, C; Ding, X; Chinazzi, M; Davis, J; Vespignani, A; Santillana, M. Real-time forecasting of the COVID-19 outbreak in Chinese provinces: machine learning approach using novel digital data and estimates from mechanistic models. J Med Internet Res; 2020; 22,
149. Tonekaboni S et al. What clinicians want: contextualizing explainable machine learning for clinical end use. Proc Mach Learn Res 2019;106:1–21 https://doi.org/10.48550/arXiv.1905.05134.
150. Li, T; Sahu, AK; Talwalkar, A; Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag; 2020; 37,
151. Robinson, PN. Deep phenotyping for precision medicine. Hum Mutat; 2012; 33,
152. Schalkamp, AK; Rahman, N; Monzón-Sandoval, J; Sandor, C. Deep phenotyping for precision medicine in Parkinson's disease. Dis Model Mech; 2022; 15,
153. Lees JA, Russell TW, Shaw LP, Hellewell J. Recent approaches in computational modelling for controlling pathogen threats. Life Sci Alliance. 2024;7(9):e202402666. https://doi.org/10.26508/lsa.202402666. PMID: 38906676; PMCID: PMC11192964.
154. Menzies NA, Wolf E, Connors D, Bellerose M, Sbarra AN, Cohen T, Hill AN, Yaesoubi R, Galer K, White PJ, Abubakar I, Salomon JA. Progression from latent infection to active disease in dynamic tuberculosis transmission models: a systematic review of the validity of modelling assumptions. Lancet Infect Dis. 2018;18(8):e228-e238. https://doi.org/10.1016/S1473-3099(18)30134-8. Epub 2018 Apr 10. Erratum in: Lancet Infect Dis. 2018 Nov;18(11):1177. https://doi.org/10.1016/S1473-3099(18)30603-0. PMID: 29653698; PMCID: PMC6070419.
155. Zhou S, Zhao J, Zhang L. Application of artificial intelligence on psychological interventions and diagnosis: an overview. Front Psychiatry. 2022;13:811665. https://doi.org/10.3389/fpsyt.2022.811665. Bibault JE, et al. AI and big data in cancer: revolutionizing patient care. Nat Rev Clin Oncol. 2019;16(11):663–674.
156. Dlamini, Z; Francies, FZ; Hull, R; Marima, R. Artificial intelligence (AI) and big data in cancer and precision oncology. Comput Struct Biotechnol J; 2020; 18, pp. 2300-2311. [DOI: https://dx.doi.org/10.1016/j.csbj.2020.08.019] PMID: 32994889; PMCID: PMC7490765
157. Andrew, J; Rudra, M; Eunice, J; Belfin, RV. Artificial intelligence in adolescents’ mental health disorder diagnosis, prognosis, and treatment. Front Public Health; 2023; 11, 1110088. [DOI: https://dx.doi.org/10.3389/fpubh.2023.1110088] PMID: 37064712; PMCID: PMC10102508
158. Gerke, S; Minssen, T; Cohen, G. Ethical and legal challenges of artificial intelligence-driven healthcare. Artif Intell Healthcare.; 2020; [DOI: https://dx.doi.org/10.1016/B978-0-12-818438-7.00012-5] Epub 2020 Jun 26. PMCID: PMC7332220
159. Farlow, A; Hoffmann, A; Tadesse, GA; Mzurikwao, D; Beyer, R; Akogo, D; Weicken, E; Matika, T; Nweje, MI; Wamae, W; Arts, S; Wiegand, T; Bennett, C; Farhat, MR; Gröschel, MI. Rethinking global digital health and AI-for-health innovation challenges. PLOS Glob Public Health; 2023; 3,
160. Mishra, T; Wang, M; Metwally, AA; Bogu, GK; Brooks, AW; Bahmani, A; Alavi, A; Celli, A; Higgs, E; Dagan-Rosenfeld, O; Fay, B; Kirkpatrick, S; Kellogg, R; Gibson, M; Wang, T; Hunting, EM; Mamic, P; Ganz, AB; Rolnik, B; Li, X; Snyder, MP. Pre-symptomatic detection of COVID-19 from smartwatch data. Nat Biomed Eng.; 2020; 4,
161. Gadaleta, M; Radin, JM; Baca-Motes, K; Ramos, E; Kheterpal, V; Topol, EJ; Steinhubl, SR; Quer, G. Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms. NPJ Digit Med; 2021; 4,
162. Miotto, R et al. Deep learning for healthcare: review, opportunities, and challenges. Brief Bioinform; 2018; 19,
163. Wang, CJ; Ng, CY; Brook, RH. Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing. JAMA; 2020; 323,
164. Kumar R, Singh A, Kassar AS, Humaida MI, Joshi S, Sharma M. Leveraging artificial intelligence to achieve sustainable public healthcare services in Saudi Arabia: a systematic literature review of critical success factors. CMES Comput Mod Eng Sci 2025;142(2): 1289–1349, ISSN 1526–1492, https://doi.org/10.32604/cmes.2025.059152.
165. Kalhori SR, Najafi F, Hasannejadasl H, Heydari S. Artificial intelligence-enabled obesity prediction: A systematic review of cohort data analysis. Int J Med Inform. 2025;196: 105804, ISSN 1386–5056, https://doi.org/10.1016/j.ijmedinf.2025.105804.
166. Hanna, MG; Pantanowitz, L; Dash, R; Harrison, JH; Deebajah, M; Pantanowitz, J; Rashidi, HH. Future of artificial intelligence-machine learning trends in pathology and medicine. Modern Pathol; 2025; [DOI: https://dx.doi.org/10.1016/j.modpat.2025.100705] (2025/04/30)
167. Fiske A, Blacker S, Geneviève LD, Willem T, Fritzsche MC, Buyx A, Celi LA, McLennan S. Weighing the benefits and risks of collecting race and ethnicity data in clinical settings for medical artificial intelligence. Lancet Digital Health. 2025;7(4):e286–e294. ISSN 2589–7500. https://doi.org/10.1016/j.landig.2025.01.003
168. Holt, DB; El-Bokl, A; Stromberg, D; Taylor, MD. Role of artificial intelligence in congenital heart disease and interventions. J Soc Cardiovasc Angiogr Interv; 2025; [DOI: https://dx.doi.org/10.1016/j.jscai.2025.102567]
169. Alshami, A; Nashwan, A; AlDardour, A; Qusini, A. Artificial intelligence in rehabilitation: a narrative review on advancing patient care. Rehabilitación; 2025; 59,
170. Hanna, MG; Pantanowitz, L; Jackson, B; Palmer, O; Visweswaran, S; Pantanowitz, J; Deebajah, M; Rashidi, HH. Ethical and bias considerations in artificial intelligence/machine learning. Mod Pathol; 2025; 38,
171. Almaiah MA, Bin Sulaiman R, Islam U, Badr Y, El-Qirem FA. Federated learning in healthcare: a bibliometric analysis of privacy, security, and adversarial threats (2021–2024). SHIFRA. 2025:46–61. https://doi.org/10.70470/SHIFRA/2025/002
172. Raza, S; Shaban-Nejad, A; Dolatabadi, E; Mamiya, H. Exploring bias and prediction metrics to characterise the fairness of machine learning for equity-centered public health decision-making: a narrative review. IEEE Access; 2024; 12, pp. 180815-180829. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3509353]
173. Mensah GB, Mijwil MM, Abotaleb M, Ali G, Dutta PK, Mzili T, Eid MM. Explainable AI for healthcare: training healthcare workers to use artificial intelligence techniques to reduce medical negligence in Ghana’s Public Health Act, 2012 (Act 851). EDRAAK. 2025;1–6. https://doi.org/10.70470/EDRAAK/2025/001
174. Libin PJ, Moonens A, Verstraeten T, Perez-Sanjines F, Hens N, Lemey P, Nowé A. Deep reinforcement learning for large-scale epidemic control. In: Machine learning and knowledge discovery in databases. Applied data science and demo track: European conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part V 2021 (pp. 155–70). Springer International Publishing.
175. Fascia M. Machine learning applications in medical prognostics: a comprehensive review. arXiv preprint arXiv:2408.02344.
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”).