Prediction of stunting and its socioeconomic

Introduction

The World Health Organization (WHO) divides adolescence into two phases: late adolescence, which lasts from 15 to 19 years old, and early adolescence, which lasts from 10 to 14 years old [1]. The number of teenagers worldwide is growing; an estimated 1.2 billion individuals, or 16% of the world’s population, are anticipated to be adolescents, and approximately 90% of them reside in low- and middle-income countries [2, 3]. Adolescents make up 20–26% of the population in Ethiopia [4]. Sustaining appropriate physical, cognitive, psycho-social, and emotional development, and linear growth needs adequate nutrition since 15–20% of an adult’s height is attained during the adolescence phase [2, 5, 6].

Stunting is a vital indicator of chronic undernutrition: it reflects a failure to achieve linear growth as a result of prolonged food restriction and early-life disease [7]. Adolescent girls are highly susceptible to undernourishment and stunting because of their higher nutritional needs for growth, development, and maturation, including menarche and sexual development, and because of traditional marriage and early pregnancy [7–9]. In adulthood, stunting can lead to decreased productivity, impaired social skills, behavioral issues, and metabolic disorders [10, 11].

More significantly, undernutrition has long-term effects, particularly for females: stunted teenage girls are more likely to give birth to underweight babies who will themselves be stunted later in life, and they run a high risk of dying during pregnancy and labor if their nutritional demands are not met [8, 12]. Improving adolescent nutrition is therefore one way to end the intergenerational cycle of malnutrition and poor health, because adolescence is the last opportunity to reduce the effects of malnutrition; otherwise, the vicious cycle continues [13]. Efforts to reduce stunting among adolescent girls have included high-quality nutrition education, sustainable food production, investments for the poor, improved healthcare systems, increased coverage, and strategic plans targeting adolescent girls’ nutrition [14], but stunting remains a major public health issue among adolescent girls in developing countries, including Ethiopia [15, 16].

Based on previous studies, the prevalence of stunting among adolescent girls was 48% in Bangladesh, 47% in Nepal [17, 18], and 34.2% in India [17]. In Ethiopia, the prevalence of stunting ranges from 15% to 41.8% [19, 20]. Previous studies also identified age group, poor wealth index, rural residence, family size ≥5, working status, educational status, unimproved toilet facility, and unprotected drinking water source as predictors of stunting [13, 15, 16, 20]. Stunting can also have lasting health consequences, including heightened vulnerability to infections and enduring developmental delays. Furthermore, it impairs cognitive abilities and educational achievement, which can limit the opportunities of those affected. It also has wider socioeconomic repercussions that can reduce productivity and inflate healthcare costs for both communities and nations [21–23].

In many societies, girls may have less access to nutritious food than boys, especially during adolescence. This disparity can lead to chronic undernutrition, resulting in stunting rather than immediate weight loss. In addition, chronic health issues such as anemia or infections can contribute more to stunting than to underweight or wasting, as these conditions hinder growth over time [24]. Investigating growth, health, and stunting status during adolescence, in addition to infancy and childhood, is therefore crucial. However, published studies on adolescents are limited, and most focus on early childhood [15, 20, 25]. While numerous studies have focused on young children, our literature search did not uncover any studies predicting stunting specifically among adolescent girls. We believe that adolescent girls are especially vulnerable to the impacts of stunting [26], which can lead to long-term health and developmental issues. Implementing targeted interventions and evidence-based policies can help address the intergenerational burden of malnutrition. Moreover, the available studies in Ethiopia have used classical statistical analysis [13, 19, 20]. Traditional analysis methods have limitations such as linearity assumptions, limited feature selection, and difficulty with high-dimensional data. They rely on prior assumptions or knowledge, which hinders the discovery of hidden information. In contrast, machine learning (ML) excels at capturing nonlinear relationships, automating feature selection, handling high-dimensional data, and adapting to new information. ML is a powerful tool for predicting stunting and tackling complex challenges in low-resource settings [27–29].

ML algorithms offer diverse strategies for predicting, classifying, and uncovering patterns in data. The choice of algorithm depends on specific problem requirements, such as data characteristics, problem at hand, and performance criteria. ML is increasingly used in public health, including disease diagnosis, epidemic surveillance, resource allocation, drug discovery, and remote healthcare. It has the potential to revolutionize healthcare and benefit populations in need [30–32]. ML enables analysis of data to identify at-risk children, handle complex interactions, select key contributing factors, improve accuracy, and facilitate continuous learning. By applying ML, researchers and policymakers can gain insights, identify high-risk groups, and implement effective interventions, ultimately reducing stunting rates and improving child well-being in Ethiopia [4, 30].

This study aims to predict stunting among adolescent girls using eight advanced ML algorithms and association rule mining, filling the gap in previous research that focused on limited ML algorithms and specific health outcomes [30–36]. The findings of this study are relevant to governmental and non-governmental organizations aiming to improve the health and nutrition of often neglected adolescent girls in developing countries. They provide evidence for policymakers to plan integrated interventions and programs, preventing stunting and protecting the health of vulnerable subgroups of adolescents.

Methods

Design, data source, setting, and periods

We utilized the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset, accessed through the website www.measuredhs.com after requesting and obtaining approval from the DHS program database. The 2016 EDHS is the fourth survey conducted in Ethiopia to collect data on household and individual characteristics and to provide updated estimates of key demographic and health indicators of the population [16, 37]. The survey took place from January 18 to June 27, 2016, using a multi-stage stratified sampling technique on 645 enumeration areas. The questionnaires were adapted by EDHS from the DHS Program’s standard DHS questionnaires. The survey included a nationally representative sample of women (aged 15–49 years) with a total sample size of 15,683 women [16]. Of these, a sub-sample of 3498 women were adolescents aged 15–19 years and were eligible for the current study; a total weighted sample of 3156 adolescent girls was retained for the final analysis. The data for predicting stunting were obtained from the woman’s questionnaire. We analyzed 14 features for these participants.

Population of the study

All adolescent girls aged 15–19 years in Ethiopia were the source population for this study, whereas all adolescent girls aged 15–19 years in the selected enumeration areas (EAs) whose height was recorded were the study population.

Study variables and measurements

The outcome of interest for this study was stunting among adolescent girls, defined as a WHO height-for-age Z score below -2 standard deviations [38]. For analysis, the outcome was binary coded as 1 for stunted and 0 for not stunted.

A set of covariates was considered as possible risk factors for adolescent stunting and extracted from the DHS data set based on previous studies [11, 15, 19, 20, 39, 40] and the WHO conceptual framework on adolescent stunting: context, causes, and consequences [41]. The predictors included in the current study were wealth index, age of respondent, educational status, region, residence, family size, number of children, occupation, media exposure, religion, current marital status, source of drinking water, type of toilet facility, and current contraceptive use. Age group: current age of the woman, recoded into two categories with values of “0” for 15–17 and “1” for 18–19. Religion: recoded into four categories with values of “0” for Muslim, “1” for Orthodox, “2” for Protestant, and “3” for other religious groups (combining Catholic and traditional). Occupation: recoded into two categories with values of “0” for not working and “1” for working. Media exposure: a composite variable obtained by combining whether a respondent reads a newspaper/magazine, listens to the radio, and watches television, with a value of “0” if the woman was not exposed to any of the three media and “1” if she had exposure to at least one of them [42]. Educational status: the highest educational level a woman achieved, coded into four groups with values of “0” for no education, “1” for primary education, “2” for secondary education, and “3” for higher education. Family size: recoded into three categories as 1–3, 4–6, and seven and above. Wealth index: the five quintiles were re-categorized as poor, medium, and rich. Items about drinking water and sanitation were based on the core questions on drinking water and sanitation for household surveys developed by the WHO and the DHS guide; they were recoded into two categories each, “unprotected” and “protected” source of drinking water, and “improved” and “unimproved” toilet facility [16, 43, 44].
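The recoding rules described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the height-for-age Z score is assumed to be already computed, and the mapping of the five wealth quintiles to three groups (two lowest as poor, middle as medium, two highest as rich) is an assumption, since the text does not state it.

```python
def code_stunting(haz):
    """Return 1 (stunted) if the height-for-age Z score is below -2, else 0."""
    return 1 if haz < -2 else 0


def code_age_group(age):
    """Return 0 for ages 15-17 and 1 for ages 18-19."""
    if not 15 <= age <= 19:
        raise ValueError("age outside the 15-19 study range")
    return 0 if age <= 17 else 1


def code_media_exposure(reads_newspaper, listens_radio, watches_tv):
    """Return 1 if exposed to at least one of the three media, else 0."""
    return 1 if (reads_newspaper or listens_radio or watches_tv) else 0


def code_family_size(n_members):
    """Return one of the three family-size categories."""
    if n_members <= 3:
        return "1-3"
    if n_members <= 6:
        return "4-6"
    return "7+"


def code_wealth(quintile):
    """Collapse quintiles 1-5 into three groups (assumed mapping)."""
    return {1: "poor", 2: "poor", 3: "medium", 4: "rich", 5: "rich"}[quintile]
```

For example, `code_stunting(-2.3)` gives 1, while a Z score of exactly -2 is coded 0 because the definition uses a strict inequality.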

Data preprocessing and analytic strategies

Data pre-processing is a vital task before developing a prediction model and has a significant impact on the model’s predictive performance. Data preparation techniques include data cleaning, feature engineering, dimensionality reduction, and data splitting [45]. The workflow of the current study is presented in Fig 1.

[Figure omitted. See PDF.]

Data cleaning

Data cleaning is the first crucial step performed after the data are retrieved. It consists of detecting and removing outliers, handling missing values, and handling imbalanced categories of the outcome variable. We explored different methods of managing missing data in ML, such as deletion, imputation, model-based imputation, and the use of domain-specific knowledge. Considering factors such as the nature and amount of missingness, the underlying assumptions, and the ML algorithm used, we opted to handle missing values in our dataset using K-nearest neighbor (KNN) imputation. KNN imputation retains all data, handles outliers, works for numerical and categorical features, adapts to new data, and reduces bias while encompassing a wide range of values. We identified outliers through scatter plots, box plots, and histograms, and assessed multicollinearity by examining the correlation matrix. A correlation value above 0.8 indicated high correlation between variables [46].
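The idea behind KNN imputation can be illustrated with a small pure-Python sketch for coded categorical data. It is a simplified stand-in under stated assumptions, not the study's implementation: the source does not say which implementation was used (scikit-learn offers `KNNImputer` in `sklearn.impute`, for instance), and the function name `knn_impute`, the Hamming-style distance, and the majority vote are choices made here for illustration.

```python
from collections import Counter


def knn_impute(rows, k=3):
    """Fill None entries with the most common value among the k nearest rows.

    Distance is the share of mismatching values over the columns that are
    observed in both rows (an assumption suited to coded categories).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sum(x != y for x, y in shared) / len(shared)

    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            votes = Counter(r[j] for r in donors[:k])
            if votes:
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed
```

A row with a missing third value that matches two complete rows on its first two columns receives the value those two rows share.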

Imbalanced data handling

ML models trained on imbalanced data are typically biased toward the majority class and fail to predict cases in the rare (minority) class [47]. To address this issue, researchers have developed various mechanisms. This study employed four balancing methods: under-sampling, over-sampling, ADASYN, and SMOTE [48]. After training our ML algorithms on unbalanced data, we experimented with these techniques to select the best data balancing technique. We evaluated model performance using accuracy and AUC and found that the SMOTE technique [49] outperformed the others, making it the chosen approach for data balancing in the final model prediction.
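The core step of SMOTE, creating a synthetic minority sample by interpolating between a minority observation and one of its nearest minority neighbours, can be sketched as below. This is a minimal illustration of the general technique and not the code used in the study; the function name `smote_sketch` and its parameters are hypothetical, and the minority class must contain at least two observations. Plain interpolation treats coded categories as numbers, so synthetic values may fall between category codes.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of numeric minority rows."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

Each synthetic row lies on the line segment between two existing minority rows, so the new points stay inside the region already occupied by the minority class.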

Feature engineering

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models. Among the various feature engineering techniques, we used classical one-hot encoding to convert categorical variables into numeric values and label encoding to code each category of a variable as a number. Both were carried out with the preprocessing module of the scikit-learn package.
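The difference between the two encodings can be seen in a short pure-Python sketch. The helper names `label_encode` and `one_hot_encode` are hypothetical and only mimic the idea; they are not the scikit-learn API the study used.

```python
def label_encode(values):
    """Map each category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator column per category (alphabetical order)."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


wealth = ["poor", "rich", "medium", "poor"]
codes, mapping = label_encode(wealth)
indicators, columns = one_hot_encode(wealth)
```

Label encoding yields a single integer column, which implies an order among categories, whereas one-hot encoding yields one indicator column per category and implies none.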

Additionally, dimensionality reduction was conducted to decrease the number of input variables for the predictive model, aiming to create a simpler and more effective model for making predictions on new data. There are two approaches to dimension reduction: feature selection and feature extraction [50]. Feature selection involves choosing the most relevant independent variables, those with the greatest impact on predicting the target variable, and is the appropriate method for our dataset; feature extraction is typically used for datasets involving image processing and deep learning. We examined different feature selection techniques, including Lasso, principal component analysis, forward selection, backward elimination, recursive feature elimination, correlation-based feature selection, and the chi-square test, and evaluated their performance using evaluation metrics [51]. The results revealed that the Boruta algorithm was the most effective feature selection method for our dataset. Boruta is a wrapper-based technique known for its unbiased and consistent performance, making it highly effective in selecting key variables for model prediction [52, 53]. Combining Boruta with the random forest classifier offers benefits such as improved feature selection, robustness to noise and irrelevant features, reduced bias in feature importance, and reduced overfitting, resulting in better model performance and enhanced interpretability. However, these techniques have challenges and limitations. To address them, we employed techniques such as L1 or L2 regularization, cross-validation, independent test sets, parallel processing, analysis of feature importance stability, recursive feature elimination, balancing of false positives and false negatives, and principal component analysis [54].

Data splitting: to train the model and validate it on data it had never seen before, we used a simple 80/20 split in which 80% of the samples (2525 respondents) were used for training and the remaining 20% (631 respondents) were used for testing the model. In addition, tenfold cross-validation was used for model training, as it does not waste much data, which is a big advantage when the number of samples is small [55].
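The split sizes follow directly from the sample size: 20% of 3156 is 631.2, which gives 631 test and 2525 training respondents. A minimal sketch of such a split is shown below; the function name and the random seed are hypothetical, and the source does not state whether the split was stratified by outcome.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)  # 3156 * 0.2 -> 631
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(3156)
```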

Model selection

After the data had been divided into training and testing sets, suitable models were selected for training. Since the outcome variable was categorical, the task was a classification task, and appropriate classifiers needed to be selected to conduct the prediction. Specifically, it was a binary classification task, since stunting was categorized into two mutually exclusive categories. We applied eight ML algorithms: logistic regression (LR), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), Gaussian Naïve Bayes (GNB), eXtreme gradient boosting (XGBoost), decision tree (DT), and light gradient boosting (LGB) classifiers. These algorithms were selected based on previous studies that applied ML techniques to classification tasks on EDHS data [56–59]. Furthermore, the choice of these algorithms took into account their scalability, interpretability, number of features, computational efficiency, characteristics of the data, problem type, robustness to noise and outliers, accuracy, bias-variance trade-off, and domain expertise.

In this study, we utilized the scikit-learn version 1.3.2 packages in Python, implemented within Jupyter Notebook, to employ ML algorithms. The descriptions of each algorithm are as follows:

A) Decision tree (DT)

A DT is a non-parametric technique that classifies a data set based on the problem’s predictive structure. It produces a classification tree for categorical variables. Decision trees are highly interpretable, efficiently capture nonlinear relationships, handle both categorical and numerical features, are relatively robust to outliers and noisy data, handle missing values by using surrogate splits or imputation techniques, and can handle large datasets efficiently [60]. However, DTs also have limitations: they can be prone to overfitting, struggle to capture certain complex relationships that require more sophisticated algorithms, and can be sensitive to small changes in the data, which lead to different tree structures.

B) Random forest (RF)

RF is a type of supervised ML that can be used for classification, regression, and dimension reduction purposes. It is a versatile algorithm that can handle large amounts of data and overcome noise. RF is preferred when improved predictive performance, reduced bias, reduced variance, robustness to noise and outliers, feature importance, and handling of high-dimensional data are important considerations for the problem at hand [34, 61]. However, RF has some limitations. It can be a black-box model that is more difficult to interpret than an individual DT, because the ensemble nature of random forests makes it challenging to trace the decision-making process. Additionally, RF may not perform well on datasets with strong linear relationships.

C) Extreme gradient boost (XG Boost)

XG boost is a DT-based ensemble machine learning algorithm that works within a gradient boosting framework. Boosting involves combining weak classifiers to produce a powerful averaged classifier, and it is also a variance reduction technique. It can be applied to both classification and regression problems. XG boost is preferred because it is robust to noisy data and outliers, handles high-dimensional datasets, controls model complexity and prevents overfitting, handles missing values in the data, saves computational resources, and provides a wide range of hyperparameters [62]. However, XG boost may have higher computational and memory requirements, and it tends to be less interpretable than the other algorithms.

D) Light gradient boosting machine (LightGBM)

Light GBM is a gradient-boosting framework that combines multiple learners, usually DTs, to create a strong predictive model while reducing memory usage. Light GBM is generally faster and more memory-efficient than XG boost, making it more suitable for large datasets [63]. Light GBM is preferred when efficiency, scalability, handling of high-dimensional data, handling of categorical features, advanced boosting techniques, regularization techniques, feature importance, handling of imbalanced datasets, and flexibility are important considerations for the problem at hand.

E) Support vector machine (SVM)

SVMs are a set of supervised learning methods used for classification, regression, and outlier detection. SVMs are preferred when high-dimensional spaces, robustness to outliers, nonlinearity, margin maximization, memory efficiency, and small to medium-sized datasets are important considerations for the problem at hand [64]. However, SVMs may have limitations in terms of scalability to large datasets and computational efficiency, especially when using non-linear kernels. In addition, SVMs may not perform well when the dataset is imbalanced or when the classes overlap and are not well separated. Moreover, SVMs do not provide probability estimates directly; these are obtained through an expensive five-fold cross-validation process.

F) Logistic regression (LR)

LR is a supervised ML algorithm used to solve classification issues. It is a parametric method that assumes a Bernoulli distribution of the target variable and the independence of the observations [64].

G) K-nearest Neighbor (KNN)

KNN is a non-parametric, robust, and adaptable supervised ML algorithm primarily used for classification problems. This approach stores all existing cases and categorizes new ones using a similarity score based on a distance function and the majority vote of their neighbors. KNN is preferred when nonlinear relationships, interpretability, robustness to outliers, handling of imbalanced datasets, the absence of an explicit training step, flexibility, and datasets with varying densities are important considerations for the problem at hand [65]. However, KNN has limitations. It can be computationally expensive, especially when dealing with large datasets or high-dimensional feature spaces. In addition, KNN is sensitive to the choice of distance metric, and the optimal value of K needs to be determined through experimentation or cross-validation.

H) Gaussian Naïve Bayes (GNB)

NB is a collection of ML algorithms based on Bayes’ theorem, with two basic assumptions. The first is that every pair of features is independent of each other, and the second is that each feature makes an equal contribution to the outcome prediction. GNB is preferred when efficiency, simplicity, handling of continuous features, small training sets, text classification, and the feature independence assumption are important considerations for the problem at hand [66]. However, GNB may not perform well when the two assumptions are severely violated. It may struggle with datasets where the features have strong dependencies or when the decision boundary is complex.

Model training and evaluation

After dividing the data into training and testing sets, we selected appropriate models for training, focusing on classifiers suitable for the categorical target variable. The task was a binary classification of stunting, so we utilized eight machine learning algorithms. These choices were based on previous research using machine learning techniques on EDHS data, the type of problem, and the characteristics of the data.

Following model selection, we trained the selected classifiers with both balanced and unbalanced data. The best predictive model was then chosen and trained with balanced training data for the final prediction on unseen test data. To evaluate the performance of the final model, we used a confusion matrix and receiver operating characteristic (ROC) curve with metrics such as accuracy, recall, specificity, F1 score, and area under the curve (AUC). The AUC was considered the main performance metric, providing an overall assessment of the model’s performance at different classification thresholds. The confusion matrix allowed us to extract one-dimensional performance metrics such as True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) [47].
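The metrics named above are all simple functions of the four confusion-matrix counts. The sketch below shows the standard formulas; the function name is hypothetical, and the counts in the example are invented for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # sensitivity: share of stunted cases found
    specificity = tn / (tn + fp)     # share of non-stunted cases found
    precision = tp / (tp + fp)       # share of predicted stunted that are stunted
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Invented counts, for illustration only.
example = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these invented counts, accuracy is 0.85, recall is 0.80, and specificity is 0.90. The AUC is not derived from a single confusion matrix; it summarizes performance across all classification thresholds.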

When selecting an evaluation metric, we considered contextual requirements, metric trade-offs, field benchmarks, model interpretability, problem type, data characteristics, and task goals [67–69]. In addition to standard metrics, the model’s performance was further evaluated using tenfold cross-validation. This technique involves dividing the data into ten subsets and training and evaluating the model ten times with different combinations of subsets [70]. The study also conducted a comprehensive exploration of hyperparameters to optimize the model’s performance. Grid search, random search, and Bayesian optimization were systematically used to find the most effective hyperparameter configurations, considering factors such as search space size, computational resources, and the exploration-exploitation balance. Grid search is exhaustive but computationally expensive, random search is less intensive but may require more iterations, and Bayesian optimization is efficient for complex search spaces but requires additional setup and resources. The choice of method depends on the specific algorithm and dataset characteristics, and experimentation on the validation set is recommended for tuning [71]. We considered the advantages of each technique and selected the best tuning approach based on performance metrics. Moreover, calibration was performed to enhance the precision and reliability of the model, resulting in improved prediction accuracy.
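The tenfold procedure can be sketched as a generator of training and validation index lists. This is a minimal illustration with a hypothetical function name; it assumes the rows have already been shuffled and does not stratify by outcome, which the source does not specify either way.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (training, validation) index lists for each of k folds."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(2525, k=10))
```

Applied to the 2525 training respondents, each fold holds out 252 or 253 respondents for validation and trains on the rest, so every respondent is used for validation exactly once.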

Association rule mining

In this study, association rule analysis was applied using the Apriori algorithm (arules package) in R (version 4.3.2) to identify the specific categories of predictor variables that are associated with stunting. Association rule mining, which produces if (antecedent)/then (consequent) statements, was used to discover relationships between seemingly unrelated attributes, particularly categorical ones, because the ML algorithms do not show which category of a predictor is more strongly associated with stunting among adolescent girls in Ethiopia. It identifies frequently occurring patterns and dependencies between attributes using support, which measures how frequently the if/then relationship appears in the observations, and confidence, which measures how often the relationship holds true [72]. An if/then association rule is a pair of attribute sets (X, Y) expressed as X -> Y, where X is the antecedent and Y the consequent; that is, when X occurs, Y also tends to occur. The strength of the relationship between X and Y is further expressed by the lift, the ratio of the rule’s confidence to the overall proportion of observations in which Y occurs [73]. If the lift is equal to 1, X and Y occur together only as often as expected for independent random events and the rule has no special significance; we call such rules uncorrelated rules. If the lift is less than 1, the occurrence of “X” reduces the occurrence of “Y,” and we call such rules negative correlation rules. If the lift is greater than 1, the occurrence of “X” promotes the occurrence of “Y,” and we call such rules positive correlation rules.
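Support, confidence, and lift for a single rule X -> Y can be computed as in the sketch below, using their standard definitions. The study itself used the arules package in R; this Python function and the toy records are hypothetical and serve only to show the arithmetic.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_x = sum(antecedent <= t for t in transactions)    # records containing X
    n_y = sum(consequent <= t for t in transactions)    # records containing Y
    n_xy = sum((antecedent | consequent) <= t for t in transactions)
    support = n_xy / n              # P(X and Y)
    confidence = n_xy / n_x         # P(Y | X)
    lift = confidence / (n_y / n)   # P(Y | X) / P(Y)
    return support, confidence, lift


# Toy records invented for illustration; not data from the study.
records = [
    {"poor", "stunted"}, {"poor", "stunted"}, {"poor"},
    {"rich"}, {"rich", "stunted"}, {"rich"}, {"rich"}, {"rich"},
]
support, confidence, lift = rule_metrics(records, {"poor"}, {"stunted"})
```

In this toy example the lift is greater than 1, so the rule would be read as a positive correlation rule in the terminology above.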

Model interpretability

Researchers have highlighted SHAP (SHapley Additive exPlanations) values as an appropriate choice for understanding how different features influence a model’s predictions [74, 75]. To understand the data and analyze the factors affecting stunting prediction, we employed two techniques. First, we used mean SHAP values to evaluate the overall impact of each feature on the model’s predictions, which provided insight into the relative importance of the variables. SHAP analysis is a commonly used method in machine learning for interpreting predictions and understanding feature importance. It assigns a numerical value, known as a SHAP value, to each feature, indicating its contribution to a prediction. Positive values indicate a positive contribution, negative values indicate the opposite, and the magnitude represents the strength of influence. SHAP analysis enhances transparency and interpretability, providing a global view of feature importance and explaining individual predictions [76–78]. Second, we used a waterfall plot to visually depict the cumulative effects of these variables, emphasizing their contributions to the overall prediction [79].

Ethical considerations

The Central Statistical Agency (CSA) received ethical clearance for the 2016 EDHS survey from the Ethiopian Health and Nutrition Research Institute Review Board and the National Research Ethics Review Committee at the Ministry of Science and Technology. Moreover, they confirmed that their research was performed in accordance with the Declaration of Helsinki, and the CSA obtained written informed consent from the respondents. The authors received a permission letter to download and use the data set for the current study from the DHS program data archivist upon submission of a proposal. After data access was authorized by DHS, we maintained confidentiality and used the data only for the purpose of the study.

Results

Study participant characteristics

A total weighted sample of 3156 adolescent girls was included in the final analysis. Of these, 452 were stunted, a prevalence of 14.32% (95% CI: 13.12, 15.59). The majority (60.17%) of the respondents were in the age group of 15–17 years. Nearly two-thirds (64.5%) of the adolescent girls were rural dwellers and 58.24% had completed primary education. Four hundred two (12.74%) and one hundred fifty-three (4.85%) adolescent girls were from the Addis Ababa and Harari regions, respectively. The majority (69.8%) of the adolescent girls did not have access to improved toilet facilities and 44.3% had no media exposure. The majority (89.6%) of the respondents had no children, and four out of ten (41.13%) respondents had a family size of four to six (see Table 1).

[Figure omitted. See PDF.]

Machine learning analysis of stunting among adolescent girls

Data balancing.

We employed four data balancing techniques (under-sampling, over-sampling, SMOTE, and ADASYN) and assessed their performance using accuracy and AUC. The balancing technique that demonstrated the highest performance was considered the best for the final prediction. On the unbalanced data, the LGB achieved an AUC of 64%; across the four balancing techniques, the RF algorithm outperformed the other algorithms with an AUC of 84%. Considering all the data balancing techniques, SMOTE stood out as the superior method. Table 2 compares the data balancing techniques in terms of AUC and accuracy. Moreover, the prevalence of stunting before and after data balancing is reported in Fig 2.

[Figure omitted. See PDF.]

Features selection using Boruta algorithms.

The important features in the data set were selected using the Boruta algorithm, which classifies the independent variables as either important or unimportant based on their impact on stunting status. The algorithm confirmed the most influential features that can explain the variation in stunting status and recommended them for further analysis and modeling. Variables that were rejected were considered less important and were excluded from further analysis, as they were determined to have minimal impact on the outcome of interest. The Boruta algorithm graph shows the confirmed (important) variables in green and the rejected (unimportant) variables in red [52]. Of the 14 features, three (family size, marital status, and source of drinking water) were considered unimportant, and the remaining 11 features were important for model prediction (Fig 3).

[Figure omitted. See PDF.]

Note: V024-region, V130-religion, V190-wealth index, V116-toilet facility type, V312-contraceptive method, V025-residence, V157-media exposure, V106-educational status, V218-number of children, V501-marital status, V717-occupation, V113-source of drinking water, V136-family size, and V012-age of respondent.

Model development and performance evaluation to predict stunting.

Performance metrics such as accuracy, precision, recall, F1 score, specificity, and AUC were used to evaluate and compare the algorithms’ performance (Fig 4). These metrics assessed overall correctness, the ability to correctly predict positive and negative instances, and the algorithms’ discriminative power. Using these metrics, we conducted a comprehensive evaluation to determine how effectively the algorithms could predict stunting among adolescent girls in Ethiopia. After comparing the performance metrics of the three tuning techniques, we found that grid search was the best tuning technique (Table 3). Based on the ROC curve analysis, the top three ML algorithms for classifying stunting status were the random forest, the light gradient boosting, and the extreme gradient boosting classifiers (Fig 5).

[Figure omitted. See PDF.]
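As a reminder of how the threshold-based metrics named above relate to one another, the following sketch derives them from the four cells of a binary confusion matrix. The counts are hypothetical and are not taken from the study; AUC is omitted because it requires predicted scores rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, with "stunted" treated as the positive class.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting recall and specificity alongside accuracy matters here because stunting is the minority class, so accuracy alone can look good even when few stunted girls are identified.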

Model interpretability

SHAP value interpretation.

The mean SHAP value report offered insight into the comparative importance of the features in the classification model. Region, respondent age, educational status, and wealth index were identified as the most influential factors, exerting a substantial impact on the model’s predictions. As an illustration, assume that the baseline log odds of stunting, before region and respondent age are considered, is 0 (a probability of 50%). A SHAP value of +0.04 means that region and respondent age increase the log odds of stunting, raising the probability of stunting to around 51%. Conversely, contraceptive use and residence had minimal influence on the classification outcome, as evidenced by their low mean SHAP values. A SHAP value of +0.01 for residence and contraceptive use indicates a small positive effect: the probability of stunting increases only slightly, to about 50.25%, so these factors contribute far less than the other factors in the model (Fig 6).

[Figure omitted. See PDF.]
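The conversion from a SHAP value on the log-odds scale to a probability, as used in the illustration for the mean SHAP values, can be sketched as follows. The baseline log odds of 0 is the assumption stated in the text, not a fitted model output, and the function names are hypothetical.

```python
import math


def log_odds_to_probability(log_odds):
    """Logistic (sigmoid) transform from log odds to probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_after_shap(base_log_odds, shap_value):
    """Probability once a feature's SHAP contribution is added to the baseline log odds."""
    return log_odds_to_probability(base_log_odds + shap_value)


# Assumed baseline log odds of 0, i.e. a 50% baseline probability.
p_region_age = probability_after_shap(0.0, 0.04)
p_residence = probability_after_shap(0.0, 0.01)
print(f"{p_region_age:.4f}, {p_residence:.4f}")
```

Because the logistic curve is nearly linear around a log odds of 0, a small SHAP value shifts the probability by roughly one quarter of its size, which is why +0.04 corresponds to about one percentage point.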

The waterfall plot showed the hierarchy of feature importance in predicting the target variable. Region, educational status, and respondent age had the highest positive impact on the prediction. A SHAP value of +0.07 is relatively substantial, indicating that region plays a noteworthy role in increasing the likelihood of stunting; this suggests that the probability of stunting increases to about 54.2% due to the regional factor. In contrast, number of children, media exposure, and respondent occupation contributed negatively to the model’s prediction, meaning that they had a decreasing effect on the prediction. A SHAP value of -0.02 suggests that the number of children has a negative effect on the predicted likelihood of stunting. This indicates that the probability of stunting decreases slightly, to about 52%, in families having fewer children compared with those having more children (Fig 7).

[Figure omitted. See PDF.]

Association rule mining.

Association rule mining was applied, and a total of 29 association rules were generated. Among these, the authors selected the top seven rules based on their confidence and lift values [80], which are measures of the interestingness of an association rule. To identify all potential association rules, the minimum support was set at 0.0001 and the minimum confidence threshold at 80%, because a rule is considered reliable if its confidence is more than 80% [81]. The factors appearing in these rules are strongly associated with the probability of stunting and should be considered in initiatives aimed at enhancing adolescent health in the region. The top seven association rules are presented below:

1. Rule-1: If adolescent girl’s age = 1 (aged 18–19 years), religion = 1 (Orthodox), wealth index = 0 (poor), number of children = 1 (one or more children), and current contraceptive use = 0 (not used), then the probability of being stunted is 95.8% (lift = 6.9).

2. Rule-2: If adolescent girl’s age = 1 (aged 18–19 years), education level = 0 (no formal education), religion = 1 (Orthodox), wealth index = 0 (poor), and current contraceptive use = 0 (not used), then the probability of being stunted is 95.8% (lift = 6.9).

3. Rule-3: If region = 3 (Amhara), wealth index = 0 (poor), number of children = 1 (one or more children), contraceptive use = 0 (not used), and media exposure = 0 (no), then the probability of being stunted is 83.3% (lift = 5.8).

4. Rule-4: If region = 10 (Addis Ababa), type of toilet facility = 0 (unimproved), and number of children = 1 (one or more children), then the probability of being stunted is 80% (lift = 5.6).

5. Rule-5: If region = 2 (Afar), type of toilet facility = 0 (unimproved), respondent occupation = 1 (working), and media exposure = 0 (no), then the probability of being stunted is 80% (lift = 5.6).

6. Rule-6: If region = 1 (Tigray), residence = 2 (rural), type of toilet facility = 0 (unimproved), and number of children = 1 (one or more children), then the probability of being stunted is 80% (lift = 5.6).

7. Rule-7: If respondent age = 1 (aged 18–19 years), region = 1 (Tigray), type of toilet facility = 0 (unimproved), and number of children = 1 (one or more children), then the probability of being stunted is 80% (lift = 5.6).
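The support, confidence, and lift reported for the rules above follow standard definitions, which the sketch below computes for a single if/then rule. The records and variable names are hypothetical toy data, not the DHS data set, and the function is an illustration rather than the Apriori implementation used in the study. Since lift is the rule's confidence divided by the overall proportion stunted, a confidence of 95.8% with a lift of 6.9 would imply an overall proportion of roughly 14%.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    records: list of dicts; antecedent: dict of required feature values;
    consequent: (key, value) pair for the outcome.
    """
    key, value = consequent
    n = len(records)
    matches_antecedent = [
        r for r in records if all(r.get(k) == v for k, v in antecedent.items())
    ]
    matches_both = [r for r in matches_antecedent if r.get(key) == value]
    n_consequent = sum(1 for r in records if r.get(key) == value)
    if not matches_antecedent or n_consequent == 0:
        raise ValueError("rule is undefined on these records")
    support = len(matches_both) / n                        # P(antecedent and consequent)
    confidence = len(matches_both) / len(matches_antecedent)  # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)                 # confidence / P(consequent)
    return support, confidence, lift


# Hypothetical toy records: 20 girls, 5 of whom are stunted overall.
records = (
    [{"wealth": 0, "children": 1, "stunted": 1}] * 4
    + [{"wealth": 0, "children": 1, "stunted": 0}] * 1
    + [{"wealth": 1, "children": 0, "stunted": 1}] * 1
    + [{"wealth": 1, "children": 0, "stunted": 0}] * 14
)
support, confidence, lift = rule_metrics(
    records, {"wealth": 0, "children": 1}, ("stunted", 1)
)
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the outcome is more common among records matching the antecedent than in the data overall. With a very low minimum support, a high-confidence rule may still rest on only a handful of records.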

Discussion

The findings of this research demonstrated the potential of ML algorithms in predicting stunting among adolescent girls in Ethiopia. This opens up opportunities for the development of automated screening tools and decision support systems that can assist healthcare providers in diagnosing and managing stunting. We used eight ML algorithms, namely RF, DT, GNB, KNN, LGB, XGB, SVM, and LR, to assess and compare their predictive capabilities. All eight algorithms achieved ROC values above the optimal threshold, and the RF algorithm performed better than all the others, with an accuracy of 77% and an AUC value of 95%. Although there were slight differences in the metric values due to variations in socio-economic context, data size, and study area, this finding is similar to studies conducted in Bangladesh [82], Zambia [64], and Ethiopia [57], which also found that the RF model outperformed the other algorithms. This similarity might be due to the nature of the features included across the studies, since the RF algorithm performs well with categorical variables, high-dimensional data, and non-linear trends, and requires minimal effort for hyper-parameter tuning [82]. The use of the RF classifier for predicting stunting has several implications: it provides accurate predictive models, insights into risk factors and mechanisms, identification of vulnerable subgroups, and the potential for integrating machine learning into healthcare systems. These implications pave the way for targeted interventions, personalized healthcare approaches, and improved health outcomes for individuals affected by stunting. However, our finding differs from studies conducted in Papua New Guinea [25], Ethiopia [35], and Rwanda [83], which showed that XGB outperformed all other algorithms. This discrepancy might be due to the nature of the data, population characteristics, the different evaluation metrics used across studies, and research bias arising from sample size, data preprocessing techniques, and feature engineering, all of which affect the observed results and model predictions [30].

Another aim of this study was to identify the top predictors of stunting among adolescent girls. To accomplish this, the authors used the Boruta algorithm to select important features. Of the 14 features included based on the literature, the study identified 11 as top predictors of stunting among adolescent girls in Ethiopia: region, religion, type of toilet facility, wealth index, educational level, current contraceptive use, residence, age of respondent, number of children, occupational status, and media exposure.

The findings from the mean SHAP value report and the waterfall plot provided insight into the importance of different factors in the classification model used to predict stunting. Region, respondent age, wealth index, and educational status had a substantial impact on the model’s predictions and emerged as the most influential features. Knowing that certain regions have a higher SHAP value for stunting can guide policymakers and health professionals to focus interventions in these areas, so that programs aimed at improving nutrition, health education, and resources are prioritized in the regions identified as having higher risk.

On the other hand, contraceptive use and residence had minimal influence on the classification outcome, as indicated by their low mean SHAP values. Although their SHAP values are positive, the influence of these factors is slight, which suggests that while residence and contraceptive use are relevant, they may not be the primary drivers of stunting. Understanding the significance of these features and their influence on the model’s predictions can guide targeted interventions and policy decisions, ultimately leading to improvements in the health, nutrition, and well-being of adolescent girls in Ethiopia. These insights not only support existing domain knowledge but also help evaluate the model, informing more accurate and impactful interventions for adolescent girls’ health in the region to tackle the intergenerational cycle of malnutrition. Addressing stunting may require a broader approach that combines media campaigns with direct nutritional support, healthcare access, and education on adolescent health practices.

The third objective of the study was to apply association rule mining (ARM) with the Apriori algorithm. The top seven rules generated by the best model revealed that being 18–19 years old, having a poor wealth index, having no formal education, using an unimproved toilet facility, living in a rural area, having one or more children, not using contraceptive methods, residing in certain regions (Afar, Tigray, Amhara), and having no media exposure were most frequently associated with a high probability of stunting. This finding is supported by previous literature [25, 57, 83]. Identifying such patterns of features therefore has wide-ranging implications for health and wellbeing: it enables data-driven decision making, supports improvement of adolescents’ nutritional status, helps optimize processes, and supports research efforts.

The ARM findings indicated that adolescent girls aged 18–19 are at a higher risk of stunting than those aged 15–17, which is in line with research conducted in India [39, 84]. This could be because stunting reflects prolonged exposure to nutrient deficiencies, which may manifest later in life. Additionally, older adolescents are at an increased risk of pregnancy, leading to competition for nutrients with the growing fetus and resulting in various macro- and micronutrient deficiencies. Moreover, adolescent girls with a poor wealth index were more likely to be stunted than their counterparts. This is supported by studies conducted in Ethiopia [20, 85] and Turkey [86]. It might be because adolescent girls from poor households cannot easily access or afford balanced nutrition or obtain nutrition-related information from the media [85].

The likelihood of stunting among adolescent girls in rural areas was higher than among their urban counterparts, which is consistent with previous research conducted in Ethiopia [20, 85, 87]. This could be attributed to limited access to healthcare services and lack of exposure to immunization, nutrition information, and educational campaigns in rural areas. Additionally, adolescent girls from the Afar, Tigray, and Amhara regions were more likely to be stunted than girls from other regions, which aligns with a previous study in Ethiopia [40] that identified statistically significant hot spot areas in these regions. This could be linked to seasonal drought and decreased rainfall, which pose challenges for crop production in these regions.

Adolescent girls who have given birth to one or more children were found to have a higher likelihood of being stunted compared to those who have not given birth, which is in line with previous research [20, 88]. This could be attributed to the increased nutritional demands during adolescence, as there is significant competition for nutrients between the still-growing adolescent mother and her rapidly developing fetus. This competition may lead to compromised growth and development for both the mother and the fetus [89]. Besides, the presence of unimproved toilet facilities was found to increase the likelihood of stunting in comparison to adolescents with access to improved latrines, which aligns with previous research [20, 90]. This may be because unimproved toilets can lead to parasitic infections, which are a common cause of malnutrition.

The if/then rules are critical for discovering hidden relationships between attributes, extracting knowledge from a set of data, and accurately representing knowledge and information about stunting. The findings of the current study are important for policymakers and stakeholders in supporting public health action, decision making, and the retention of knowledge regarding adolescent nutritional status. Strategies targeting the identified features should be emphasized.

The practical significance of this study lies in its ability to aid early detection, provide targeted prevention strategies, guide personalized interventions, and influence resource allocation and policymaking. These implications have the potential to greatly enhance the health outcomes of adolescent girls in Ethiopia by effectively addressing stunting and reducing its impact on individuals, families, and the healthcare system. As a result, this study introduces new perspectives to the field of stunting intervention among adolescent girls through its innovative approach, identification of key risk factors, development of accurate prediction models, and proposal of personalized interventions. These contributions provide valuable information for policymakers and program planners and offer guidance for designing focused interventions to improve the health outcomes of adolescent girls in Ethiopia and to break the intergenerational cycle of malnutrition.

Strength and limitations of the study

The study incorporates eight supervised ML classification algorithms and association rule mining, providing a comprehensive and robust analysis of the predictive capabilities of different algorithms and revealing hidden patterns and relationships in the data that may not be easily identifiable through traditional statistical methods.

The analysis relied on secondary data from the DHS; crucial clinical, household food security, and dietary factors were not taken into account. Future researchers should therefore incorporate these variables when predicting stunting, using sources other than the DHS. The 2016 dataset may not adequately represent current conditions or trends, limiting the applicability of our findings to today’s context, so future studies should use more recent data to better predict stunting and provide up-to-date evidence. Another limitation is the challenge of applying continuous-data methods or machine learning algorithms to discrete variables; adapting machine learning algorithms and developing new methods to handle discrete variables effectively, and using large, complex data sets with generative AI and deep learning, are active areas of research in the field. Furthermore, the results of our study may not be generalizable to other populations or age groups, as our investigation specifically targeted adolescent girls in Ethiopia; future research should examine stunting classification and prediction across a variety of demographic groups. Finally, biases or limitations could arise from reliance on the Boruta feature selection method, the use of the DHS data set alone, and the limited set of algorithms included. Future research should therefore explore the classification and prediction of stunting using more algorithms, feature selection methods, and multiple data sources to address these limitations and to investigate additional areas that can enhance our understanding of stunting in this population, ultimately guiding more effective interventions and policies.

Conclusion and implication of the study

This research indicates that the random forest classifier performed better than the other algorithms based on the performance evaluation metrics. Region, age, poor wealth index, unimproved toilet facility, rural residence, having children, lack of media exposure, lack of formal education, occupation, religion, and not using contraceptive methods were the 11 most important features in predicting stunting. Therefore, in addition to current efforts to address childhood stunting, nutrition interventions should focus on adolescents as a key target group to break the cycle of malnutrition across generations. More studies using different machine learning algorithms are needed to explore dietary patterns and nutrient intake in relation to adolescent stunting using datasets other than the DHS. It is crucial for national strategies to prioritize stunting in order to reach the most vulnerable individuals in the poorest households, including those with more children and those in the most affected regions. The findings also suggest that interventions to address adolescent stunting in Ethiopia should focus on promoting contraceptive use, increasing media access, reducing poverty, and improving sanitation. Additionally, healthcare workers should screen the nutritional status of adolescent girls and assess risk factors for stunting, with particular emphasis on those living in rural areas, in the poorest wealth quintile, and without access to hygienic toilets.

Our study can have a significant impact on addressing stunting in developing countries. It can enable early detection and diagnosis by analyzing stunting-related data, facilitate remote monitoring and telemedicine to overcome healthcare access limitations, optimize treatment strategies based on patient data, aid in public health planning and resource allocation, recommend personalized interventions, and support data-driven research and policy development [91]. However, successful implementation requires addressing challenges such as data availability, healthcare infrastructure, ethical considerations, and model biases [92]. With proper attention to these challenges, the current study can improve stunting management and outcomes in developing countries.

Policymakers and healthcare providers can use these identified potential factors as indicators to create interventions that meet the specific needs of different subgroups in the population. This tailored approach can enhance the health and nutritional status of adolescent girls and reduce the burden of stunting in areas with limited resources.

Supporting information

S1 File. Stunting final dataset.

https://doi.org/10.1371/journal.pone.0316452.s001

(DTA)

Acknowledgments

This study was based on data from the DHS Program and the authors would like to extend their deepest gratitude to the DHS program data archivist and the Ethiopian CSA.

References

1. World Health Organization (WHO), Health for the World’s Adolescents: A Second Chance in the Second Decade. Geneva, World Health Organization Department of Noncommunicable disease surveillance. 2022.

2. Gupta, M.D., The power of 1.8 billion: Adolescents, youth and the transformation of the future. 2014: United Nations Population Fund.

3. Christian P. and Smith E.R., Adolescent undernutrition: global burden, physiology, and nutritional risks. Annals of Nutrition and Metabolism, 2018. 72(4): p. 316–328. pmid:29730657

4. Unicef, Progress for children: A report card on adolescents. New York: UNICEF, 2012(10).

5. Prentice A.M., et al., Reply to JL Leroy et al. The American journal of clinical nutrition, 2013. 98(3): p. 856–857. pmid:24137695

6. Organization, W.H., Global nutrition targets 2025: Stunting policy brief. 2014, World Health Organization.

7. Black R.E., et al., Maternal and child undernutrition and overweight in low-income and middle-income countries. The lancet, 2013. 382(9890): p. 427–451. pmid:23746772

8. Prentice A.M., et al., Critical windows for nutritional interventions against stunting. The American Journal of Clinical Nutrition, 2013. 97(5): p. 911–918. pmid:23553163

9. Organization, W.H., Guideline: implementing effective actions for improving adolescent nutrition. 2018. World Health Organization: Geneva, Switzerland, 2019.

10. McGuire S., World Health Organization. Comprehensive implementation plan on maternal, infant, and young child nutrition. Geneva, Switzerland, 2014. Advances in Nutrition, 2015. 6(1): p. 134–135.

11. Rengma M.S., Bose K., and Mondal N., Socio-economic and demographic correlates of stunting among adolescents of Assam, North-east India. Anthropological Review, 2016. 79(4): p. 409–425.

12. Kwon E.J. and Kim Y.J., What is fetal programming?: a lifetime health is under the control of in utero health. Obstetrics & gynecology science, 2017. 60(6): p. 506–519. pmid:29184858

13. Melaku Y.A., et al., Prevalence and factors associated with stunting and thinness among adolescent students in Northern Ethiopia: a comparison to World Health Organization standards. Archives of Public Health, 2015. 73: p. 1–11.

14. Aguayo V.M. and Paintal K., Nutrition in adolescent girls in South Asia. bmj, 2017. 357. pmid:28400363

15. Berhe K., et al., Prevalence and associated factors of adolescent undernutrition in Ethiopia: a systematic review and meta-analysis. BMC nutrition, 2019. 5(1): p. 1–13. pmid:32153962

16. CSA-Ethiopia, I., International. Ethiopia Demographic and Health Survey 2016: Key Indicators Report. Rockville: CSA and ICF, 2016.

17. Organization, W.H., Adolescent nutrition: a review of the situation in selected South-East Asian Countries. Adolescent nutrition: a review of the situation in selected South-East Asian Countries, 2006.

18. Bishwajit G., Nutrition transition in South Asia: the emergence of non-communicable chronic diseases. F1000Research, 2015. 4. pmid:26834976

19. Wassie M.M., et al., Predictors of nutritional status of Ethiopian adolescent girls: a community based cross sectional study. BMC nutrition, 2015. 1(1): p. 1–7.

20. Abate B.B., et al., Prevalence and determinants of stunting among adolescent girls in Ethiopia. Journal of pediatric nursing, 2020. 52: p. e1–e6. pmid:32029327

21. Dewey K.G. and Begum K., Long‐term consequences of stunting in early life. Maternal & child nutrition, 2011. 7: p. 5–18. pmid:21929633

22. Nur R.F., et al., Reducing stunting rates through intervention for adolescent girls and pregnant women’s nutrition. Dinasti International Journal of Education Management and Social Science, 2023. 5(1): p. 29–33.

23. Prendergast A.J. and Humphrey J.H., The stunting syndrome in developing countries. Paediatrics and international child health, 2014. 34(4): p. 250–265. pmid:25310000

24. Handiso Y.H., et al., Undernutrition and its determinants among adolescent girls in low land area of Southern Ethiopia. PLoS One, 2021. 16(1): p. e0240677. pmid:33434212

25. Shen H., Zhao H., and Jiang Y., Machine learning algorithms for predicting stunting among under-five children in Papua New Guinea. Children, 2023. 10(10): p. 1638. pmid:37892302

26. Caleyachetty R., et al., The double burden of malnutrition among adolescents: analysis of data from the Global School-Based Student Health and Health Behavior in School-Aged Children surveys in 57 low-and middle-income countries. The American journal of clinical nutrition, 2018. 108(2): p. 414–424. pmid:29947727

27. Ye Y., et al., Comparison of machine learning methods and conventional logistic regressions for predicting gestational diabetes using routine clinical data: a retrospective cohort study. Journal of diabetes research, 2020. 2020. pmid:32626780

28. Sufriyana H., et al., Comparison of multivariable logistic regression and other machine learning algorithms for prognostic prediction studies in pregnancy care: systematic review and meta-analysis. JMIR medical informatics, 2020. 8(11): p. e16503. pmid:33200995

29. Churpek M.M., et al., Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Critical care medicine, 2016. 44(2): p. 368–374. pmid:26771782

30. Sarker I.H., Machine learning: Algorithms, real-world applications and research directions. SN computer science, 2021. 2(3): p. 160. pmid:33778771

31. dos Santos B.S., et al., Data mining and machine learning techniques applied to public health problems: A bibliometric analysis from 2009 to 2018. Computers & Industrial Engineering, 2019. 138: p. 106120.

32. Haneef R., et al., Use of artificial intelligence for public health surveillance: a case study to develop a machine Learning-algorithm to estimate the incidence of diabetes mellitus in France. Archives of Public Health, 2021. 79: p. 1–13.

33. Talukder A. and Ahammed B., Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition, 2020. 78: p. 110861. pmid:32592978

34. Hemo S. and Rayhan M., Classification tree and random forest model to predict under-five malnutrition in Bangladesh. Biom Biostat Int J, 2021. 10(3): p. 116–123.

35. Bitew F.H., Sparks C.S., and Nyarko S.H., Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public health nutrition, 2022. 25(2): p. 269–280. pmid:34620263

36. Kebede S.D., et al., Prediction of contraceptive discontinuation among reproductive-age women in Ethiopia using Ethiopian Demographic and Health Survey 2016 Dataset: A Machine Learning Approach. BMC Medical Informatics and Decision Making, 2023. 23(1): p. 9. pmid:36650511

37. Wado Y.D., Women’s autonomy and reproductive health-care-seeking behavior in Ethiopia. Women & health, 2018. 58(7): p. 729–743. pmid:28759344

38. World Health Organization, WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland. 2006.

39. Johnson A.R., Balasubramanya B., and Thimmaiah S., Stunting and its determinants among adolescents in four schools of Bangalore city: Height for age-a vital metric for nutritional assessment. Indian Journal of Community Health, 2022. 34(1): p. 111–117.

40. Derseh N.M., Gelaye K.A., and Muluneh A.G., Spatial patterns and determinants of undernutrition among late-adolescent girls in Ethiopia by using Ethiopian demographic and health surveys, 2000, 2005, 2011 and 2016: a spatial and multilevel analysis. BMC Public Health, 2021. 21: p. 1–20.

41. Stewart C.P., et al., Contextualising complementary feeding in a broader framework for stunting prevention. Maternal & child nutrition, 2013. 9: p. 27–45.

42. Fatema K. and Lariscy J.T., Mass media exposure and maternal healthcare utilization in South Asia. SSM-Population Health, 2020. 11: p. 100614. pmid:32596437

43. Stevens G.A., et al., National, regional, and global estimates of anaemia by severity in women and children for 2000–19: a pooled analysis of population-representative data. The Lancet Global Health, 2022. 10(5): p. e627–e639. pmid:35427520

44. World Health Organization & UNICEF, Core questions on drinking water and sanitation for household surveys https://www.who.int/water_sanitation_health/monitoring/oms_brochure_core_questionsfinal24608. accessed on September, 2023. 2006.

45. Abd-Alrazaq A., et al., Patients’ adoption of electronic personal health records in England: Secondary data analysis. Journal of Medical Internet Research, 2020. 22(10): p. e17499. pmid:33026353

46. Jonsson, P. and C. Wohlin. An evaluation of k-nearest neighbour imputation using likert data. in 10th International Symposium on Software Metrics, 2004. Proceedings. 2004. IEEE.

47. Luque A., et al., The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 2019. 91: p. 216–231.

48. Setiawan B.D., Serdült U., and Kryssanov V., A machine learning framework for balancing training sets of sensor sequential data streams. Sensors, 2021. 21(20): p. 6892. pmid:34696105

49. Chawla N.V., et al., SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 2002. 16: p. 321–357.

50. Brownlee, J., Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. 2020: Machine Learning Mastery.

51. Rudnicki, W.R., M. Wrzesień, and W. Paja, All relevant feature selection methods and applications. Feature Selection for Data and Pattern Recognition, 2015: p. 11–28.

52. Chen R.-C., et al., Selecting critical features for data classification based on machine learning methods. Journal of Big Data, 2020. 7(1): p. 52.

53. Pudjihartono N., et al., A review of feature selection methods for machine learning-based disease risk prediction. Frontiers in Bioinformatics, 2022. 2: p. 927312. pmid:36304293

54. Kursa M.B., Jankowski A., and Rudnicki W.R., Boruta–a system for feature selection. Fundamenta Informaticae, 2010. 101(4): p. 271–285.

55. Pedregosa F., et al., Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 2011. 12: p. 2825–2830.

56. Ogallo, W., et al. Identifying factors associated with neonatal mortality in Sub-Saharan Africa using machine learning. in AMIA Annual Symposium Proceedings. 2020. American Medical Informatics Association.

57. Fenta H.M., Zewotir T., and Muluneh E.K., A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Medical Informatics and Decision Making, 2021. 21(1): p. 1–12.

58. Maulana, Y.D.F., Y. Ruldeviyani, and D.I. Sensuse. Data mining classification approach to predict the duration of contraceptive use. in 2020 Fifth International Conference on Informatics and Computing (ICIC). 2020. IEEE.

59. Tesfaye B., et al., Predicting skilled delivery service use in Ethiopia: dual application of logistic regression and machine learning algorithms. BMC medical informatics and decision making, 2019. 19(1): p. 1–10.

60. Lucy Lawrence, S., Predicting stunting status among children under five years: The case study of Tanzania. 2021, University of Rwanda.

61. Jin, Z., et al. RFRSF: Employee turnover prediction based on random forests and survival analysis. in Web Information Systems Engineering–WISE 2020: 21st International Conference, Amsterdam, The Netherlands, October 20–24, 2020, Proceedings, Part II 21. 2020. Springer.

62. Sheridan R.P., et al., Extreme gradient boosting as a method for quantitative structure–activity relationships. Journal of chemical information and modeling, 2016. 56(12): p. 2353–2360. pmid:27958738

63. Rufo D.D., et al., Diagnosis of diabetes mellitus using gradient boosting machine (LightGBM). Diagnostics, 2021. 11(9): p. 1714. pmid:34574055

64. Chilyabanyama O.N., et al., Performance of machine learning classifiers in classifying stunting among under-five children in Zambia. Children, 2022. 9(7): p. 1082. pmid:35884066

65. Isnain A.R., Supriyanto J., and Kharisma M.P., Implementation of K-Nearest Neighbor (K-NN) Algorithm For Public Sentiment Analysis of Online Learning. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 2021. 15(2): p. 121–130.

66. Zhang, D. and D. Zhang, Bayesian classification. Fundamentals of Image Data Mining: Analysis, Features, Classification and Retrieval, 2019: p. 161–178.

67. 67. Hossin M. and Sulaiman M.N., A review on evaluation metrics for data classification evaluations. International journal of data mining & knowledge management process, 2015. 5(2): p. 1.

* View Article

* Google Scholar

68. 68. Vujović Ž., Classification model evaluation metrics. International Journal of Advanced Computer Science and Applications, 2021. 12(6): p. 599–606.

* View Article

* Google Scholar

69. 69. Naidu, G., T. Zuva, and E.M. Sibanda. A Review of Evaluation Metrics in Machine Learning Algorithms. in Computer Science On-line Conference. 2023. Springer.

70. 70. Xu Y. and Goodacre R., On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. Journal of analysis and testing, 2018. 2(3): p. 249–262. pmid:30842888

* View Article

* PubMed/NCBI

* Google Scholar

71. 71. Hossain M.R. and Timmer D., Machine learning model optimization with hyper parameter tuning approach. Global Journal of Computer Science and Technology, 2021. 21(D2): p. 7–13.

* View Article

* Google Scholar

72. 72. Molnar, C., Interpretable machine learning. 2020: Lulu. com.

73. 73. Li Q., et al., Mining association rules between stroke risk factors based on the Apriori algorithm. Technology and Health Care, 2017. 25(S1): p. 197–205. pmid:28582907

* View Article

* PubMed/NCBI

* Google Scholar

74. 74. Council, N., Frontiers in Massive Data Analysis. he National Academies Press. Washington, DC, 2013.

75. 75. Roberts M.E., Stewart B.M., and Tingley D., Navigating the local modes of big data. Computational social science, 2016. 51.

* View Article

* Google Scholar

76. 76. Mangalathu S., Hwang S.-H., and Jeon J.-S., Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Engineering Structures, 2020. 219: p. 110927.

* View Article

* Google Scholar

77. 77. Prendin F., et al., The importance of interpreting machine learning models for blood glucose prediction in diabetes: an analysis using SHAP. Scientific Reports, 2023. 13(1): p. 16865. pmid:37803177

* View Article

* PubMed/NCBI

* Google Scholar

78. 78. Kashifi M.T., Investigating two-wheelers risk factors for severe crashes using an interpretable machine learning approach and SHAP analysis. IATSS research, 2023. 47(3): p. 357–371.

* View Article

* Google Scholar

79. 79. Alshankati, K., et al., The use of machine learning models to predict PFS and OS outcomes from waterfall plots in randomized clinical trials (MAP-OUTCOMES). 2023, American Society of Clinical Oncology.

80. 80. Ju C., et al., A novel method of interestingness measures for association rules mining based on profit. Discrete Dynamics in Nature and Society, 2015. 2015.

* View Article

* Google Scholar

81. 81. Altaf W., Shahbaz M., and Guergachi A., Applications of association rule mining in health informatics: a survey. Artificial Intelligence Review, 2017. 47: p. 313–340.

* View Article

* Google Scholar

82. 82. Islam M.M., et al., Application of machine learning based algorithm for prediction of malnutrition among women in Bangladesh. International Journal of Cognitive Computing in Engineering, 2022. 3: p. 46–57.

* View Article

* Google Scholar

83. 83. Ndagijimana S., et al., Prediction of stunting among under-5 children in Rwanda using machine learning techniques. Journal of Preventive Medicine and Public Health, 2023. 56(1): p. 41. pmid:36746421

* View Article

* PubMed/NCBI

* Google Scholar

84. 84. Kumar P., et al., Associated factors and socio-economic inequality in the prevalence of thinness and stunting among adolescent boys and girls in Uttar Pradesh and Bihar, India. PloS one, 2021. 16(2): p. e0247526. pmid:33626097

* View Article

* PubMed/NCBI

* Google Scholar

85. 85. Assefa H., Belachew T., and Negash L., Socio-demographic factors associated with underweight and stunting among adolescents in Ethiopia. The Pan African Medical Journal, 2015. 20. pmid:26161175

* View Article

* PubMed/NCBI

* Google Scholar

86. 86. Özgüven I., et al., Evaluation of nutritional status in Turkish adolescents as related to gender and socioeconomic status. Journal of clinical research in pediatric endocrinology, 2010. 2(3): p. 111. pmid:21274324

* View Article

* PubMed/NCBI

* Google Scholar

87. 87. Yetubie M., et al., Socioeconomic and demographic factors affecting body mass index of adolescents students aged 10–19 in Ambo (a rural town) in Ethiopia. International journal of biomedical science: IJBS, 2010. 6(4): p. 321. pmid:23675209

* View Article

* PubMed/NCBI

* Google Scholar

88. 88. Hiebert, L., Adolescent Health: Parity and Nutritional Status among Married Women. What is unique for adolescents and what is similar to older women? 2016, Yale University.

89. 89. Kaplanoglu M., et al., Gynecologic age is an important risk factor for obstetric and perinatal outcomes in adolescent pregnancies. Women and Birth, 2015. 28(4): p. e119–e123. pmid:26205092

* View Article

* PubMed/NCBI

* Google Scholar

90. 90. Bangoura, A., Determinants of Childhood Stunting in Guinea: Further Analysis of Demographic and Health Survey 2012. 2018.

91. 91. Tamibmaniam J., et al., Proposal of a clinical decision tree algorithm using factors associated with severe dengue infection. PLoS One, 2016. 11(8): p. e0161696. pmid:27551776

* View Article

* PubMed/NCBI

* Google Scholar

92. 92. Tanner L., et al., Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness. PLoS neglected tropical diseases, 2008. 2(3): p. e196. pmid:18335069

* View Article

* PubMed/NCBI

* Google Scholar

Citation: Zemariam AB, Abate BB, Alamaw AW, Lake ES, Yilak G, Ayele M, et al. (2025) Prediction of stunting and its socioeconomic determinants among adolescent girls in Ethiopia using machine learning algorithms. PLoS ONE 20(1): e0316452. https://doi.org/10.1371/journal.pone.0316452

About the Authors:

Alemu Birara Zemariam

Roles: Conceptualization, Data curation, Formal analysis, Software, Writing – original draft, Writing – review & editing

E-mail: [email protected]

Affiliation: Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

ORCID: https://orcid.org/0000-0001-8195-3011

Biruk Beletew Abate

Roles: Methodology, Writing – original draft

Affiliation: Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Addis Wondmagegn Alamaw

Roles: Formal analysis, Writing – original draft

Affiliation: Department of Emergency and Critical Care Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Eyob Shitie Lake

Roles: Writing – original draft

Affiliation: Department of Midwifery, School of Midwifery, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Gizachew Yilak

Roles: Writing – review & editing

Affiliation: Department of Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Mulat Ayele

Roles: Data curation, Writing – original draft

Affiliation: Department of Midwifery, School of Midwifery, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Befkad Derese Tilahun

Roles: Methodology, Writing – original draft

Affiliation: Department of Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

Habtamu Setegn Ngusie

Roles: Formal analysis, Writing – original draft

Affiliation: Department of Health Informatics, School of Public Health, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

[/RAW_REF_TEXT]

References

1. World Health Organization (WHO), Health for the World’s Adolescents: A Second Chance in the Second Decade. Geneva, World Health Organization Department of Noncommunicable disease surveillance. 2022.

2. Gupta, M.D., The power of 1.8 billion: Adolescents, youth and the transformation of the future. 2014: United Nations Population Fund.

3. Christian P. and Smith E.R., Adolescent undernutrition: global burden, physiology, and nutritional risks. Annals of Nutrition and Metabolism, 2018. 72(4): p. 316–328. pmid:29730657

4. UNICEF, Progress for children: A report card on adolescents. New York: UNICEF, 2012(10).

5. Prentice A.M., et al., Reply to JL Leroy et al. The American journal of clinical nutrition, 2013. 98(3): p. 856–857. pmid:24137695

6. World Health Organization, Global nutrition targets 2025: Stunting policy brief. 2014, World Health Organization.

7. Black R.E., et al., Maternal and child undernutrition and overweight in low-income and middle-income countries. The lancet, 2013. 382(9890): p. 427–451. pmid:23746772

8. Prentice A.M., et al., Critical windows for nutritional interventions against stunting. The American Journal of Clinical Nutrition, 2013. 97(5): p. 911–918. pmid:23553163

9. World Health Organization, Guideline: implementing effective actions for improving adolescent nutrition. 2018. World Health Organization: Geneva, Switzerland, 2019.

10. McGuire S., World Health Organization. Comprehensive implementation plan on maternal, infant, and young child nutrition. Geneva, Switzerland, 2014. Advances in Nutrition, 2015. 6(1): p. 134–135.

11. Rengma M.S., Bose K., and Mondal N., Socio-economic and demographic correlates of stunting among adolescents of Assam, North-east India. Anthropological Review, 2016. 79(4): p. 409–425.

12. Kwon E.J. and Kim Y.J., What is fetal programming?: a lifetime health is under the control of in utero health. Obstetrics & gynecology science, 2017. 60(6): p. 506–519. pmid:29184858

13. Melaku Y.A., et al., Prevalence and factors associated with stunting and thinness among adolescent students in Northern Ethiopia: a comparison to World Health Organization standards. Archives of Public Health, 2015. 73: p. 1–11.

14. Aguayo V.M. and Paintal K., Nutrition in adolescent girls in South Asia. bmj, 2017. 357. pmid:28400363

15. Berhe K., et al., Prevalence and associated factors of adolescent undernutrition in Ethiopia: a systematic review and meta-analysis. BMC nutrition, 2019. 5(1): p. 1–13. pmid:32153962

16. Central Statistical Agency (CSA) [Ethiopia] and ICF International, Ethiopia Demographic and Health Survey 2016: Key Indicators Report. Rockville: CSA and ICF, 2016.

17. World Health Organization, Adolescent nutrition: a review of the situation in selected South-East Asian Countries. 2006.

18. Bishwajit G., Nutrition transition in South Asia: the emergence of non-communicable chronic diseases. F1000Research, 2015. 4. pmid:26834976

19. Wassie M.M., et al., Predictors of nutritional status of Ethiopian adolescent girls: a community based cross sectional study. BMC nutrition, 2015. 1(1): p. 1–7.

20. Abate B.B., et al., Prevalence and determinants of stunting among adolescent girls in Ethiopia. Journal of pediatric nursing, 2020. 52: p. e1–e6. pmid:32029327

21. Dewey K.G. and Begum K., Long‐term consequences of stunting in early life. Maternal & child nutrition, 2011. 7: p. 5–18. pmid:21929633

22. Nur R.F., et al., Reducing stunting rates through intervention for adolescent girls and pregnant women’s nutrition. Dinasti International Journal of Education Management and Social Science, 2023. 5(1): p. 29–33.

23. Prendergast A.J. and Humphrey J.H., The stunting syndrome in developing countries. Paediatrics and international child health, 2014. 34(4): p. 250–265. pmid:25310000

24. Handiso Y.H., et al., Undernutrition and its determinants among adolescent girls in low land area of Southern Ethiopia. PLoS One, 2021. 16(1): p. e0240677. pmid:33434212

25. Shen H., Zhao H., and Jiang Y., Machine learning algorithms for predicting stunting among under-five children in Papua New Guinea. Children, 2023. 10(10): p. 1638. pmid:37892302

26. Caleyachetty R., et al., The double burden of malnutrition among adolescents: analysis of data from the Global School-Based Student Health and Health Behavior in School-Aged Children surveys in 57 low-and middle-income countries. The American journal of clinical nutrition, 2018. 108(2): p. 414–424. pmid:29947727

27. Ye Y., et al., Comparison of machine learning methods and conventional logistic regressions for predicting gestational diabetes using routine clinical data: a retrospective cohort study. Journal of diabetes research, 2020. 2020. pmid:32626780

28. Sufriyana H., et al., Comparison of multivariable logistic regression and other machine learning algorithms for prognostic prediction studies in pregnancy care: systematic review and meta-analysis. JMIR medical informatics, 2020. 8(11): p. e16503. pmid:33200995

29. Churpek M.M., et al., Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Critical care medicine, 2016. 44(2): p. 368–374. pmid:26771782

30. Sarker I.H., Machine learning: Algorithms, real-world applications and research directions. SN computer science, 2021. 2(3): p. 160. pmid:33778771

31. dos Santos B.S., et al., Data mining and machine learning techniques applied to public health problems: A bibliometric analysis from 2009 to 2018. Computers & Industrial Engineering, 2019. 138: p. 106120.

32. Haneef R., et al., Use of artificial intelligence for public health surveillance: a case study to develop a machine Learning-algorithm to estimate the incidence of diabetes mellitus in France. Archives of Public Health, 2021. 79: p. 1–13.

33. Talukder A. and Ahammed B., Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition, 2020. 78: p. 110861. pmid:32592978

34. Hemo S. and Rayhan M., Classification tree and random forest model to predict under-five malnutrition in Bangladesh. Biom Biostat Int J, 2021. 10(3): p. 116–123.

35. Bitew F.H., Sparks C.S., and Nyarko S.H., Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public health nutrition, 2022. 25(2): p. 269–280. pmid:34620263

36. Kebede S.D., et al., Prediction of contraceptive discontinuation among reproductive-age women in Ethiopia using Ethiopian Demographic and Health Survey 2016 Dataset: A Machine Learning Approach. BMC Medical Informatics and Decision Making, 2023. 23(1): p. 9. pmid:36650511

37. Wado Y.D., Women’s autonomy and reproductive health-care-seeking behavior in Ethiopia. Women & health, 2018. 58(7): p. 729–743. pmid:28759344

38. World Health Organization, WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland. 2006.

39. Johnson A.R., Balasubramanya B., and Thimmaiah S., Stunting and its determinants among adolescents in four schools of Bangalore city: Height for age-a vital metric for nutritional assessment. Indian Journal of Community Health, 2022. 34(1): p. 111–117.

40. Derseh N.M., Gelaye K.A., and Muluneh A.G., Spatial patterns and determinants of undernutrition among late-adolescent girls in Ethiopia by using Ethiopian demographic and health surveys, 2000, 2005, 2011 and 2016: a spatial and multilevel analysis. BMC Public Health, 2021. 21: p. 1–20.

41. Stewart C.P., et al., Contextualising complementary feeding in a broader framework for stunting prevention. Maternal & child nutrition, 2013. 9: p. 27–45.

42. Fatema K. and Lariscy J.T., Mass media exposure and maternal healthcare utilization in South Asia. SSM-Population Health, 2020. 11: p. 100614. pmid:32596437

43. Stevens G.A., et al., National, regional, and global estimates of anaemia by severity in women and children for 2000–19: a pooled analysis of population-representative data. The Lancet Global Health, 2022. 10(5): p. e627–e639. pmid:35427520

44. World Health Organization & UNICEF, Core questions on drinking water and sanitation for household surveys https://www.who.int/water_sanitation_health/monitoring/oms_brochure_core_questionsfinal24608. accessed on September, 2023. 2006.

45. Abd-Alrazaq A., et al., Patients’ adoption of electronic personal health records in England: Secondary data analysis. Journal of Medical Internet Research, 2020. 22(10): p. e17499. pmid:33026353

46. Jonsson, P. and C. Wohlin. An evaluation of k-nearest neighbour imputation using likert data. in 10th International Symposium on Software Metrics, 2004. Proceedings. 2004. IEEE.

47. Luque A., et al., The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 2019. 91: p. 216–231.

48. Setiawan B.D., Serdült U., and Kryssanov V., A machine learning framework for balancing training sets of sensor sequential data streams. Sensors, 2021. 21(20): p. 6892. pmid:34696105

49. Chawla N.V., et al., SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 2002. 16: p. 321–357.

50. Brownlee, J., Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. 2020: Machine Learning Mastery.

51. Rudnicki, W.R., M. Wrzesień, and W. Paja, All relevant feature selection methods and applications. Feature Selection for Data and Pattern Recognition, 2015: p. 11–28.

52. Chen R.-C., et al., Selecting critical features for data classification based on machine learning methods. Journal of Big Data, 2020. 7(1): p. 52.

53. Pudjihartono N., et al., A review of feature selection methods for machine learning-based disease risk prediction. Frontiers in Bioinformatics, 2022. 2: p. 927312. pmid:36304293

54. Kursa M.B., Jankowski A., and Rudnicki W.R., Boruta–a system for feature selection. Fundamenta Informaticae, 2010. 101(4): p. 271–285.

55. Pedregosa F., et al., Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 2011. 12: p. 2825–2830.

56. Ogallo, W., et al. Identifying factors associated with neonatal mortality in Sub-Saharan Africa using machine learning. in AMIA Annual Symposium Proceedings. 2020. American Medical Informatics Association.

57. Fenta H.M., Zewotir T., and Muluneh E.K., A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Medical Informatics and Decision Making, 2021. 21(1): p. 1–12.

58. Maulana, Y.D.F., Y. Ruldeviyani, and D.I. Sensuse. Data mining classification approach to predict the duration of contraceptive use. in 2020 Fifth International Conference on Informatics and Computing (ICIC). 2020. IEEE.

59. Tesfaye B., et al., Predicting skilled delivery service use in Ethiopia: dual application of logistic regression and machine learning algorithms. BMC medical informatics and decision making, 2019. 19(1): p. 1–10.

60. Lucy Lawrence, S., Predicting stunting status among children under five years: The case study of Tanzania. 2021, University of Rwanda.

61. Jin, Z., et al. RFRSF: Employee turnover prediction based on random forests and survival analysis. in Web Information Systems Engineering–WISE 2020: 21st International Conference, Amsterdam, The Netherlands, October 20–24, 2020, Proceedings, Part II 21. 2020. Springer.

62. Sheridan R.P., et al., Extreme gradient boosting as a method for quantitative structure–activity relationships. Journal of chemical information and modeling, 2016. 56(12): p. 2353–2360. pmid:27958738

63. Rufo D.D., et al., Diagnosis of diabetes mellitus using gradient boosting machine (LightGBM). Diagnostics, 2021. 11(9): p. 1714. pmid:34574055

64. Chilyabanyama O.N., et al., Performance of machine learning classifiers in classifying stunting among under-five children in Zambia. Children, 2022. 9(7): p. 1082. pmid:35884066

65. Isnain A.R., Supriyanto J., and Kharisma M.P., Implementation of K-Nearest Neighbor (K-NN) Algorithm For Public Sentiment Analysis of Online Learning. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 2021. 15(2): p. 121–130.

66. Zhang, D. and D. Zhang, Bayesian classification. Fundamentals of Image Data Mining: Analysis, Features, Classification and Retrieval, 2019: p. 161–178.

67. Hossin M. and Sulaiman M.N., A review on evaluation metrics for data classification evaluations. International journal of data mining & knowledge management process, 2015. 5(2): p. 1.

68. Vujović Ž., Classification model evaluation metrics. International Journal of Advanced Computer Science and Applications, 2021. 12(6): p. 599–606.

69. Naidu, G., T. Zuva, and E.M. Sibanda. A Review of Evaluation Metrics in Machine Learning Algorithms. in Computer Science On-line Conference. 2023. Springer.

70. Xu Y. and Goodacre R., On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. Journal of analysis and testing, 2018. 2(3): p. 249–262. pmid:30842888

71. Hossain M.R. and Timmer D., Machine learning model optimization with hyper parameter tuning approach. Global Journal of Computer Science and Technology, 2021. 21(D2): p. 7–13.

72. Molnar, C., Interpretable machine learning. 2020: Lulu.com.

73. Li Q., et al., Mining association rules between stroke risk factors based on the Apriori algorithm. Technology and Health Care, 2017. 25(S1): p. 197–205. pmid:28582907

74. National Research Council, Frontiers in Massive Data Analysis. The National Academies Press. Washington, DC, 2013.

75. Roberts M.E., Stewart B.M., and Tingley D., Navigating the local modes of big data. Computational social science, 2016. 51.

76. Mangalathu S., Hwang S.-H., and Jeon J.-S., Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Engineering Structures, 2020. 219: p. 110927.

77. Prendin F., et al., The importance of interpreting machine learning models for blood glucose prediction in diabetes: an analysis using SHAP. Scientific Reports, 2023. 13(1): p. 16865. pmid:37803177

78. Kashifi M.T., Investigating two-wheelers risk factors for severe crashes using an interpretable machine learning approach and SHAP analysis. IATSS research, 2023. 47(3): p. 357–371.

79. Alshankati, K., et al., The use of machine learning models to predict PFS and OS outcomes from waterfall plots in randomized clinical trials (MAP-OUTCOMES). 2023, American Society of Clinical Oncology.

80. Ju C., et al., A novel method of interestingness measures for association rules mining based on profit. Discrete Dynamics in Nature and Society, 2015. 2015.

81. Altaf W., Shahbaz M., and Guergachi A., Applications of association rule mining in health informatics: a survey. Artificial Intelligence Review, 2017. 47: p. 313–340.

82. Islam M.M., et al., Application of machine learning based algorithm for prediction of malnutrition among women in Bangladesh. International Journal of Cognitive Computing in Engineering, 2022. 3: p. 46–57.

83. Ndagijimana S., et al., Prediction of stunting among under-5 children in Rwanda using machine learning techniques. Journal of Preventive Medicine and Public Health, 2023. 56(1): p. 41. pmid:36746421

84. Kumar P., et al., Associated factors and socio-economic inequality in the prevalence of thinness and stunting among adolescent boys and girls in Uttar Pradesh and Bihar, India. PloS one, 2021. 16(2): p. e0247526. pmid:33626097

85. Assefa H., Belachew T., and Negash L., Socio-demographic factors associated with underweight and stunting among adolescents in Ethiopia. The Pan African Medical Journal, 2015. 20. pmid:26161175

86. Özgüven I., et al., Evaluation of nutritional status in Turkish adolescents as related to gender and socioeconomic status. Journal of clinical research in pediatric endocrinology, 2010. 2(3): p. 111. pmid:21274324

87. Yetubie M., et al., Socioeconomic and demographic factors affecting body mass index of adolescents students aged 10–19 in Ambo (a rural town) in Ethiopia. International journal of biomedical science: IJBS, 2010. 6(4): p. 321. pmid:23675209

88. Hiebert, L., Adolescent Health: Parity and Nutritional Status among Married Women. What is unique for adolescents and what is similar to older women? 2016, Yale University.

89. Kaplanoglu M., et al., Gynecologic age is an important risk factor for obstetric and perinatal outcomes in adolescent pregnancies. Women and Birth, 2015. 28(4): p. e119–e123. pmid:26205092

90. Bangoura, A., Determinants of Childhood Stunting in Guinea: Further Analysis of Demographic and Health Survey 2012. 2018.

91. Tamibmaniam J., et al., Proposal of a clinical decision tree algorithm using factors associated with severe dengue infection. PLoS One, 2016. 11(8): p. e0161696. pmid:27551776

92. Tanner L., et al., Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness. PLoS neglected tropical diseases, 2008. 2(3): p. e196. pmid:18335069

© 2025 Zemariam et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background

Stunting is a vital indicator of chronic undernutrition that reflects a failure to achieve linear growth. Investigating growth and nutritional status during adolescence, in addition to infancy and childhood, is crucial. However, the available studies in Ethiopia have usually focused on early childhood and have relied on traditional statistical methods. Therefore, this study aimed to employ multiple machine learning algorithms to identify the most effective model for predicting stunting among adolescent girls in Ethiopia.

Methods

A total weighted sample of 3156 adolescent girls aged 15–19 years was drawn from the 2016 Ethiopian Demographic and Health Survey dataset. The data were pre-processed, and 80% of the observations were used to train the models and the remaining 20% to test them. Eight machine learning algorithms were considered for model building and comparison. Model performance was assessed with evaluation metrics computed in Python. The synthetic minority oversampling technique (SMOTE) was used to balance the data, and the Boruta algorithm was used to select the most relevant features. Association rule mining with the Apriori algorithm, run in R, was used to generate the best rules linking the independent features to the target feature.
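The splitting and balancing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the stratified 80/20 split, the toy two-feature data, the choice to oversample only the training portion, and the helper names `stratified_split` and `smote` are all hypothetical and introduced for the example.

```python
import random

def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (features, label) rows into train/test, class by class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in rows}):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point is interpolated
    between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Rank the other minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy data (hypothetical): 80 majority rows (label 0), 20 minority rows (label 1).
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(80)]
data += [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(20)]

train, test = stratified_split(data, test_fraction=0.2, seed=1)
train_min = [x for x, y in train if y == 1]
train_maj = [x for x, y in train if y == 0]

# Oversample the minority class in the training set until both classes match.
new_points = smote(train_min, n_new=len(train_maj) - len(train_min), k=5, seed=1)
balanced_train = train + [(x, 1) for x in new_points]
```

Oversampling only the training portion keeps synthetic points out of the held-out test set; whether the study balanced the data before or after splitting is not stated in this abstract.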

Results

The random forest classifier (sensitivity = 81%, accuracy = 77%, precision = 75%, F1-score = 78%, AUC = 85%) outperformed the other ML algorithms considered in this study in predicting stunting. Region, poor wealth index, no formal education, unimproved toilet facility, rural residence, non-use of contraceptive methods, religion, age, no media exposure, occupation, and having one or more children were the top predictors of stunting. Association rule mining identified the seven rules most frequently associated with stunting among adolescent girls in Ethiopia.
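The relationship between the reported metrics can be illustrated with a short Python sketch. The confusion-matrix counts below are hypothetical, chosen only so that the ratios match the reported percentages; they are not the study's actual counts, and the function name `classification_metrics` is introduced for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return sensitivity, precision, accuracy and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (stunted) class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "precision": precision,
            "accuracy": accuracy, "f1": f1}

# Hypothetical counts: 100 positive and 100 negative test cases.
metrics = classification_metrics(tp=81, fp=27, fn=19, tn=73)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts the sketch prints 0.81, 0.75, 0.77, and 0.78, which shows that the reported F1-score of 78% is consistent with the reported precision and sensitivity. The AUC cannot be derived from a single confusion matrix, since it depends on the ranking of predicted probabilities across all thresholds.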

Conclusion

The random forest classifier performed best at predicting stunting and identifying its relevant predictors. The results show that machine learning algorithms can predict stunting accurately, making them potentially valuable as decision-support tools for the relevant stakeholders. Giving emphasis to the identified predictors could be an important intervention to halt stunting among adolescent girls.

Details

Title: Prediction of stunting and its socioeconomic determinants among adolescent girls in Ethiopia using machine learning algorithms

Author: Zemariam, Alemu Birara; Abate, Biruk Beletew; Alamaw, Addis Wondmagegn; Lake, Eyob Shitie; Yilak, Gizachew; Ayele, Mulat; Tilahun, Befkad Derese; Ngusie, Habtamu Setegn

First page: e0316452

Section: Research Article

Publication year: 2025

Publication date: Jan 2025

Publisher: Public Library of Science

e-ISSN: 1932-6203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0316452

ProQuest document ID: 3159629679
