Background
Himalayan forests are fragile and rich in biodiversity, and they face increasing threats from anthropogenic pressures and climate change. Assessing their health is critical for sustainable forest management. This study integrated ecological indicators (tree density, size, regeneration, deforestation, slope, grazing, and erosion) with machine learning (ML) to classify forest health and identify its key drivers across 37 Western Himalayan sites. Principal component analysis (PCA) reduced data dimensionality and highlighted the major ecological gradients. K-means clustering, chosen for its efficiency in identifying natural patterns within multivariate data, grouped the forests into three distinct classes based on their ecological characteristics. Three ML models, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were trained and validated using an 80:20 train-test split and 5-fold cross-validation.
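The clustering step described above can be illustrated with a minimal sketch of standard k-means (Lloyd's algorithm) on standardized indicators. This is not the authors' code: the three indicator columns, the site values, and the function names `standardize`, `sq_dist`, and `kmeans` are all hypothetical, and only the number of sites (37) and clusters (3) come from the abstract.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so indicators on different scales weigh equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k=3, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: 37 sites x 3 indicators (tree density, seedlings, stumps
# per hectare), drawn around three made-up site profiles. Not the study's data.
rng = random.Random(42)
profiles = [(900, 1500, 100), (600, 700, 300), (300, 150, 600)]
sites = [[rng.gauss(m, 0.1 * m) for m in profiles[i % 3]] for i in range(37)]

z = standardize(sites)
labels, centroids = kmeans(z, k=3)
print([labels.count(j) for j in range(3)])  # number of sites in each cluster
```

Cluster indices returned by k-means are arbitrary, so naming a cluster "healthy", "moderate", or "unhealthy" still requires inspecting its centroid against the ecological indicators.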
Results
PCA revealed that elevation, disturbance, and regeneration together explained 74.3% of the variance. Forest health varied across sites: 10 were categorized as healthy, 19 as moderate, and 8 as unhealthy. Forest regeneration was highly skewed (skewness = 2.67) and leptokurtic (kurtosis = 9.8), with only a few sites showing high seedling abundance, while deforestation (mean = 294 stumps ha−1) indicated uneven human impact. Among the ML models, RF performed best, with a mean accuracy of 0.83, Kappa of 0.87, and balanced accuracy of 0.88. SVM followed with an accuracy of 0.75, Kappa of 0.70, and balanced accuracy of 0.81. DT performed worst, with an accuracy of 0.66 and Kappa of 0.45. Cross-validation confirmed that RF had the highest mean accuracy (90.3%), followed by SVM (88.1%) and DT (65.1%). RF-based feature importance analysis identified tree DBH, height, regeneration rate, soil erosion, and tree density as the key ecological drivers of forest health.
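For readers less familiar with the metrics reported above, the following sketch shows how overall accuracy, Cohen's Kappa, and balanced accuracy are computed from a single confusion matrix. The matrix is hypothetical (eight test sites, roughly 20% of 37) and does not reproduce the study's results. The function names are illustrative.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal (rows = true class, cols = predicted)."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n

def cohen_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def balanced_accuracy(cm):
    """Mean per-class recall, ignoring classes with no true samples."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(recalls) / len(recalls)

# Hypothetical 3-class confusion matrix (healthy, moderate, unhealthy).
cm = [
    [2, 0, 0],  # true healthy
    [1, 3, 0],  # true moderate: one site misclassified as healthy
    [0, 0, 2],  # true unhealthy
]

print(round(overall_accuracy(cm), 3))
print(round(cohen_kappa(cm), 3))
print(round(balanced_accuracy(cm), 3))
```

For this made-up matrix, accuracy is 0.875, Kappa is about 0.81, and balanced accuracy is about 0.917. Kappa computed from one confusion matrix cannot exceed that matrix's overall accuracy, so the reported RF accuracy (0.83) and Kappa (0.87) may have been averaged over different runs or folds; the abstract does not say.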
Conclusions
This study highlights ML-driven classification as a precise, scalable, and objective tool for large-scale forest health assessments. Conservation efforts should prioritize degraded forests through afforestation, slope stabilization, controlled grazing, erosion management, and continuous ecosystem monitoring. Future studies should integrate high-resolution remote sensing (e.g., Landsat, Sentinel-2) and climate datasets (e.g., temperature, precipitation, and drought indices) to enhance predictive capabilities and support long-term forest management planning. The findings underscore the value of data-driven approaches for forest monitoring and for evidence-based forest conservation and management in the Himalayas.
Details
Soil erosion;
Landsat;
Grazing;
Principal components analysis;
Erosion control;
Multivariate analysis;
Machine learning;
Management planning;
Erosion rates;
Land degradation;
Human influences;
Density;
Human impact;
Decision trees;
Data reduction;
Remote sensing;
Clustering;
Classification;
Drought index;
Seedlings;
Slope stability;
Climate change;
Software;
Sustainability management;
Timber;
Biodiversity;
Forest conservation;
Deforestation;
Slope stabilization;
Monitoring;
Drought;
Learning algorithms;
Regeneration;
Sustainable forestry;
Cluster analysis;
Wind;
Support vector machines;
Forest degradation;
Variables;
Vector quantization