Abstract

Background

Himalayan forests are fragile, rich in biodiversity, and face increasing threats from anthropogenic pressures and climate change. Assessing their health is critical for sustainable forest management. This study integrated ecological indicators (tree density, size, regeneration, deforestation, slope, grazing, and erosion) with machine learning (ML) to classify forest health and identify key drivers across 37 Western Himalayan sites. Principal component analysis (PCA) reduced data dimensionality and highlighted the major ecological gradients. K-means clustering, chosen for its efficiency in identifying natural patterns within multivariate data, grouped the forests into three distinct classes based on ecological characteristics. ML models, including Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were trained and validated using an 80:20 train-test split and 5-fold cross-validation.
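The workflow described above (standardize, PCA, K-means, then a supervised classifier with an 80:20 split and 5-fold cross-validation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes scikit-learn, uses synthetic data in place of the field measurements, and the choices of three retained components, 200 trees, and the random seeds are assumptions not stated in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the field data: 37 sites x 7 indicators.
# Three arbitrary groups are shifted apart so that clusters exist to be found.
group_offsets = np.repeat([0.0, 3.0, 6.0], [12, 13, 12])[:, None]
X = rng.normal(size=(37, 7)) + group_offsets

# Standardize indicators so that no single unit dominates the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Reduce dimensionality (three components is an assumption).
pca = PCA(n_components=3).fit(X_scaled)
scores = pca.transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# Group sites into three classes with K-means.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# 80:20 train-test split, then a Random Forest on the cluster labels.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, labels, test_size=0.2, random_state=0
)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)

# 5-fold cross-validated accuracy on all sites.
cv_accuracy = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X_scaled, labels, cv=5,
)

# Impurity-based feature importances, one value per indicator.
importances = rf.feature_importances_
```

With real data, `X` would hold the measured indicators per site, and `importances` could be ranked to suggest which indicators contribute most to the classification.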

Results

PCA revealed that elevation, disturbance, and regeneration explained 74.3% of the variance. Forest health varied across sites, with 10 categorized as healthy, 19 as moderate, and 8 as unhealthy. Forest regeneration was highly skewed (skewness = 2.67) and leptokurtic (kurtosis = 9.8), with few sites showing high seedling abundance, while deforestation (mean = 294 stumps ha−1) indicated uneven human impact. Among ML models, RF showed the best performance with a mean accuracy of 0.83, Kappa 0.87, and balanced accuracy 0.88. SVM followed with 0.75 accuracy, Kappa 0.70, and balanced accuracy 0.81. DT performed worst, with 0.66 accuracy and Kappa 0.45. Cross-validation confirmed that RF had the highest mean accuracy (90.3%), followed by SVM (88.1%) and DT (65.1%). RF-based feature importance analysis identified tree DBH, height, regeneration rate, soil erosion, and tree density as key ecological drivers of forest health.
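For readers unfamiliar with the metrics reported above, the following minimal sketch shows how accuracy, Cohen's Kappa, and balanced accuracy can be derived from a confusion matrix. The matrix values are hypothetical and are not taken from the study. Balanced accuracy is taken here as the mean per-class recall, which is an assumption, since the abstract does not state which definition was used.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal (rows = true, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Agreement between true and predicted labels, corrected for chance."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the true class.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    return (observed - expected) / (1 - expected)


def balanced_accuracy(cm):
    """Mean of per-class recall, so small classes weigh as much as large ones."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


# Hypothetical 3-class confusion matrix for an 8-site test set.
cm = [
    [2, 0, 0],
    [1, 3, 0],
    [0, 0, 2],
]

acc = accuracy(cm)            # 0.875
kappa = cohen_kappa(cm)       # 17/21, about 0.81
bal_acc = balanced_accuracy(cm)  # 11/12, about 0.92
```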

Conclusions

This study highlights ML-driven classification as a precise, scalable, and objective tool for large-scale forest health assessments. Conservation efforts should prioritize degraded forests through afforestation, slope stabilization, controlled grazing, erosion management, and continuous ecosystem monitoring. Future studies should integrate high-resolution remote sensing (e.g., Landsat, Sentinel-2) and climate datasets (e.g., temperature, precipitation, and drought indices) to enhance predictive capabilities and support long-term forest management planning. The findings underscore the value of data-driven approaches and establish machine learning as an effective means of improving forest monitoring and supporting evidence-based forest conservation and management in the Himalayas.

Details

Title
A data-driven approach to forest health assessment through multivariate analysis and machine learning techniques
Publication title
Volume
25
Pages
1-16
Number of pages
17
Publication year
2025
Publication date
2025
Section
Research
Publisher
Springer Nature B.V.
Place of publication
London
Country of publication
Netherlands
Publication subject
e-ISSN
14712229
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-07-15
Milestone dates
2025-04-09 (Received); 2025-06-23 (Accepted); 2025-07-15 (Published)
First posting date
15 Jul 2025
ProQuest document ID
3237002303
Document URL
https://www.proquest.com/scholarly-journals/data-driven-approach-forest-health-assessment/docview/3237002303/se-2?accountid=208611
Copyright
© 2025. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-06
Database
ProQuest One Academic