Full text

Turn on search term navigation

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Forest health monitoring at scale requires high-spatial-resolution remote sensing images coupled with deep learning image analysis methods. However, high-quality large-scale datasets are costly to acquire. To address this challenge, we explored the potential of freely available National Agricultural Imagery Program (NAIP) imagery. By comparing the performance of traditional convolutional neural network (CNN) models (U-Net and DeepLabv3+) with a state-of-the-art Vision Transformer (SegFormer), we aimed to determine the optimal approach for detecting unhealthy tree crowns (UTC) using a publicly available data source. Additionally, we investigated the impact of different spectral band combinations on model performance to identify the most effective configuration without incurring additional data acquisition costs. We explored various band combinations, including RGB, color infrared (CIR), vegetation indices (VIs), principal components (PC) of texture features (PCA), and spectral band with PC (RGBPC). Furthermore, we analyzed the uncertainty associated with potential subjective crown annotation and its impact on model evaluation. Our results demonstrated that the Vision Transformer-based model, SegFormer, outperforms traditional CNN-based models, particularly when trained on RGB images yielding an F1-score of 0.85. In contrast, DeepLabv3+ achieved F1-score of 0.82. Notably, PCA-based inputs yield reduced performance across all models, with U-Net producing particularly poor results (F1-score as low as 0.03). The uncertainty analysis indicated that the Intersection over Union (IoU) could fluctuate between 14.81% and 57.41%, while F1-scores ranged from 8.57% to 47.14%, reflecting the significant sensitivity of model performance to inconsistencies in ground truth annotations. In summary, this study demonstrates the feasibility of using publicly available NAIP imagery and advanced deep learning techniques to accurately detect unhealthy tree canopies. These findings highlight SegFormer’s superior ability to capture complex spatial patterns, even in relatively low-resolution (60 cm) datasets. Our findings underline the considerable influence of human annotation errors on model performance, emphasizing the need for standardized annotation guidelines and quality control measures.

Details

Title
Vision Transformer-Based Unhealthy Tree Crown Detection in Mixed Northeastern US Forests and Evaluation of Annotation Uncertainty
Author
Joshi, Durga  VIAFID ORCID Logo  ; Witharana, Chandi
First page
1066
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
20724292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3182137532
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.