Abstract

Scanning electron microscopy images can reveal detailed microstructural and compositional information across many fields, yet they are challenging to label and process because of the large volumes generated, the presence of noise and artifacts, and the reliance on domain expertise. The lack of scalable, automated, and interpretable methods for analyzing scanning electron microscopy images has prompted this research, which focuses on three primary objectives. First, semi-supervised learning techniques, including pseudo-labeling and consistency regularization, exploit both labeled and unlabeled scanning electron microscopy data by generating pseudo-labels for the unlabeled images and enforcing consistent predictions under input perturbations. Second, this study introduces a hybrid Vision Transformer (ViT-ResNet50) model, which combines the representational power of ViT with the feature extraction capabilities of ResNet50. Third, SHapley Additive exPlanations (SHAP) enhance the model's interpretability by revealing the image regions that contribute most to its predictions. Performance is evaluated using confusion matrices, test accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (ROC-AUC), model fitting time, and trainable parameter count, together with a comparative analysis demonstrating competitiveness against state-of-the-art models in both semi-supervised and fully supervised (completely labeled) settings. The semi-supervised ViT-ResNet50 model achieved accuracies of 93.65% and 84.76% on the Aversa scanning electron microscopy dataset and the UltraHigh Carbon Steel Database, respectively, with notable interpretability, surpassing baseline models such as ResNet101, InceptionV3, InceptionResNetV2, and InceptionV4. The findings highlight the potential of semi-supervised learning to improve model performance when labeled data are limited, though challenges such as class imbalance and increased computational cost point to areas for further optimization.
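
To make the training scheme described above concrete, the sketch below shows one plausible reading of it in PyTorch: a ResNet50 feature extractor whose feature map is fed as tokens to a small Transformer encoder, trained with a FixMatch-style step that combines pseudo-labeling and consistency regularization. The layer sizes, confidence threshold, weak/strong augmentation pairing, and helper names are illustrative assumptions, not the authors' exact configuration.

# Hedged sketch (PyTorch, torchvision >= 0.13): hybrid ResNet50 + Transformer classifier
# trained with pseudo-labeling and consistency regularization. Hyperparameters are
# illustrative assumptions, not the paper's reported settings.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class HybridViTResNet50(nn.Module):
    def __init__(self, num_classes, embed_dim=256, depth=4, heads=8):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V2")
        # Keep the convolutional stages only (drop avgpool and fc): 2048 x 7 x 7 for 224 x 224 input.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, embed_dim, kernel_size=1)   # project CNN features to token dim
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        enc_layer = nn.TransformerEncoderLayer(embed_dim, heads,
                                               dim_feedforward=4 * embed_dim,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        f = self.proj(self.cnn(x))                    # (B, D, H, W)
        tokens = f.flatten(2).transpose(1, 2)         # (B, H*W, D) patch-like tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, tokens], dim=1))   # positional embeddings omitted for brevity
        return self.head(z[:, 0])                     # classify from the [CLS] token

def semi_supervised_step(model, optimizer, xl, yl, xu_weak, xu_strong,
                         threshold=0.95, lambda_u=1.0):
    # One FixMatch-style update: supervised loss on labeled images plus a consistency
    # loss on unlabeled images whose weak-view pseudo-labels are sufficiently confident.
    model.train()
    loss_sup = F.cross_entropy(model(xl), yl)

    with torch.no_grad():
        probs = F.softmax(model(xu_weak), dim=1)      # pseudo-labels from the weakly augmented view
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()            # keep only confident pseudo-labels

    logits_u = model(xu_strong)                       # enforce consistency on the strongly augmented view
    loss_unsup = (F.cross_entropy(logits_u, pseudo, reduction="none") * mask).mean()

    loss = loss_sup + lambda_u * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the full pipeline summarized in the abstract, a trained model of this kind would additionally be passed to a SHAP explainer to attribute predictions to image regions.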

Details

Title
Hybrid vision transformer framework for efficient and explainable SEM image-based nanomaterial classification
Author
Kaur, Manpreet; Valderrama, Camilo E; Liu, Qian

Department of Applied Computer Science and Society, The University of Winnipeg, Winnipeg, Canada
First page
015066
Publication year
2025
Publication date
Mar 2025
Publisher
IOP Publishing
e-ISSN
2632-2153
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3179842722
Copyright
© 2025 The Author(s). Published by IOP Publishing Ltd. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.