
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Artificial intelligence has been considered as an approach to visual inspection in industrial applications for decades. Recent successes, driven by advances in deep learning, represent a potential paradigm shift and could enable automated visual inspection, even under complex environmental conditions. For the last 10 years, convolutional neural networks (CNNs) have been the de facto standard in deep-learning-based computer vision (CV). Recently, attention-based vision transformer architectures have emerged and surpassed the performance of CNNs on benchmark datasets for standard CV tasks such as image classification, object detection, and segmentation. Nevertheless, despite these outstanding results, vision transformers are rarely applied to real-world visual inspection. We suspect this is due to the assumption that they require enormous amounts of data to be effective. In this study, we evaluate this assumption. To this end, we systematically compare seven widely used state-of-the-art CNN- and transformer-based architectures, trained on three different use cases in the domain of visual damage assessment for railway freight car maintenance. We show that vision transformer models achieve at least equivalent performance to CNNs in industrial applications where only sparse data is available, and significantly surpass them in increasingly complex tasks.

Details

Title
Vision Transformer in Industrial Visual Inspection
Author
Hütten, Nils; Meyes, Richard; Meisen, Tobias
First page
11981
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2748520615