© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Hand detection and classification is an important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically delimit the hand region in egocentric vision (EV) datasets, and in particular to trace the development and performance of the “You Only Look Once” (YOLO) network over the past seven years, we present a study comparing the efficiency of hand detection and classification across the YOLO-family networks. The study addresses three problems: (1) systematizing the architectures, advantages, and disadvantages of the YOLO-family networks from version (v)1 to v7; (2) preparing ground-truth data for the pre-trained models and for evaluating hand detection and classification models on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning hand detection and classification models based on the YOLO-family networks and evaluating them on the EV datasets. The YOLOv7 network and its variants gave the best hand detection and classification results on all three datasets. The results of the YOLOv7-w6 network are as follows: P = 97% on FPHAB, P = 95% on HOI4D, and P larger than 95% on RehabHand, each at an IoU threshold (ThreshIOU) of 0.5. YOLOv7-w6 runs at 60 fps at a resolution of 1280 × 1280 pixels, and YOLOv7 runs at 133 fps at a resolution of 640 × 640 pixels.
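To make the reported figures concrete, the following is a minimal sketch of how a score such as "P = 97% at an IoU threshold of 0.5" could be computed. It assumes that P denotes precision and that a predicted hand box counts as correct when its class matches a ground-truth box and their intersection over union is at least the threshold. The function names, the greedy matching order, and the "left"/"right" labels are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(
    preds: List[Tuple[Box, str]],
    gts: List[Tuple[Box, str]],
    thresh: float = 0.5,
) -> float:
    """Fraction of predictions that match an unused ground-truth box.

    Assumption: predictions are matched greedily in the order given,
    and each ground-truth box can be matched at most once.
    """
    matched = set()  # indices of ground-truth boxes already used
    true_positives = 0
    for box, label in preds:
        best_iou, best_idx = 0.0, -1
        for i, (gt_box, gt_label) in enumerate(gts):
            if i in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx >= 0 and best_iou >= thresh:
            matched.add(best_idx)
            true_positives += 1
    return true_positives / len(preds) if preds else 0.0


if __name__ == "__main__":
    # Hypothetical single-frame example: one good detection, one miss.
    predictions = [((0, 0, 10, 10), "left"), ((20, 20, 30, 30), "right")]
    ground_truth = [((1, 1, 10, 10), "left"), ((50, 50, 60, 60), "right")]
    print(precision_at_iou(predictions, ground_truth, thresh=0.5))
```

In practice, predictions would usually be sorted by confidence before matching and the counts accumulated over all frames of a dataset; both details are left out here to keep the sketch short.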

Details

Title
YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos
Author
Nguyen, Hung-Cuong 1 ; Thi-Hao, Nguyen 1 ; Scherer, Rafał 2 ; Van-Hung, Le 3

1 Faculty of Engineering Technology, Hung Vuong University, Viet Tri City 35100, Vietnam
2 Department of Intelligent Computer Systems, Czestochowa University of Technology, 42-218 Czestochowa, Poland
3 Faculty of Basic Science, Tan Trao University, Tuyen Quang City 22000, Vietnam
First page
3255
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
14248220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2791700370