Abstract

Estimating the positions of human joints from a single monocular RGB image has remained a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem persists: the relationships and constraints of human structures, such as the symmetry between limbs, are not well exploited by previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In the SAN, attention-driven, long-range dependency modeling is applied between joints to compensate for local content and to mine details from all feature locations. To enable the SAN to perform both 2D and 3D pose estimation, we also design a compatible, effective, and general joint learning framework that mixes training data of different dimensionalities. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII, and COCO datasets.
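
The paper's implementation is not reproduced in this record. As an illustration of the kind of attention-driven, long-range dependency modeling the abstract describes, the following is a minimal sketch of a generic self-attention block over CNN feature maps, where every spatial location attends to all others and the result is added back to the local features. The module name, the channel-reduction factor, and the learned residual gate `gamma` are illustrative assumptions, not the authors' SAN architecture.

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Minimal self-attention over all spatial locations of a feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learned scalar that gates how much nonlocal context is mixed in.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c')
        k = self.key(x).flatten(2)                    # (b, c', h*w)
        v = self.value(x).flatten(2)                  # (b, c, h*w)
        # Attention matrix: each location's affinity to every other location.
        attn = torch.softmax(q @ k, dim=-1)           # (b, h*w, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        # Residual connection: local content plus attention-weighted context.
        return self.gamma * out + x

Initializing `gamma` to zero lets the block start as an identity mapping, so a pretrained CNN backbone is undisturbed at first and the nonlocal pathway is learned gradually; this is a common design choice for such blocks, though whether the authors use it is not stated in the abstract.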

Details

Title
Self-Attention Network for Human Pose Estimation
First page
1826
Publication year
2021
Publication date
2021
Publisher
MDPI AG
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2492460123
Copyright
© 2021. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.