
Abstract

Vision-language models trained on large-scale image-text pairs have shown significant potential for representation learning. Human pose estimation, which is highly sensitive to pixel-wise transformations, requires effective methods for mining pose-specific knowledge. In this paper, we investigate the homologous human pose retrieval task, relying on large-scale annotated datasets to enhance pose knowledge extraction. We propose Pose Prompt (PosePro), which leverages vision-language models to categorize the global pose configuration of an image, build a compatible design, and generate pose embeddings as proposals. We then aim to integrate the learned knowledge as visual and textual prompts to facilitate the learning process on new, unseen tasks. We demonstrate the effectiveness of the fundamental PosePro model through extensive experiments on both pose retrieval and human pose estimation, showing significant improvements in accuracy and generalization ability, especially in scenarios with limited samples.

Details

1009240
Business indexing term
Title
Vision-language model guided pose knowledge mining for human pose estimation
Author
Chen, Yilei 1; Xie, Xuemei 2; Fu, Li 1

1 School of Artificial Intelligence, Xidian University, Xi’an, Shaanxi 710071, PR China
2 Guangzhou Institute of Technology, Xidian University, Guangzhou, Guangdong 510555, PR China
Volume
12
Issue
9
Pages
32-45
Publication year
2025
Publication date
Sep 2025
Publisher
Oxford University Press
Place of publication
Oxford
Country of publication
United Kingdom
ISSN
2288-5048
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-08-05
Milestone dates
2025-04-07 (Received); 2025-07-16 (Revision received); 2025-07-21 (Accepted); 2025-09-03 (Corrected)
   First posting date
05 Aug 2025
ProQuest document ID
3246078152
Document URL
https://www.proquest.com/scholarly-journals/vision-language-model-guided-pose-knowledge/docview/3246078152/se-2?accountid=208611
Copyright
© The Author(s) 2025. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-10-07
Database
ProQuest One Academic