Abstract

Video temporal grounding (VTG) aims to locate specific temporal segments in an untrimmed video based on a linguistic query. Most existing VTG models are trained on extensive annotated video-text pairs, a process that not only introduces human biases from the queries but also incurs significant computational costs. To tackle these challenges, we propose VTG-GPT, a GPT-based method for zero-shot VTG without training or fine-tuning. To reduce bias in the original query, we employ Baichuan2 to generate debiased queries. To lessen redundant information in videos, we apply MiniGPT-v2 to transform visual content into more precise captions. Finally, we devise a proposal generator and post-processing to produce accurate segments from debiased queries and image captions. Extensive experiments demonstrate that VTG-GPT significantly outperforms SOTA methods in zero-shot settings and surpasses unsupervised approaches. More notably, it achieves performance comparable to supervised methods. The code is available on GitHub.
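
The abstract does not specify how the proposal generator turns captions and queries into segments, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one caption per sampled frame, uses a toy token-overlap (Jaccard) score as a stand-in for whatever text similarity the paper uses, and introduces the hypothetical names `jaccard`, `propose_segments`, `fps`, and `threshold`.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy text similarity: token-set overlap (a stand-in, not the paper's metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def propose_segments(
    captions: List[str],
    query: str,
    fps: float = 1.0,        # assumed frame sampling rate (hypothetical)
    threshold: float = 0.3,  # assumed similarity cutoff (hypothetical)
) -> List[Tuple[float, float]]:
    """Merge consecutive matching frames into (start, end) segments in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current run, if any
    for i, caption in enumerate(captions):
        hit = jaccard(caption, query) >= threshold
        if hit and start is None:
            start = i  # a new run of matching frames begins
        elif not hit and start is not None:
            segments.append((start / fps, i / fps))  # the run ended at frame i
            start = None
    if start is not None:
        # The video ended while a run was still open.
        segments.append((start / fps, len(captions) / fps))
    return segments


if __name__ == "__main__":
    frame_captions = [
        "a man walks",
        "a man opens a door",
        "a man opens the door",
        "a dog runs",
    ]
    print(propose_segments(frame_captions, "man opens door"))
```

In this sketch, no model is trained: segments come only from comparing text against text, which is the sense in which a caption-based pipeline can be tuning-free. A real system would replace the toy similarity with a stronger sentence-level measure and add post-processing such as ranking or deduplicating overlapping proposals.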

Details

Title
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Author
Xu, Yifang 1; Sun, Yunzhuo 2; Xie, Zien 1; Zhai, Benxiang 1; Du, Sidan 1

1 School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China; [email protected] (Y.X.); [email protected] (Z.X.); [email protected] (B.Z.)
2 School of Physics and Electronics, Hubei Normal University, Huangshi 435002, China; [email protected]
Publication title
Applied Sciences
Volume
14
Issue
5
First page
1894
Publication year
2024
Publication date
2024
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-02-25
Milestone dates
2024-01-18 (Received); 2024-02-17 (Accepted)
First posting date
2024-02-25
ProQuest document ID
2955469495
Document URL
https://www.proquest.com/scholarly-journals/vtg-gpt-tuning-free-zero-shot-video-temporal/docview/2955469495/se-2?accountid=208611
Copyright
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-08-26
Database
ProQuest One Academic