VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Cheng, Zesen; Leng, Sicong; Zhang, Hang; Xin, Yifei; Li, Xin; et al. arXiv.org, Oct 30, 2024.
