Abstract

Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from object hallucination: they generate plausible yet incorrect outputs that mention objects not present in the image. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. VCD effectively reduces over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment keeps the generated content closely grounded in the visual input, resulting in contextually accurate outputs. Our experiments show that VCD, without additional training or external tools, significantly mitigates object hallucination across different LVLM families. Beyond mitigating object hallucinations, VCD also performs well on general LVLM benchmarks, highlighting its wide-ranging applicability.
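
The abstract describes the contrast step only in words and gives no formula. As a rough illustration, the sketch below assumes one plausible form: the next-token logits from the distorted image are subtracted from those of the original image with a weight `alpha`, and the result is renormalized. The function names, the `alpha` parameter, the linear combination, and the toy logits are all assumptions made for this example, not details taken from this record.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(logits_original, logits_distorted, alpha=1.0):
    """Contrast next-token logits from the original and the distorted image.

    ASSUMPTION: a linear contrast (1 + alpha) * original - alpha * distorted.
    alpha is a hypothetical weight; alpha = 0 recovers ordinary decoding.
    """
    if len(logits_original) != len(logits_distorted):
        raise ValueError("logit vectors must have the same vocabulary size")
    contrasted = [
        (1.0 + alpha) * lo - alpha * ld
        for lo, ld in zip(logits_original, logits_distorted)
    ]
    return softmax(contrasted)


# Toy three-token vocabulary (made-up numbers). "frisbee" keeps the same
# logit when the image is distorted, which suggests it is driven by
# language priors rather than by visual evidence.
vocab = ["dog", "frisbee", "grass"]
logits_original = [2.0, 1.8, 0.5]
logits_distorted = [0.5, 1.8, 0.4]

plain = softmax(logits_original)
contrasted = contrastive_distribution(logits_original, logits_distorted, alpha=1.0)

for token, p_plain, p_contrast in zip(vocab, plain, contrasted):
    print(f"{token:8s} plain={p_plain:.3f} contrasted={p_contrast:.3f}")
```

In this toy case the token whose score does not depend on the image loses probability mass after the contrast, while the token supported by the undistorted image gains it. In a real LVLM the two logit vectors would come from two forward passes over the same text prefix, one per image version.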

Details

Title
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Publication title
arXiv.org; Ithaca
Publication year
2023
Publication date
Nov 28, 2023
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library, arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2023-11-29
Milestone dates
2023-11-28 (Submission v1)
First posting date
29 Nov 2023
ProQuest document ID
2895042811
Document URL
https://www.proquest.com/working-papers/mitigating-object-hallucinations-large-vision/docview/2895042811/se-2?accountid=208611
Copyright
© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2023-11-30
Database
ProQuest One Academic