Bi-DCA: Bi-directional Dual Contrastive Adapting for Alleviating Hallucination in Multimodal Large Language Models

ACL ARR 2024 June Submission 668 Authors

12 Jun 2024 (modified: 03 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Multimodal Large Language Models (MLLMs) demonstrate excellent performance across various multimodal tasks. However, they still tend to generate text with hallucinations in certain scenarios. Previous efforts to alleviate hallucinations approach the issue from the fine-tuning, dataset, and inference perspectives. Despite these efforts, two challenges remain in MLLMs: confusion among image objects and the generation of persistent hallucinations. In this paper, we propose a novel training-free method called Bi-directional Dual Contrastive Adapting (Bi-DCA) that alleviates hallucinations in MLLMs and integrates seamlessly into existing decoding methods. We first design a bi-directional attention mechanism that expands the visual receptive field to address the problem of confused image objects. Building on this, to alleviate persistent hallucinations in generated sentences, we propose a dual contrastive adapting strategy that enhances the positive influence of the image during next-token prediction. We conduct extensive experiments across various hallucination evaluation methods and benchmarks. The experimental results demonstrate that our Bi-DCA not only alleviates the above challenges but also achieves superior performance compared with previous methods.
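The abstract gives no equations, but "dual contrastive adapting" places Bi-DCA in the contrastive-decoding family, where next-token logits conditioned on the original image are contrasted against logits obtained from a degraded visual input. The sketch below is a minimal illustration of that generic family only, not the authors' exact method; the function name, the degraded-image branch, and the alpha/beta hyperparameters are assumptions made for illustration.

```python
import torch

def contrastive_next_token_logits(
    logits_visual: torch.Tensor,    # next-token logits conditioned on the original image
    logits_degraded: torch.Tensor,  # next-token logits conditioned on a degraded/absent image
    alpha: float = 1.0,             # contrast strength (hypothetical hyperparameter)
    beta: float = 0.1,              # plausibility cutoff (hypothetical hyperparameter)
) -> torch.Tensor:
    # Amplify what the image contributes relative to the language prior.
    contrasted = (1.0 + alpha) * logits_visual - alpha * logits_degraded
    # Plausibility constraint: only keep tokens that are at least
    # beta times as likely as the top token under the visual branch.
    probs = torch.softmax(logits_visual, dim=-1)
    keep = probs >= beta * probs.max(dim=-1, keepdim=True).values
    return contrasted.masked_fill(~keep, float("-inf"))
```

In such a scheme, the returned logits replace the model's raw logits inside whatever decoding procedure is already in use (greedy, beam, or nucleus sampling), which is consistent with the abstract's claim that the method integrates seamlessly into existing decoding methods.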
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodal Large Language Model, Hallucination, training-free
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 668