Keywords: Multimodal Large Language Model, Hallucination, Reinforcement Learning, Contrastive Learning, Open World
TL;DR: Mitigating Hallucinations in Multimodal Large Language Models
Abstract: Multimodal large language models (MLLMs) have shown excellent performance on tasks that combine natural language and visual information. However, they still suffer from hallucinations, generating incorrect or fabricated content, especially in open-world environments. This study proposes a method that combines reinforcement learning and contrastive learning to alleviate the hallucination problem in MLLMs. By introducing Hallucination-Augmented Contrastive Learning (HACL), we use hallucinated text as hard negative samples to strengthen the alignment between visual and textual representations. Additionally, within a reinforcement learning framework, we dynamically adapt the model to open-world environments to further reduce hallucinations. Experimental results demonstrate that the proposed method effectively reduces hallucination rates across multiple benchmark datasets and significantly improves overall model performance.
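To illustrate the core idea of the contrastive component, the following is a minimal sketch of a hallucination-augmented contrastive objective: an InfoNCE-style loss in which each image is pulled toward its ground-truth caption and pushed away from in-batch captions plus its own hallucinated caption used as an extra hard negative. This is an assumed, simplified formulation for illustration only, not the authors' exact implementation; the function name, tensor shapes, and temperature value are hypothetical.

```python
import torch
import torch.nn.functional as F


def hallucination_augmented_contrastive_loss(
    image_emb: torch.Tensor,   # (B, D) visual embeddings
    text_emb: torch.Tensor,    # (B, D) embeddings of ground-truth captions
    hallu_emb: torch.Tensor,   # (B, D) embeddings of hallucinated captions (hard negatives)
    temperature: float = 0.07,
) -> torch.Tensor:
    """InfoNCE-style loss: image i should match caption i and be pushed away
    from the other in-batch captions and from all hallucinated captions."""
    # Normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    hallu_emb = F.normalize(hallu_emb, dim=-1)

    # Image-to-true-caption and image-to-hallucinated-caption similarities.
    sim_true = image_emb @ text_emb.t() / temperature    # (B, B)
    sim_hallu = image_emb @ hallu_emb.t() / temperature  # (B, B)

    # Append hallucinated-caption columns as additional hard negatives;
    # the positive for image i remains column i of the true-caption block.
    logits = torch.cat([sim_true, sim_hallu], dim=1)     # (B, 2B)
    targets = torch.arange(image_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


# Usage example with random embeddings (batch of 4, dimension 256).
if __name__ == "__main__":
    B, D = 4, 256
    loss = hallucination_augmented_contrastive_loss(
        torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)
    )
    print(loss.item())
```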
Submission Number: 2