Keywords: Multimodal Large Language Model, Hallucination, Reinforcement Learning, Contrastive Learning, Open World
TL;DR: Mitigating Hallucinations in Multimodal Large Language Models
Abstract: Multimodal large language models (MLLMs) have shown excellent performance on tasks that combine natural language and visual information. However, they still suffer from hallucinations, generating incorrect or fabricated content, especially in open-world environments. This study proposes a method that combines reinforcement learning and contrastive learning to alleviate the hallucination problem in MLLMs. By introducing Hallucination-Augmented Contrastive Learning (HACL), we use hallucinated text as hard negative samples to strengthen the alignment between visual and textual representations. Additionally, within a reinforcement learning framework, we dynamically adapt the model to open-world environments to further reduce hallucinations. Experimental results demonstrate that the proposed method effectively reduces hallucination rates across multiple benchmark datasets and significantly improves overall model performance.
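To illustrate the core idea of the contrastive component, the following is a minimal sketch of a hallucination-augmented contrastive objective: an InfoNCE-style loss in which each image is pulled toward its ground-truth caption and pushed away from in-batch captions plus its own hallucinated caption used as an extra hard negative. This is an assumed, simplified formulation for illustration only, not the authors' exact implementation; the function name, tensor shapes, and temperature value are hypothetical.

```python
import torch
import torch.nn.functional as F


def hallucination_augmented_contrastive_loss(
    image_emb: torch.Tensor,   # (B, D) visual embeddings
    text_emb: torch.Tensor,    # (B, D) embeddings of ground-truth captions
    hallu_emb: torch.Tensor,   # (B, D) embeddings of hallucinated captions (hard negatives)
    temperature: float = 0.07,
) -> torch.Tensor:
    """InfoNCE-style loss: image i should match caption i and be pushed away
    from the other in-batch captions and from all hallucinated captions."""
    # Normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    hallu_emb = F.normalize(hallu_emb, dim=-1)

    # Image-to-true-caption and image-to-hallucinated-caption similarities.
    sim_true = image_emb @ text_emb.t() / temperature    # (B, B)
    sim_hallu = image_emb @ hallu_emb.t() / temperature  # (B, B)

    # Append hallucinated-caption columns as additional hard negatives;
    # the positive for image i remains column i of the true-caption block.
    logits = torch.cat([sim_true, sim_hallu], dim=1)     # (B, 2B)
    targets = torch.arange(image_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


# Usage example with random embeddings (batch of 4, dimension 256).
if __name__ == "__main__":
    B, D = 4, 256
    loss = hallucination_augmented_contrastive_loss(
        torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)
    )
    print(loss.item())
```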
Submission Number: 2