Abstract: Hallucination in vision-language models severely limits their use in downstream tasks that demand high-precision reasoning and remains a critical obstacle to trustworthy artificial intelligence. Existing approaches rely mainly on dedicated training paradigms or heuristic decoding strategies; while these partially alleviate hallucination, they suffer from two major limitations: substantial computational overhead for data construction and model fine-tuning, and a lack of fine-grained attribution for explaining how hallucinations arise.
In this work, we propose CARE (Causal-Aware Robust Estimation), a causal intervention-based generation tracing framework that suppresses hallucination at the token level without additional training. Our key insight is that over-reliance on linguistic priors is the core mechanism driving hallucination, and that its statistical signature can be captured precisely by contrasting two generation pathways. CARE constructs paired factual and counterfactual generation pathways and applies robust interquartile effect detection to quantify the visual dependency of each generated token.
Experiments on multimodal evaluation benchmarks including HallusionBench, MMHalBench, and POPE demonstrate that CARE reduces the hallucination rate of LLaVA-1.5 by an average of 9\% and improves answer accuracy by 11\%, while preserving the fluency of generated outputs. This work offers a novel, interpretable paradigm for understanding and mitigating hallucination in multimodal systems.
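The abstract only sketches the mechanism, so the following is a minimal, hypothetical illustration of the core idea it describes: score each generated token by the gap between its log-probability under a factual pathway (real image) and a counterfactual pathway (degraded or absent image), then use an interquartile-range rule to flag tokens with unusually low visual dependency. All function names, the `k=1.5` fence, and the numeric values are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def visual_dependency_scores(logp_factual, logp_counterfactual):
    """Per-token visual dependency: gap between the log-probability a token
    receives with the real image (factual pathway) and with a degraded or
    absent image (counterfactual pathway)."""
    return np.asarray(logp_factual) - np.asarray(logp_counterfactual)

def flag_low_dependency_tokens(scores, k=1.5):
    """Robust interquartile (IQR) rule: tokens whose visual dependency falls
    below Q1 - k*IQR are flagged as likely driven by linguistic priors
    rather than by the image."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return scores < (q1 - k * iqr)

# Hypothetical token-level log-probs from the two decoding pathways.
factual        = [-0.2, -0.5, -1.1, -0.3, -2.0]
counterfactual = [-0.9, -1.4, -1.0, -1.1, -2.1]
scores = visual_dependency_scores(factual, counterfactual)
print(flag_low_dependency_tokens(scores))  # True where a token shows little visual grounding
```

In this sketch, flagged tokens would be candidates for suppression or re-ranking during decoding; how CARE actually intervenes on them is specified in the paper itself.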
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Ethics, Bias, and Fairness, Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6030