Keywords: Multimodal LLM, Hallucination
Abstract: Multimodal large language models (MLLMs) enable vision-language reasoning, yet often generate plausible outputs that are factually incorrect or visually ungrounded, thereby compromising their reliability. Direct preference optimization (DPO) is a common strategy for correcting hallucinations by aligning model outputs with human preferences. However, existing DPO strategies typically treat hallucination-related preferences as fixed targets, relying on static and potentially biased supervision signals during training. This approach tends to overfit to superficial linguistic cues in preference data, leading to distributional rigidity and spurious correlations that impair grounding in causally relevant visual information. To overcome this limitation, we propose TARS, a token-adaptive preference strategy that reformulates DPO as a min–max optimization problem. TARS maximizes token-level distributional shifts under explicit semantic constraints to simulate alignment uncertainty, and simultaneously minimizes the expected preference loss under these controlled perturbations. This joint objective effectively preserves causal grounding while mitigating overfitting to preference patterns, thereby reducing hallucinations in multimodal reasoning. We evaluate TARS on multiple hallucination benchmarks and find consistently robust performance. Using only 4.8k preference samples and no expert feedback, TARS reduces hallucination rates from 26.4% to 13.2% and decreases cognition value from 2.5 to 0.4, outperforming standard DPO and matching GPT-4o on several key metrics.
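A minimal sketch of the min–max formulation described in the abstract, assuming the standard per-sample DPO loss and a generic constrained perturbation set; the symbols $\Delta_\epsilon$, $\tilde{\mathcal{D}}_\delta$, $\beta$, and $\pi_{\mathrm{ref}}$ are illustrative assumptions rather than the paper's exact notation:

$$\ell_{\mathrm{DPO}}(\theta; v, x, y_w, y_l) = -\log\sigma\!\Big(\beta\log\tfrac{\pi_\theta(y_w \mid v, x)}{\pi_{\mathrm{ref}}(y_w \mid v, x)} - \beta\log\tfrac{\pi_\theta(y_l \mid v, x)}{\pi_{\mathrm{ref}}(y_l \mid v, x)}\Big)$$

$$\min_\theta \; \max_{\delta \in \Delta_\epsilon} \; \mathbb{E}_{(v, x, y_w, y_l) \sim \tilde{\mathcal{D}}_\delta}\big[\ell_{\mathrm{DPO}}(\theta; v, x, y_w, y_l)\big]$$

Here $v$ is the image, $x$ the prompt, $y_w$ and $y_l$ the preferred and rejected responses, $\Delta_\epsilon$ a semantically constrained set of token-level distributional shifts, and $\tilde{\mathcal{D}}_\delta$ the preference data under perturbation $\delta$; the inner maximization simulates alignment uncertainty while the outer minimization fits the preference loss under those worst-case shifts.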
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 430