Abstract: Multimodal large language models (MLLMs) have achieved remarkable performance in processing and reasoning over text and images. However, they remain susceptible to hallucinations, i.e., instances where the generated content deviates from the input or contradicts established knowledge. While hallucinations in MLLMs have attracted increasing attention, the specific impact of font variation, a common yet overlooked source of hallucination, has not been systematically investigated. Moreover, existing OCR benchmarks include limited font diversity and focus primarily on layout or background changes, offering little fine-grained control over font factors and neglecting long-tail fonts. To address this gap, we introduce and categorize font-induced hallucinations and conduct comprehensive experiments examining how fonts affect MLLMs across dimensions such as font perturbations, style shifts, font-semantic interactions, and sentiment recognition. Based on these findings, we propose FontHalu, a benchmark with diverse font types and scenario settings, specifically designed to evaluate MLLMs' robustness in OCR, key information extraction (KIE), and sentiment analysis under font variation. We will release FontHalu and the related code to support research on improving the reliability and robustness of MLLMs.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Chinese
Submission Number: 5714