Systematic Performance Degradation in Indic Vision-Language Models: Evidence from Hindi and Telugu

ACL ARR 2026 January Submission 8561 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Multimodal VQA, Multilingual VQA, Indic VQA Benchmark, Hindi, Telugu
Abstract: With 1.5 billion people speaking over 120 major languages, India exemplifies the challenges of multilingual AI evaluation. Current multilingual VLM benchmarks suffer from unverified automatic translations, narrow task coverage, small sample sizes, and a lack of culturally grounded content. We present HinTel-AlignBench, a comprehensive evaluation framework and benchmark for assessing vision-language models in Hindi and Telugu, with English-aligned samples. Our framework combines semi-automated translation with human verification to generate $\sim$4k QA pairs per language across five domains: adapted English datasets (VQAv2, RealWorldQA, CLEVR-Math) and native Indic datasets (JEE for STEM, VAANI for cultural grounding). Evaluation of state-of-the-art open- and closed-source VLMs reveals a consistent performance regression from English to the Indic languages, with average drops of 8.3 points for Hindi and 5.5 points for Telugu across four of the five tasks. We identify key failure modes and establish reproducible baselines for multilingual multimodal evaluation.
Paper Type: Short
Research Area: Multilinguality and Language Diversity
Research Area Keywords: Multimodal VQA, Multilingual VQA, Indic VQA Benchmark, Hindi, Telugu
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: Hindi, Telugu, English
Submission Number: 8561