INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages

Abhishek Kumar Singh, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen, Ashish R. Mittal, Ganesh Ramakrishnan

Published: 2025, Last Modified: 13 Jan 2026NAACL (Findings) 2025EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Large Language Models (LLMs) perform well on unseen tasks in English, but their abilities in non-English languages are less explored due to limited benchmarks and training data. To bridge this gap, we introduce the Indic-QA Benchmark, a large dataset for context-grounded question answering in 11 major Indian languages, covering both extractive and abstractive tasks. Evaluations of multilingual LLMs, including instruction fine-tuned versions, revealed weak performance in low-resource languages due to a strong English-language bias in their training data. We also investigated the Translate-Test paradigm,where inputs are translated to English for processing and the results are translated back into the source language for output. This approach outperformed multilingual LLMs, particularly in low-resource settings. By releasing Indic-QA, we aim to promote further research into LLMs’ question-answering capabilities in low-resource languages. This benchmark offers a critical resource to address existing limitations and foster multilingual understanding.