Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

ACL ARR 2026 January Submission582 Authors

23 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: multilingual, language specific, hallucination, large language model
Abstract: We introduce KoLasSimpleQA, the first benchmark for evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we constructed a question set whose items have properties such as single knowledge point coverage, absolute objectivity, unique answers, and temporal stability. These properties enable efficient evaluation under the LLM-as-judge paradigm, testing both the LLMs' factual memory and their self-awareness ("know what they don't know"). KoLasSimpleQA extends existing research along two key dimensions: (1) Breadth (Multilingual Coverage): it includes 9 languages, supporting evaluation of global applicability. (2) Depth (Dual Domain Design): it covers both a general domain (global facts) and a language-specific domain (such as history, culture, and regional traditions), enabling a comprehensive assessment of multilingual capabilities. We evaluated mainstream LLMs, including traditional LLMs and emerging Large Reasoning Models. Results show significant differences between the two domains in performance metrics, model rankings, calibration, and robustness. This highlights the need for targeted evaluation and optimization in multilingual contexts. We hope KoLasSimpleQA will help the research community better identify the capability boundaries of LLMs in multilingual contexts and will provide guidance for model optimization. We will release KoLasSimpleQA on GitHub.
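To make the LLM-as-judge evaluation concrete, the sketch below shows one common way to turn per-question judge verdicts into scores that reflect both factual memory and self-awareness, plus a binned calibration error. It is a minimal illustration, not the benchmark's implementation. It assumes SimpleQA-style verdicts (CORRECT, INCORRECT, NOT_ATTEMPTED), where NOT_ATTEMPTED rewards a model for declining questions it cannot answer. It also assumes each answer comes with a confidence in [0, 1]. The abstract does not specify KoLasSimpleQA's exact metrics, so `Grade`, `Scores`, `aggregate`, and `expected_calibration_error` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Grade(Enum):
    """Hypothetical judge verdicts, modeled on SimpleQA-style grading (an assumption)."""
    CORRECT = "CORRECT"
    INCORRECT = "INCORRECT"
    NOT_ATTEMPTED = "NOT_ATTEMPTED"


@dataclass
class Scores:
    correct: float                  # fraction of all questions answered correctly
    not_attempted: float            # fraction where the model declined to answer
    correct_given_attempted: float  # accuracy over attempted questions only
    f_score: float                  # harmonic mean of `correct` and `correct_given_attempted`


def aggregate(grades: Sequence[Grade]) -> Scores:
    """Combine per-question judge verdicts into summary scores."""
    n = len(grades)
    if n == 0:
        raise ValueError("no grades to aggregate")
    n_correct = sum(g is Grade.CORRECT for g in grades)
    n_not_attempted = sum(g is Grade.NOT_ATTEMPTED for g in grades)
    n_attempted = n - n_not_attempted
    correct = n_correct / n
    cga = n_correct / n_attempted if n_attempted else 0.0
    f = 2 * correct * cga / (correct + cga) if (correct + cga) else 0.0
    return Scores(correct, n_not_attempted / n, cga, f)


def expected_calibration_error(
    confidences: Sequence[float], is_correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(is_correct):
        raise ValueError("confidences and is_correct must have equal length")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); a confidence of exactly 1.0 goes into the last bin.
        idx = [
            i for i, c in enumerate(confidences)
            if lo <= c < hi or (b == n_bins - 1 and c == 1.0)
        ]
        if not idx:
            continue
        acc = sum(is_correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

For example, the verdicts `[CORRECT, CORRECT, INCORRECT, NOT_ATTEMPTED]` give 0.5 overall correctness, 0.25 abstention, 2/3 correctness on attempted questions, and an F-score of 4/7. Under this scheme a model that abstains instead of hallucinating keeps its attempted-accuracy high, which is one way to quantify "know what they don't know".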
Paper Type: Long
Research Area: Multilinguality and Language Diversity
Research Area Keywords: multilingual benchmarks, multilingual evaluation, multilingual QA, knowledge base QA, multilingual corpora
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: Hungarian, Czech, Serbian, Russian, Chinese, Korean, Thai, Arabic, Vietnamese
Submission Number: 582