Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models

ICLR 2026 Conference Submission6104 Authors

15 Sept 2025 (modified: 28 Nov 2025) · CC BY 4.0
Keywords: Benchmark, AI Hallucination, Large Language Models, Multilingual Bias
Abstract: We present Poly-FEVER, a large-scale multilingual benchmark for fact verification and hallucination detection in large language models (LLMs). Poly-FEVER extends FEVER, Climate-FEVER, and SciFact to 77,973 labeled claims across 11 languages—English, Chinese, Hindi, Arabic, Bengali, Japanese, Korean, Tamil, Thai, Georgian, and Amharic—curated to preserve logical equivalence across scripts and morphologies and to focus on verifiable Supported/Refuted cases. We augment each claim with topic metadata derived from a 22-topic LDA model to enable topic-aware evaluation. Claims are translated using Google Cloud Translation and validated with GEMBA scores averaging ~90 across languages, supporting high semantic fidelity. Using Poly-FEVER, we benchmark ChatGPT-3.5, LLaMA-2 (7B/13B/70B), and LLaMA-3.1-8B under language-wise, general, and classification prompt families, and study self-detection via rephrasing. We further probe resource imbalance by correlating accuracy with automated web presence (Google hit counts) and test retrieval-augmented generation via DPR over Wikipedia. Results show pronounced cross-lingual disparities: high-resource languages (e.g., English, Chinese) achieve the strongest accuracy, while lower-resource languages (e.g., Amharic, Tamil) lag; accuracy correlates with web presence. Topic structuring consistently benefits lower-resource settings, and RAG provides selective gains (notably in Arabic and Amharic) but can conflict with strong internal priors in high-resource languages. Poly-FEVER establishes a rigorous, publicly available foundation for responsible, language-adaptive evaluation of hallucination mitigation in LLMs.
Primary Area: datasets and benchmarks
Submission Number: 6104