HiFACTMix: A Code-Mixed Benchmark and Graph-Aware Model for Evidence-Based Political Claim Verification in Hinglish

ICLR 2026 Conference Submission 25174 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Hinglish, Fact-checking, Code-mixed languages, Low-resource NLP, Political discourse, Quantum-enhanced RAG, Evidence graph reasoning, LLM explanations
TL;DR: We introduce HiFACTMix, a Hinglish political fact-checking benchmark, and propose a quantum-enhanced RAG framework that improves accuracy and explanation quality in low-resource, code-mixed settings.
Abstract: Fact-checking in code-mixed, low-resource languages such as Hinglish remains a significant and underexplored challenge in natural language processing. Existing fact-verification systems are primarily designed for high-resource, monolingual settings and fail to generalize to real-world political discourse in linguistically diverse regions like India. To address this gap, we introduce HiFACTMix, a novel benchmark comprising approximately 1,500 real-world factual claims made by 28 Indian state Chief Ministers and several influential political leaders in Hinglish, each annotated with textual evidence and veracity labels (True, False, Partially True, Unverifiable). Building on this resource, we propose a Quantum-Enhanced Retrieval-Augmented Generation (RAG) framework that integrates code-mixed text encoding, evidence graph reasoning, and explanation generation. Experimental results show that HiFACTMix not only outperforms strong multilingual and code-mixed baselines (CM-BERT, VerT5erini, IndicBERT, mBERT) but also remains competitive against recent large language models, including GPT-4, LLaMA-2, and Mistral. Unlike generic LLMs that may generate fluent but weakly grounded outputs, HiFACTMix explanations are explicitly linked to retrieved evidence, ensuring both accuracy and transparency. This work opens a new direction for multilingual, quantum-assisted, and politically grounded fact verification, with implications for combating misinformation in low-resource, code-mixed environments.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 25174