From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations

TMLR Paper5065 Authors

09 Jun 2025 (modified: 13 Jun 2025), Under review for TMLR, CC BY 4.0
Abstract: In an era of rampant misinformation, generating reliable news explanations is vital, especially for under-represented languages like Hindi. Hindi lacks robust automated tools, which makes misinformation detection hard to scale. To bridge this gap, we propose a novel framework integrating Direct Preference Optimization (DPO) with curriculum learning to align machine-generated explanations with human reasoning. Fact-checked explanations from credible sources serve as preferred responses, while LLM outputs, which expose system limitations, serve as non-preferred responses. To refine task-specific alignment, we introduce two key parameters, Actuality and Finesse, into the DPO loss function, enhancing explanation quality and consistency. Experiments with LLMs (Mistral, Llama, Gemma) and PLMs (mBART, mT5) confirm the framework's effectiveness in generating coherent, contextually relevant explanations. This scalable approach combats misinformation and extends automated explanation generation to low-resource languages.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=eNmqqcD0Fr
Changes Since Last Submission: Our manuscript was previously desk rejected due to unintended changes to the default font. We have carefully reviewed and identified the packages responsible for this issue and have removed them to ensure full compliance with the formatting guidelines. We sincerely thank the editors for the opportunity to resubmit.
Assigned Action Editor: ~Greg_Durrett1
Submission Number: 5065