LLMs Become Logical Explainers through Multi-Agent Self-Play Framework

ACL ARR 2026 January Submission 10220 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Natural Language Explanation, Theory-Grounded Persona, Preference Scoring, Self-Optimization
Abstract: Natural language explanations are widely used to convey the relations underlying task outputs; however, existing approaches rely on superficial persona formulations and make limited use of theoretically grounded personas. In this study, we propose a multi-agent self-play framework in which a single model assumes multiple explanatory personas informed by theories from the cognitive and social sciences to generate diverse explanation candidates for the same input. Relative preferences among these candidates are then induced through a multi-factor scoring scheme that jointly accounts for logical alignment with the persona, factuality assessed via a critic agent's self-critique, and expression diversity. These preferences are passed to a recomposer agent that synthesizes an optimal explanation, which is subsequently used for self-preference learning, enabling the model to learn strategically from its own generations. Experiments show improved logical alignment over single-persona baselines and stable DPO training, demonstrating that adaptive persona selection can be realized effectively at test time.
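The abstract describes a pipeline of persona-conditioned generation, multi-factor scoring, and preference-pair construction. The sketch below illustrates one plausible reading of that loop; every name (generate_with_persona, critic_factuality, the persona list, and the scoring weights) is a hypothetical stand-in chosen for illustration, not the authors' implementation.

```python
# Minimal sketch of the multi-persona generate-score-prefer loop suggested by
# the abstract. All helper names, personas, and weights are assumptions.
from dataclasses import dataclass

# Assumed example personas; the paper derives its personas from cognitive and
# social science theories, which are not enumerated in the abstract.
PERSONAS = ["causal reasoner", "analogical teacher", "skeptical scientist"]


@dataclass
class Candidate:
    persona: str
    text: str
    score: float = 0.0


def generate_with_persona(model, prompt: str, persona: str) -> str:
    # Placeholder for a persona-conditioned generation call on the same model.
    return model(f"As a {persona}, explain: {prompt}")


def logical_alignment(text: str, persona: str) -> float:
    # Placeholder: how well the explanation follows the persona's reasoning style.
    return 0.0


def critic_factuality(model, text: str, prompt: str) -> float:
    # Placeholder: critic agent's self-critique score for factual consistency.
    return 0.0


def expression_diversity(text: str, others: list[str]) -> float:
    # Placeholder: novelty of this candidate relative to the other candidates.
    return 0.0


def rank_candidates(model, prompt: str, w=(0.4, 0.4, 0.2)) -> list[Candidate]:
    """Generate one explanation per persona and score it with the three factors."""
    cands = [Candidate(p, generate_with_persona(model, prompt, p)) for p in PERSONAS]
    for c in cands:
        others = [o.text for o in cands if o is not c]
        c.score = (w[0] * logical_alignment(c.text, c.persona)
                   + w[1] * critic_factuality(model, c.text, prompt)
                   + w[2] * expression_diversity(c.text, others))
    return sorted(cands, key=lambda c: c.score, reverse=True)


def preference_pair(ranked: list[Candidate]) -> tuple[str, str]:
    # Highest- vs. lowest-scored candidate as a (chosen, rejected) pair for
    # DPO-style self-preference learning.
    return ranked[0].text, ranked[-1].text
```

In the paper's full pipeline, a recomposer agent additionally synthesizes an explanation from the ranked candidates before preference learning; that step is omitted here since the abstract gives no detail on how recomposition is performed.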
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Natural Language Explanation, Theory-Grounded Persona, Preference Scoring, Self-Optimization
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 10220