Dialectic Argumentations for Oversight Reasoning

ACL ARR 2026 January Submission2277 Authors

02 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Debate for better oversight, Reasoning
Abstract: Debate has emerged as a promising oversight mechanism for Large Language Models (LLMs) amid growing systemic complexity and the limited scalability of evaluation, notably where models outperform human evaluators. Yet Debate provides little verifiable evidence for its final judgments, and its scalability beyond English remains largely unexplored. To ground oversight and let it scale as model capabilities grow, we propose a Dialectic Argumentation framework that serves as a reasoning function and extends the Debate paradigm to multilingual and multimodal settings. We adopt a weak-to-strong oversight approach in which two expert models evaluate and defend competing answers, while a third, blind judge determines the winner through Dialectic Argumentation. Experts argue only for answers consistent with their own beliefs, so each Debate is founded on disagreement. We evaluate the framework on six tasks spanning multilingual and multimodal scenarios, and Dialectic Argumentation consistently outperforms single-expert baselines. Moreover, we show that dialectic judgments from a weaker model provide argument-mediated supervision that, via fine-tuning, instils unsupervised reasoning signals in expert models.
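The debate protocol summarised in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The names `Expert`, `Verdict`, `dialectic_debate`, and `to_sft_example` are assumptions introduced here, as are several design choices: disagreement is detected by exact string equality, the judge is kept "blind" by seeing only neutral "Side 1"/"Side 2" labels, and fine-tuning data uses an assumed (prompt, target) format. In practice each callable would wrap a query to an LLM, with the judge being the weaker model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Answer = str
Turn = Tuple[str, str]  # (neutral side label, argument text); model names are never shown to the judge


@dataclass
class Expert:
    """Hypothetical expert wrapper; each callable would query an LLM in practice."""
    name: str
    believe: Callable[[str], Answer]                      # the answer the expert itself holds
    argue: Callable[[str, Answer, Optional[str]], str]    # defends that answer, optionally rebutting the opponent


@dataclass
class Verdict:
    answer: Answer
    debated: bool
    transcript: List[Turn] = field(default_factory=list)
    winner_side: Optional[str] = None


# Judge (assumed to be the weaker model): (question, {side: answer}, transcript) -> winning side label.
Judge = Callable[[str, Dict[str, Answer], List[Turn]], str]


def dialectic_debate(question: str, a: Expert, b: Expert, judge: Judge,
                     rounds: int = 1) -> Verdict:
    ans_a, ans_b = a.believe(question), b.believe(question)
    # No disagreement, no debate: the shared answer is returned directly.
    if ans_a == ans_b:
        return Verdict(ans_a, debated=False)
    transcript: List[Turn] = []
    last_a: Optional[str] = None
    last_b: Optional[str] = None
    for _ in range(rounds):
        # Each expert argues only for the answer it believes, replying to the latest opposing argument.
        last_a = a.argue(question, ans_a, last_b)
        last_b = b.argue(question, ans_b, last_a)
        transcript += [("Side 1", last_a), ("Side 2", last_b)]
    sides = {"Side 1": ans_a, "Side 2": ans_b}
    winner = judge(question, sides, transcript)  # blind: sees sides and arguments only
    return Verdict(sides[winner], debated=True, transcript=transcript, winner_side=winner)


def to_sft_example(question: str, verdict: Verdict) -> Optional[Dict[str, str]]:
    """Turn a judged debate into a hypothetical (prompt, target) fine-tuning pair."""
    if verdict.winner_side is None:
        return None
    reasons = [text for side, text in verdict.transcript if side == verdict.winner_side]
    return {"prompt": question, "target": "\n".join(reasons + [f"Answer: {verdict.answer}"])}


# Toy usage with stand-in experts and a rule-based judge.
alice = Expert("A", lambda q: "Paris", lambda q, ans, opp: f"{ans} is the capital of France.")
bob = Expert("B", lambda q: "Lyon", lambda q, ans, opp: f"{ans} is a large French city.")


def toy_judge(q: str, sides: Dict[str, Answer], transcript: List[Turn]) -> str:
    # Prefers the first side whose argument mentions "capital".
    for side, text in transcript:
        if "capital" in text:
            return side
    return "Side 1"


v = dialectic_debate("What is the capital of France?", alice, bob, toy_judge)
print(v.answer, v.debated)  # Paris True
```

Skipping the debate when both experts agree mirrors the abstract's point that Debate is founded on disagreement; only debated, judged items yield fine-tuning pairs in this sketch.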
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: Debate for better oversight, Reasoning in Large and Small LMs
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English, French, Chinese, Spanish, Italian, Hindi, Arabic, Finnish
Submission Number: 2277