Keywords: language model, reasoning, self-consistency, multi-agent debate, multi-agent reinforcement learning
TL;DR: By reinforcing their own debate consensus, language models learn to maintain consistent answers across diverse reasoning paths and to ground arguments in peer reasoning, driving self-improvement in reasoning.
Abstract: Language Models (LMs) are inconsistent reasoners, often generating contradictory responses to identical prompts. While inference-time methods can mitigate these inconsistencies, they fail to address the core problem: LMs struggle to reliably select reasoning pathways that lead to consistent outcomes under exploratory sampling. To address this, we formalize self-consistency as an intrinsic property of well-aligned reasoning models and introduce Multi-Agent Consensus Alignment (MACA), a reinforcement learning framework that post-trains models to favor reasoning trajectories aligned with their internal consensus, using majority/minority outcomes from multi-agent debate. These trajectories emerge from deliberative exchanges where agents ground reasoning in peer arguments, rather than mere aggregation of independent attempts, creating richer consensus signals than single-round majority voting. MACA enables agents to teach themselves to be more decisive and concise, and to better leverage peer insights in multi-agent settings without external supervision, driving substantial improvements across self-consistency (+27.6% on GSM8K), single-agent reasoning (+23.7% on MATH), sampling-based inference (+22.4% Pass@20 on MATH), and multi-agent ensemble decision-making (+42.7% on MathQA). These findings, coupled with strong generalization to unseen benchmarks (+16.3% on GPQA, +11.6% on CommonsenseQA), demonstrate robust self-alignment that more reliably unlocks the latent reasoning potential of language models.
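To make the consensus-labeling idea from the abstract concrete, below is a minimal, hypothetical Python sketch (not the authors' code or API) of the step it implies: sample several debate trajectories, take the majority final answer as the consensus, and mark majority-aligned trajectories as positive and minority ones as negative for a subsequent reinforcement or preference update. The names `Trajectory` and `label_by_consensus` are illustrative assumptions.

```python
# Sketch of consensus labeling from multi-agent debate outcomes.
# Assumption: each debate produces several trajectories, each ending in an
# extractable final answer (e.g. a number for GSM8K-style problems).
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    reasoning: str                   # the agent's full reasoning from the debate round
    final_answer: str                # extracted final answer
    reward: Optional[float] = None   # filled in by consensus labeling


def label_by_consensus(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Assign +1 to trajectories matching the debate's majority answer, -1 otherwise."""
    counts = Counter(t.final_answer for t in trajectories)
    majority_answer, _ = counts.most_common(1)[0]
    for t in trajectories:
        t.reward = 1.0 if t.final_answer == majority_answer else -1.0
    return trajectories


if __name__ == "__main__":
    # Toy example: three debate agents, two of which agree on "42".
    debate = [
        Trajectory("step-by-step path ...", "42"),
        Trajectory("alternative path ...", "41"),
        Trajectory("peer-grounded path ...", "42"),
    ]
    for t in label_by_consensus(debate):
        print(t.final_answer, t.reward)
```

In the paper's framing these labeled trajectories would feed a post-training objective; this sketch only illustrates the majority/minority signal itself, not the RL update.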
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19413