SAGE: Self-play Adversarial Games Enhance Large Language Model Reasoning Capabilities

Published: 05 Mar 2026, Last Modified: 13 Mar 2026
ICLR 2026 Workshop RSI Poster
License: CC BY 4.0
Keywords: Large Language Models, Reinforcement Learning, Reasoning, Self-Play
Abstract: We introduce **SAGE (Self-play Adversarial Games for Enhancement)**, a framework for improving LLM reasoning capabilities through adversarial self-play without human-curated data. SAGE places two model instances in an asymmetric game: a Setter generates a problem and predicts its solution, while a Solver attempts to solve the problem independently. The Setter receives positive reward only when its own prediction is correct and the Solver fails, incentivizing the generation of problems that are solvable yet challenging and naturally targeting the frontier of model capabilities. We instantiate SAGE in two domains: **Code-Game**, where problems are Python programs verified by execution, and **Math-Game**, where math problems are graded by an external LLM judge as a proxy for a verifiable environment. Across models from 1B to 4B parameters and two architectures (Qwen, Llama), SAGE consistently outperforms baselines: up to +10% on MATH, +8% on MBPP, and +6% on ARC-Challenge. Notably, we find cross-domain transfer: Code-Game training improves mathematical reasoning and vice versa, suggesting SAGE strengthens domain-general reasoning skills. Ablations confirm that adversarial pressure, rather than verified rewards alone, drives these gains: removing the opponent while retaining execution-verified rewards reduces the improvement by 40-70%. SAGE offers a scalable path to reasoning improvement that requires only a verifier, not human supervision.
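The Setter's incentive described in the abstract can be sketched as a simple reward rule. This is an illustrative assumption, not the paper's exact implementation; function names and reward values are hypothetical.

```python
# Minimal sketch of the SAGE reward structure described in the abstract.
# The Setter is rewarded only for problems it can solve but the Solver
# cannot -- solvable yet challenging. Values (1.0 / 0.0) are assumed.

def setter_reward(setter_correct: bool, solver_correct: bool) -> float:
    """Positive reward only when the Setter's own prediction is correct
    AND the Solver fails."""
    return 1.0 if (setter_correct and not solver_correct) else 0.0

def solver_reward(solver_correct: bool) -> float:
    """The Solver is rewarded simply for solving the problem."""
    return 1.0 if solver_correct else 0.0
```

Under this rule, trivially easy problems (both correct) and unsolvable ones (Setter wrong) earn the Setter nothing, pushing generated problems toward the frontier of the Solver's capability.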
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 94