Adversarial Co-Evolution of LLM-Generated Policies and Environments via Two-Player Zero-Sum Game


ICLR 2026 Conference Submission18453 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Code-as-Policy, Large Language Models, Unsupervised Environment Design, Policy–Environment Co-evolution, Zero-Sum Games

TL;DR: We propose a co-evolutionary framework using LLMs to jointly generate agents and environments as executable code, producing robust, self-improving policies via a curriculum shaped by a game-theoretic interaction.

Abstract: A central obstacle to building agents that continually improve is that training environments are typically fixed or manually designed, which restricts continual learning and generalization beyond the training distribution. We address this with Covolve, a co-evolutionary framework that leverages large language models (LLMs) to generate both environments and agent policies as executable Python code. We model the interaction between the environment designer and the policy designer as a two-player zero-sum game, which drives adversarial co-evolution: environments expose policy weaknesses, and policies adapt in response. To ensure robustness and prevent forgetting, we compute the mixed-strategy Nash equilibrium (MSNE) of this game, yielding a meta-policy that remains robust across all generated environments rather than overfitting to the most recent one. This process induces an automated curriculum in which environments and policies co-evolve toward increasing complexity. Experiments in urban driving, maze solving, and 2D navigation show that Covolve produces progressively more complex environments. The MSNE meta-policy also retains the ability to solve previously seen environments while learning to solve unseen ones. These results demonstrate the potential of LLM-driven co-evolution to achieve open-ended learning without predefined task distributions or manual intervention.
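The abstract does not say how the MSNE is computed. The sketch below assumes the game is summarized by a payoff matrix whose entry (i, j) is a score (hypothetically, the success rate in [0, 1]) of generated policy i in generated environment j. The policy designer maximizes this score and the environment designer minimizes it. Instead of the exact linear-programming solution usually used for matrix games, it swaps in an approximate solver: both players run Hedge (multiplicative weights), and in zero-sum games their time-averaged strategies approach an MSNE. The meta-policy then samples a policy from the equilibrium mixture. The names `approx_msne`, `exploitability` and `sample_policy` are hypothetical illustrations, not the authors' code.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def approx_msne(payoff: Matrix, iters: int = 20000,
                lr: float = 0.02) -> Tuple[List[float], List[float], float]:
    """Approximate the MSNE of a zero-sum matrix game via Hedge self-play.

    payoff[i][j]: row player's payoff (hypothetically, success rate of policy i
    in environment j). Rows (policies) maximize; columns (environments) minimize.
    Returns (policy mixture, environment mixture, estimated game value).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    cum_row = [0.0] * n_rows  # cumulative payoff earned by each pure policy
    cum_col = [0.0] * n_cols  # cumulative payoff conceded by each pure environment
    avg_row = [0.0] * n_rows
    avg_col = [0.0] * n_cols
    for _ in range(iters):
        x = _softmax([lr * g for g in cum_row])   # policies favour high payoff
        y = _softmax([-lr * c for c in cum_col])  # environments favour low payoff
        for i in range(n_rows):
            avg_row[i] += x[i] / iters
        for j in range(n_cols):
            avg_col[j] += y[j] / iters
        # Expected payoff of each pure strategy against the opponent's current mix.
        for i in range(n_rows):
            cum_row[i] += sum(payoff[i][j] * y[j] for j in range(n_cols))
        for j in range(n_cols):
            cum_col[j] += sum(payoff[i][j] * x[i] for i in range(n_rows))
    value = sum(avg_row[i] * payoff[i][j] * avg_col[j]
                for i in range(n_rows) for j in range(n_cols))
    return avg_row, avg_col, value


def exploitability(payoff: Matrix, x: List[float], y: List[float]) -> float:
    """Duality gap: exactly 0 at an MSNE, small for a good approximation."""
    best_row = max(sum(payoff[i][j] * y[j] for j in range(len(y)))
                   for i in range(len(x)))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(len(x)))
                   for j in range(len(y)))
    return best_row - best_col


def sample_policy(policies: Sequence[str], mix: List[float],
                  rng: random.Random) -> str:
    """Meta-policy: draw one policy according to the equilibrium mixture."""
    return rng.choices(policies, weights=mix, k=1)[0]


if __name__ == "__main__":
    # Hypothetical success rates: 2 policies (rows) x 2 environments (columns).
    A = [[1.0, 0.0],
         [0.0, 0.5]]
    x, y, v = approx_msne(A)
    print([round(p, 2) for p in x], [round(q, 2) for q in y], round(v, 2))
    print("gap:", round(exploitability(A, x, y), 4))
```

For this toy matrix the exact MSNE is x = y = (1/3, 2/3) with value 1/3, so the printed mixtures should land close to those numbers. For small matrices an exact LP solver would be the more natural choice; Hedge is used here only because it needs nothing beyond the standard library and makes the "robust across all environments" property visible through the exploitability check.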

Supplementary Material: zip

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 18453
