Geometry of Nash Mirror Dynamics: Adaptive $\beta$-Control for Stable and Bias-Robust Self-Improving LLM Agents
Keywords: Large Language Models, Learning in Games
Abstract: Self‑improving agents learn by playing competitive, often non-transitive language games (e.g., generator–solver, proposer–verifier) where training can oscillate or drift toward undesirable behaviours. We study this scenario through the lens of reverse‑KL regularised Nash learning, showing how the regularisation strength $\beta$ shapes both where agents converge and how they get there. We derive a continuous‑time view of Nash Mirror Descent (Nash‑MD), revealing a simple geometry: trajectories are spirals on the simplex whose damping grows with $\beta$, while $\beta$ simultaneously pulls equilibria toward the reference policy—amplifying any existing biases. We prove last‑iterate convergence to the $\beta$‑regularised Nash equilibrium, quantify its first‑order shift from the unregularised solution, and link convergence speed to the spectrum of the linearised dynamics.
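To make the geometry concrete, here is a minimal NumPy sketch (not from the paper) that simulates a simple Euler discretisation of reverse-KL-regularised mirror-descent self-play on Rock-Paper-Scissors; the update rule, step size, initial policy, and reference policies are illustrative assumptions rather than the paper's exact Nash-MD scheme, but they exhibit the two effects described above: stronger damping of the spiral as $\beta$ grows, and a $\beta$-dependent shift of the regularised equilibrium toward a biased reference policy.

```python
import numpy as np

# Rock-Paper-Scissors payoff for the row player (zero-sum): A[i, j] = payoff of action i vs action j.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def run(beta, pi_ref, eta=0.05, steps=1000):
    """Euler steps of the regularised dynamics in symmetric self-play:
    dz/dt = A x - beta * (log x - log pi_ref),  with  x = softmax(z)."""
    z = np.log(np.array([0.7, 0.2, 0.1]))   # start from a deliberately biased policy
    log_ref = np.log(pi_ref)
    for _ in range(steps):
        x = softmax(z)
        z = z + eta * (A @ x - beta * (np.log(x) - log_ref))
    return softmax(z)

uniform = np.ones(3) / 3
biased_ref = np.array([0.5, 0.3, 0.2])
for beta in (0.05, 0.5, 2.0):
    # Uniform reference: the regularised equilibrium is the Nash (uniform), and larger
    # beta damps the spiral faster, so the residual distance after 1000 steps shrinks.
    x_fast = run(beta, uniform, steps=1000)
    # Biased reference: larger beta drags the regularised equilibrium toward pi_ref,
    # i.e. further away from the unregularised Nash equilibrium (uniform).
    x_shift = run(beta, biased_ref, steps=20000)
    print(f"beta={beta:4.2f}  residual (uniform ref): {np.abs(x_fast - uniform).max():.4f}"
          f"  equilibrium shift (biased ref): {np.abs(x_shift - uniform).max():.4f}")
```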
Building on this geometry, we introduce two adaptive $\beta$ controllers: (i) a Hessian‑based rule that targets a desired damping–rotation ratio to accelerate without overshoot, and (ii) a bias‑based rule that caps measurable bias (e.g., output length, calibration, hallucination proxies) while retaining speed. On toy games (e.g., Rock–Paper–Scissors) and small open‑model reasoning benchmarks, our controllers deliver faster, more stable convergence with bounded bias, outperforming baselines. The result is a practical recipe: tune $\beta$ as a control knob to make self‑improving LLM agents both faster and safer.
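As a rough illustration of the second rule's shape only, the following hypothetical controller (the function name, multiplicative update, gain, and bias proxy are all assumptions, not the paper's rule) keeps $\beta$ as large as a bias budget allows: it shrinks $\beta$ when a measured bias proxy exceeds its cap and grows it back otherwise.

```python
import numpy as np

def update_beta(beta, measured_bias, bias_cap,
                gain=0.05, beta_min=1e-3, beta_max=10.0):
    """One step of a hypothetical bias-capping beta controller (illustrative only).
    Larger beta means stronger damping (faster, more stable convergence) but also a
    stronger pull toward the reference policy and hence more inherited bias, so keep
    beta as large as the bias budget allows: shrink it multiplicatively when the
    measured bias proxy exceeds its cap, and grow it back otherwise."""
    ratio = measured_bias / bias_cap
    beta = beta * np.exp(-gain * (ratio - 1.0))   # ratio > 1 shrinks beta, ratio < 1 grows it
    return float(np.clip(beta, beta_min, beta_max))

# Toy usage: a bias proxy (e.g. mean output-length drift relative to the reference),
# measured once per evaluation round against a cap of 1.0.
beta = 1.0
for bias in (0.5, 0.8, 1.1, 1.4, 1.2, 0.9):
    beta = update_beta(beta, bias, bias_cap=1.0)
    print(f"measured bias = {bias:.1f}  ->  beta = {beta:.3f}")
```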
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25046