Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: self-referential learning, model collapse, entropy reservoir, Bregman projection, information geometry, generative AI
TL;DR: Self-training is a stochastic Bregman-projection loop whose entropy provably vanishes unless it is continuously mixed with a high-entropy reservoir—an insight that unifies and explains existing anti-collapse heuristics.
Abstract: Self-referential learning---training a model on data it generated itself---promises boundless scalability but chronically suffers from \emph{model collapse}: language models degenerate into repetitive text, GANs drop modes, and reinforcement-learning policies over-exploit. Although practitioners employ ad~hoc fixes such as real-data mixing, entropy bonuses, knowledge distillation, or retrieval-augmented generation, a single principle that explains both the failure mode and the success of these fixes has remained elusive. We present \textbf{Entropy-Reservoir Bregman Projection} (ERBP), an information-geometric framework that unifies these phenomena. We model the closed loop as a stochastic Bregman projection sequence in distribution space. Without external coupling, finite-sample noise forces the system to project onto an ever-shrinking empirical support, causing exponential entropy decay and eventual collapse. Introducing an \emph{Entropy Reservoir}---a high-entropy distribution mixed into each projection---injects a controllable entropy flux that provably stabilises the dynamics. Our theory yields (i) a necessary condition for collapse, (ii) a sufficient condition that guarantees a non-trivial entropy floor, and (iii) closed-form rates that depend only on sample size and the strong-convexity/Lipschitz constants of the Bregman generator. Experiments on large-language-model self-training, Soft Actor-Critic in reinforcement learning, and GAN optimisation validate our predictions and show that disparate stabilisation heuristics correspond to specific reservoir choices and coupling coefficients. ERBP thus transforms a collection of folk remedies into a single, quantitative design rule: monitor and budget your entropy flux.
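The collapse mechanism and the reservoir fix described above can be illustrated with a toy simulation. This is a minimal sketch, not the paper's ERBP machinery: the uniform reservoir, the categorical model, and all parameter values (`k`, `n`, `lam`) are illustrative assumptions, and refitting on finite samples stands in for the stochastic Bregman projection step.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability atoms."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def self_train(steps=1000, k=50, n=100, lam=0.0, seed=0):
    """Closed self-training loop over a categorical distribution.

    Each round: draw n samples from the current model p, refit p as the
    empirical distribution (a stand-in for the Bregman projection onto the
    empirical support), then optionally mix in a uniform 'entropy reservoir'
    with coupling coefficient lam. Returns the final entropy.
    """
    rng = np.random.default_rng(seed)
    reservoir = np.full(k, 1.0 / k)          # high-entropy reservoir: uniform
    p = reservoir.copy()                     # start at maximum entropy
    for _ in range(steps):
        counts = rng.multinomial(n, p)       # finite-sample generation
        p = counts / n                       # refit on own samples
        p = (1 - lam) * p + lam * reservoir  # reservoir coupling (lam=0: none)
    return entropy(p)

h_closed = self_train(lam=0.0)   # no reservoir: support shrinks, entropy decays
h_mixed  = self_train(lam=0.1)   # reservoir-coupled: non-trivial entropy floor
print(f"closed loop: {h_closed:.3f} nats, reservoir-coupled: {h_mixed:.3f} nats")
```

With `lam=0` the loop is pure resampling drift, so the support collapses toward a point mass; with `lam=0.1` the mixture step alone already guarantees an entropy floor (every atom retains at least `lam/k` mass), matching the abstract's sufficient condition in spirit.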
Supplementary Material: zip
Primary Area: generative models
Submission Number: 24491