Keywords: Generative Models, Alignment, Preference Modeling, Recursive Training
Abstract: As generative models are increasingly trained on their own outputs, data curation becomes the key force shaping which values persist. We formalize this recursive loop as a two-stage game between two agents: the model Owner and the Public. Each round, the generative model's outputs are filtered by both agents and returned to the training pool, progressively amplifying the curators' preferences. We analyze the dynamics under varying degrees of misalignment between the Owner and the Public, ranging from perfect alignment through partially overlapping to fully disjoint preferences, and show that the system converges exponentially to distinct long-run behaviours. Finally, we establish an alignment trilemma: no Bradley–Terry alignment process can simultaneously satisfy stability, diversity, and value alignment with both the Owner and the Public.
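A minimal simulation sketch of the recursive curation loop described in the abstract, under simplifying assumptions not taken from the paper: the model is a categorical distribution over K behaviours, each curator accepts a sample with Bradley–Terry (sigmoid) probability given that curator's preference score, and "retraining" refits the distribution on the surviving pool. All variable names and scores here are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
# Hypothetical preference scores: Owner favours behaviours 0 and 1,
# the Public favours 1 and 2, so they partially overlap on behaviour 1.
owner_score = np.array([2.0, 1.0, 0.0, 0.0, 0.0])
public_score = np.array([0.0, 1.0, 2.0, 0.0, 0.0])

p = np.ones(K) / K  # initial model distribution over behaviours

def bt_accept(scores, samples, rng):
    """Bradley-Terry-style filter: keep sample i with probability sigmoid(score_i)."""
    prob = 1.0 / (1.0 + np.exp(-scores[samples]))
    return samples[rng.random(len(samples)) < prob]

for t in range(50):
    samples = rng.choice(K, size=10_000, p=p)        # model generates
    samples = bt_accept(owner_score, samples, rng)   # Owner filters
    samples = bt_accept(public_score, samples, rng)  # Public filters
    counts = np.bincount(samples, minlength=K)
    p = counts / counts.sum()                        # "retrain" on the curated pool

print(np.round(p, 3))
```

In this toy setting the mass concentrates geometrically on behaviour 1, the one behaviour both curators reward, illustrating the exponential convergence and the loss of diversity that the trilemma trades off against stability and alignment.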
Submission Number: 12