Keywords: Generative Models, Alignment, Preference Modeling, Recursive Training
Abstract: As generative models are increasingly trained on their own outputs, data curation becomes the key force shaping which values persist. We formalize this recursive loop as a two-stage game between two agents: the model Owner and the Public. Each round, the generative model's outputs are filtered by both agents and returned to the training pool, progressively amplifying the curators' preferences. We analyze the dynamics under varying degrees of misalignment between the Owner and the Public, ranging from perfect alignment through partially overlapping to fully disjoint preferences, and show that the system converges exponentially to distinct long-run behaviours. Finally, we establish an alignment trilemma: no Bradley–Terry alignment process can simultaneously satisfy stability, diversity, and value alignment with both the Owner and the Public.
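A minimal simulation sketch of the recursive curation loop described in the abstract, under simplifying assumptions not taken from the paper: the model is a categorical distribution over K behaviours, each curator accepts a sample with Bradley–Terry (sigmoid) probability given that curator's preference score, and "retraining" refits the distribution on the surviving pool. All variable names and scores here are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
# Hypothetical preference scores: Owner favours behaviours 0 and 1,
# the Public favours 1 and 2, so they partially overlap on behaviour 1.
owner_score = np.array([2.0, 1.0, 0.0, 0.0, 0.0])
public_score = np.array([0.0, 1.0, 2.0, 0.0, 0.0])

p = np.ones(K) / K  # initial model distribution over behaviours

def bt_accept(scores, samples, rng):
    """Bradley-Terry-style filter: keep sample i with probability sigmoid(score_i)."""
    prob = 1.0 / (1.0 + np.exp(-scores[samples]))
    return samples[rng.random(len(samples)) < prob]

for t in range(50):
    samples = rng.choice(K, size=10_000, p=p)        # model generates
    samples = bt_accept(owner_score, samples, rng)   # Owner filters
    samples = bt_accept(public_score, samples, rng)  # Public filters
    counts = np.bincount(samples, minlength=K)
    p = counts / counts.sum()                        # "retrain" on the curated pool

print(np.round(p, 3))
```

In this toy setting the mass concentrates geometrically on behaviour 1, the one behaviour both curators reward, illustrating the exponential convergence and the loss of diversity that the trilemma trades off against stability and alignment.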
Submission Number: 12