Self-Consuming Generative Models with Adversarially Curated Data

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0
Abstract: Recent advances in generative models have made it increasingly difficult to distinguish real data from model-generated synthetic data. Using synthetic data for successive training of future model generations creates “self-consuming loops,” which may lead to model collapse or training instability. Furthermore, synthetic data is often subject to human feedback and curated by users based on their preferences. Ferbach et al. (2024) recently showed that when data is curated according to user preferences, the self-consuming retraining loop drives the model to converge toward a distribution that optimizes those preferences. However, in practice, data curation is often noisy or adversarially manipulated. For example, competing platforms may recruit malicious users to adversarially curate data and disrupt rival models. In this paper, we study how generative models evolve under self-consuming retraining loops with noisy and adversarially curated data. We theoretically analyze the impact of such noisy data curation on generative models and identify conditions for the robustness and stability of the retraining process. Building on this analysis, we design attack algorithms for competitive adversarial scenarios, where a platform with a limited budget employs malicious users to misalign a rival’s model from actual user preferences. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithms.
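
To make the setting concrete, below is a minimal toy sketch (not the paper's algorithm or code) of a self-consuming retraining loop with preference-based curation, where an adversary flips a fraction of the pairwise comparisons. The one-dimensional Gaussian "model", the reward function, and all parameter values are illustrative assumptions introduced here, not taken from the paper.

```python
# Toy illustration (assumptions only, not the paper's method): a 1-D Gaussian
# generative model is repeatedly refit on its own curated samples. Curation keeps
# the preferred sample of each random pair; an adversary flips a fraction of the
# comparisons, pushing the model away from the true user preference.
import numpy as np

rng = np.random.default_rng(0)

def reward(x, target=2.0):
    # Hypothetical user preference: samples closer to `target` are preferred.
    return -np.abs(x - target)

def curate(samples, flip_prob=0.0):
    # Pairwise curation: keep the winner of each random pair; with probability
    # `flip_prob` the comparison outcome is adversarially flipped.
    rng.shuffle(samples)
    pairs = samples.reshape(-1, 2)
    prefer_first = reward(pairs[:, 0]) > reward(pairs[:, 1])
    flips = rng.random(len(pairs)) < flip_prob
    prefer_first = np.where(flips, ~prefer_first, prefer_first)
    return np.where(prefer_first, pairs[:, 0], pairs[:, 1])

def retrain_loop(generations=20, n=10_000, flip_prob=0.0):
    mu, sigma = 0.0, 1.0  # stand-in "generative model": a Gaussian
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, size=n)   # generate synthetic data
        kept = curate(synthetic, flip_prob)         # noisy/adversarial curation
        mu, sigma = kept.mean(), kept.std()         # refit on curated samples
    return mu, sigma

print("clean curation:      ", retrain_loop(flip_prob=0.0))
print("adversarial curation:", retrain_loop(flip_prob=0.7))
```

Under clean curation the toy model drifts toward the preferred region (and its variance shrinks, echoing the collapse/instability concern); once more than half of the comparisons are flipped, it drifts away from the true preference, which is the misalignment effect the attack algorithms in the paper exploit.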
Lay Summary: Generative AI models can now create convincing images, text, and videos. They improve over time by retraining on new content, which is often scraped from the internet or collected through human feedback. This creates a “self-consuming feedback loop”, where each new model learns from synthetic data generated by earlier versions of itself, often curated based on user preferences. We began this research out of concern that this loop could be disrupted if people intentionally provide misleading feedback. To explore this risk, we imagined a scenario in which two AI companies compete, and one tries to sabotage the other by manipulating its feedback. For example, it might hire people to repeatedly “like” low-quality content from the rival system. Over time, this misleads the model into learning incorrect preferences and drifting away from what real users actually want. We studied how such sabotage could happen and under what conditions models can resist it. We also designed attack strategies and tested them through experiments on both synthetic and real-world data. Our results show that even small, targeted manipulations can gradually misdirect a model. As generative AI becomes more common and self-reinforcing, our work underscores the importance of understanding these risks and developing safeguards against malicious feedback.
Link To Code: https://github.com/osu-srml/Adversarial-Curation
Primary Area: Social Aspects->Alignment
Keywords: generative model, self-consuming loop, human feedback, adversarial curation, attack algorithms
Submission Number: 5643