Markovian Generation Chains in Large Language Models

ICLR 2026 Conference Submission21102 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: social impact, machine translation, model collapse, LLM-generated content, language convergence
TL;DR: While much focus has been placed on model collapse, our paper concentrates on another perspective: what happens when content generated by LLMs is re-processed by LLMs?
Abstract: Large language models (LLMs) are already widely used in everyday life, which raises the question: what happens when people repeatedly process text with these models? In this paper, we investigate the Markovian generation chain in LLMs: a fixed prompt is combined with the most recent output to produce the next output, and this procedure is repeated over multiple iterations. In our simulated iterative generation tasks (e.g., rephrasing and translation), the model's outputs may either converge to a set of similar results or continue to produce distinct outputs over a finite number of steps. The outcome depends on the model, its configuration, and the input text, but it differs markedly from the model collapse observed when models are iteratively trained on generated data. The process can be modeled and analyzed as a Markov chain and mapped to real-world scenarios. Our study covers several LLMs, with Google Translate as a reference. At the sentence level, LLMs can even increase text diversity, for example when the original text shows limited variation. Researchers should therefore consider where such chains ultimately lead.
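The generation chain described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_model` is a hypothetical stand-in for an LLM call (e.g., a rephrasing or translation request); here it simply normalizes the text, so the chain quickly reaches a fixed point, mirroring the convergent case discussed in the abstract.

```python
def toy_model(prompt: str, text: str) -> str:
    # Hypothetical placeholder for an LLM API call; a real chain would
    # send `prompt` plus `text` to a model and return its output.
    return " ".join(text.lower().split())

def generation_chain(prompt: str, text: str, max_steps: int = 10) -> list[str]:
    """Iterate output_{t+1} = model(prompt, output_t); stop at a fixed point."""
    history = [text]
    for _ in range(max_steps):
        nxt = toy_model(prompt, history[-1])
        if nxt == history[-1]:  # chain converged (absorbing state of the Markov chain)
            break
        history.append(nxt)
    return history

chain = generation_chain("Rephrase the following text:", "Hello   WORLD  again")
# chain → ["Hello   WORLD  again", "hello world again"]
```

With a real model the chain may instead keep producing distinct outputs for many steps, which is the other regime the paper studies.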
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21102