Real-Time Text-Conditioned World Models for Interactive Prototyping

ICLR 2026 Conference Submission 17549 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: World models, autoregressive models, real time
Abstract: State-of-the-art world models have been used to produce gameplay sequences that follow user-provided input actions, with the suggestion that such models could have creative applications such as quick prototyping of game ideas. However, high-quality, consistent gameplay generation often comes at the cost of inference speed, making real-time interactive play challenging. Models are also limited in their ability to generate new content that deviates from the original gameplay, particularly when trained on data from a single environment. In this work, we demonstrate two major steps towards enabling interactive, real-time ideation. First, building on an autoregressive world model capable of generating highly consistent and complex sequences over minutes (Kanervisto et al., 2025), we achieve a substantial inference speed-up with minimal deterioration in output quality. We do this by replacing the next-token prediction paradigm with discrete diffusion, introducing a lightweight refinement transformer that carries out iterative masked predictions. Second, we explore how new game behaviours can be learned and triggered at inference time in a controlled manner. To this end, we introduce text conditioning to control the game environment generated by the model, and we curate the BodySwap dataset, which simulates a character-swapping mechanism that lets the playable character be changed with a text prompt. Our results highlight the potential of world models as real-time prototyping tools, enabled by intentional curation of small datasets and efficient fine-tuning.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 17549
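
The abstract contrasts next-token prediction with iterative masked prediction but gives no algorithmic detail. The following is a minimal sketch of the general idea only, assuming a MaskGIT-style decoding loop: all token positions of a frame start masked, and each pass commits the most confident guesses in parallel. The names `toy_predictor` and `iterative_masked_decode`, the cosine unmasking schedule, and the random stand-in for the refinement transformer are all hypothetical and are not taken from the submission.

```python
import math
import random

MASK = -1  # sentinel for a position that has not been predicted yet


def toy_predictor(tokens, vocab_size, rng):
    """Hypothetical stand-in for a refinement transformer.

    Returns one (token, confidence) guess per position. A real model would
    condition on the already-unmasked tokens; this stub samples at random.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_masked_decode(num_tokens, vocab_size, num_steps, seed=0):
    """Fill every position of one frame in at most `num_steps` parallel passes."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_predictor(tokens, vocab_size, rng)
        # Cosine schedule (an assumption): how many positions stay masked
        # after this pass. The final pass always unmasks everything.
        keep_masked = math.floor(
            num_tokens * math.cos(math.pi / 2 * step / num_steps)
        )
        if step == num_steps:
            keep_masked = 0
        num_to_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident guesses among the still-masked positions.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:num_to_commit]:
            tokens[i] = guesses[i][0]
    return tokens


if __name__ == "__main__":
    frame = iterative_masked_decode(num_tokens=16, vocab_size=8, num_steps=4)
    print(frame)
```

In this sketch, a frame of 16 tokens is completed in 4 predictor calls, whereas strictly sequential next-token decoding would need 16. The trade-off in a real system depends on the model and schedule, which this toy example does not capture.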