TL;DR: We design a suite of open-ended algorithmic tasks inspired by real-world creative tasks and show that multi-token prediction and seed-conditioning lead to far more creative planning than next-token prediction.
Abstract: We design a suite of minimal algorithmic tasks that are a loose abstraction of _open-ended_ real-world tasks. This allows us to cleanly and controllably quantify the creative limits of present-day language models.
Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended _stochastic_ planning step that either (a) discovers new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or (b) constructs new patterns (as in designing math problems or new proteins). On these tasks, we argue empirically and conceptually that next-token learning is myopic and memorizes excessively; multi-token approaches, namely teacherless training and diffusion models, comparatively excel at producing diverse and original output.
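To make the teacherless (multi-token) objective concrete, here is a minimal sketch, not the paper's exact training code: it assumes a decoder-only PyTorch `model` whose forward pass returns next-token logits, and the names `prompt_ids`, `target_ids`, and `dummy_id` are placeholders.

```python
import torch
import torch.nn.functional as F

def teacherless_loss(model, prompt_ids, target_ids, dummy_id):
    """Teacherless / multi-token objective (sketch).

    Instead of teacher forcing (feeding the ground-truth target prefix),
    the model sees only the prompt plus fixed dummy placeholders, so it
    must plan the entire continuation from the prompt alone.
    """
    # Replace the teacher-forced target prefix with a fixed dummy token.
    dummy = torch.full_like(target_ids[:, :-1], dummy_id)        # (B, T-1)
    inputs = torch.cat([prompt_ids, dummy], dim=1)               # (B, K+T-1)
    logits = model(inputs)                                        # (B, K+T-1, V)
    # Positions K-1 ... K+T-2 should predict targets t_1 ... t_T.
    preds = logits[:, prompt_ids.size(1) - 1:, :]                 # (B, T, V)
    return F.cross_entropy(preds.reshape(-1, preds.size(-1)),
                           target_ids.reshape(-1))
```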
Second, to elicit randomness without hurting coherence, we find that injecting noise at the input layer (dubbed _seed-conditioning_) works surprisingly well, matching (and in some conditions exceeding) temperature sampling at the output layer. Our work thus offers a principled, minimal test-bed for analyzing open-ended creative skills, and new arguments for going beyond next-token learning and temperature sampling. We make part of the code available at https://github.com/ChenWu98/algorithmic-creativity
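The following is a minimal inference-side sketch of the seed-conditioning idea, assuming a Hugging Face-style `model.generate` and `tokenizer` interface; `seed_len` and the other names are illustrative placeholders, and the paper's full recipe (e.g., how seeds are handled during training) is not shown here.

```python
import torch

@torch.no_grad()
def seed_conditioned_generate(model, tokenizer, prompt,
                              seed_len=8, max_new_tokens=64):
    """Inject randomness at the input layer (sketch).

    A short random "seed" token sequence is prepended to the prompt, and
    decoding is then deterministic (greedy), so diversity comes from the
    seed rather than from per-token temperature sampling at the output.
    """
    seed_ids = torch.randint(0, tokenizer.vocab_size, (1, seed_len))
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    input_ids = torch.cat([seed_ids, prompt_ids], dim=1)
    out = model.generate(input_ids, do_sample=False,
                         max_new_tokens=max_new_tokens)
    # Strip the seed + prompt, return only the newly generated tokens.
    return tokenizer.decode(out[0, input_ids.size(1):],
                            skip_special_tokens=True)
```

Calling this function several times with the same prompt yields different outputs because each call draws a fresh random seed sequence, even though decoding itself is greedy.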
Lay Summary: In the near future, we hope to use AI for doing science and for generating fresh data (like fresh math problems) to train other AI models. Unlike today's benchmark tasks, these are "open-ended" tasks where you want the model to explore new and diverse ways of responding --- like a scientist would.
So, how well are AI models suited for these open-ended tasks? This is very hard to answer because judging metrics like diversity and novelty is subjective! Here is how we do it: we design _simple_ open-ended tasks where we can quantify how different factors of the tasks and of the models affect these metrics -- very much like how a Galileo or a Newton would choose simple objects like spheres and pendulums to study the effects of mass, gravity, initial velocity, angle of throw, and so on.
Our tasks are inspired by day-to-day creative tasks that require a random "eureka" moment. They cover two types of creativity: one that requires combining pieces of knowledge (like in wordplay) and another that requires designing clever constructions (like puzzle design or story design). Playing around with language models on these tasks, we find two things:
1. Currently, we teach models by making them predict one word after the other. Instead, we want to teach the model to predict many words in advance so it actually learns the big picture of such creative tasks! This is what we mean by **Look before you leap**.
2. Currently, under the hood, AI models first figure out many possible responses, and only then randomly select one response to provide. This is too much wasted work! Instead, we argue that the model should fix its random decision(s) first, and then work out just one single response and produce it. This is what we mean by **Roll the dice before you leap**.
Overall, we design simple settings that quantify the limits of language models in two distinct types of creativity. This allows future work to have greater clarity on how to evaluate open-ended thinking, and to explore our algorithmic ideas and findings in real-world settings.
Link To Code: https://github.com/ChenWu98/algorithmic-creativity
Primary Area: Deep Learning->Large Language Models
Keywords: next-token prediction, multi-token prediction, creativity
Submission Number: 12175