Track: regular paper (up to 6 pages)
Keywords: next-token prediction, multi-token prediction, diffusion, creativity, planning, shortcuts
TL;DR: We construct a suite of controlled open-ended tasks, inspired by creative tasks, where next-token prediction myopically learns shortcuts and suffers, while multi-token approaches generate diverse and original outputs.
Abstract: In _open-ended_ tasks --- such as designing word problems or discovering novel proofs --- the goal is not only correctness but also diversity and originality. Often, this requires a far-sighted, creative leap of thought. We argue that this requirement is misaligned with the objective of next-token prediction (NTP). To formulate our intuition, we design a suite of minimal algorithmic tasks loosely based on real-world creative endeavors. Concretely, our tasks require an open-ended _stochastic_ planning step that (a) discovers new connections in a knowledge graph (loosely inspired by word-play, humor or drawing analogies) or (b) constructs new patterns (loosely inspired by constructing word problems, puzzles or mysteries).
We then argue, conceptually and empirically, that NTP leads to myopic shortcut-learning and excessive memorization, limiting its ability to generate novel solutions. In contrast, we find that multi-token approaches, namely teacherless training and diffusion models, can overcome these limitations and comparatively excel on our algorithmic test-bed. Orthogonally, we find that creativity in our tasks is greatly improved by training with a random hash prefix (which we dub _hash-conditioning_).
Thus, our work offers a principled, minimal test-bed for studying open-ended forms of intelligence, and also a new angle from which to take a more serious interest in the paradigm of multi-token prediction.
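
As an illustration only, the data-preparation side of training with a random hash prefix might be sketched as follows. The abstract does not specify the prefix format, so the alphabet, the prefix length, the `|` separator, and the helper names `random_hash` and `hash_condition` are all assumptions made for this sketch, not the paper's implementation.

```python
import random
import string

# Assumed settings: the abstract does not state the prefix alphabet or length.
HASH_ALPHABET = string.ascii_lowercase + string.digits
HASH_LENGTH = 8


def random_hash(rng, length=HASH_LENGTH):
    """Return a random string that carries no information about the target."""
    return "".join(rng.choice(HASH_ALPHABET) for _ in range(length))


def hash_condition(examples, rng, sep="|"):
    """Prepend an independently sampled random prefix to each training string.

    The separator is a hypothetical choice; it only needs to be a symbol
    that does not occur in the examples themselves.
    """
    return [f"{random_hash(rng)}{sep}{example}" for example in examples]


rng = random.Random(0)  # seeded only to make the sketch reproducible
train = hash_condition(["a b c", "c a b"], rng)
```

Under the same assumptions, generation would start from a freshly sampled prefix followed by the separator, so that the randomness enters through the input rather than only through sampling of output tokens.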
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Presenter: ~Vaishnavh_Nagarajan3
Submission Number: 16