Can the Future Inform the Present? Investigating Latent Lookahead Refinement via Multi-Token Prediction
Track: long paper (up to 10 pages)
Keywords: latent thinking, multi-token prediction, lookahead
TL;DR: We investigate a method to condition next-token generation on future information via a latent lookahead mechanism that builds upon multi-token prediction
Abstract: Most modern language models operate within the autoregressive paradigm and are trained to perform next-token prediction (NTP) conditioned only on the previous context. This results in myopic generation, since these models have no future information to guide their predictions, which limits their effectiveness on tasks that require lookahead. Recently, multi-token prediction (MTP) has shown promise as an auxiliary training objective for learning more future-aware model parameters; however, existing methods typically discard MTP at inference time, or use it only for speculative decoding. In this work, we propose to additionally leverage the future-token prediction capability of MTP at inference for latent lookahead refinement, moving from implicitly future-aware to explicitly future-informed token generation. Our method introduces an internal latent thinking mechanism that operates within a single forward pass and incurs minimal inference overhead. While early results are inconclusive, our experiments suggest that the proposed method could prove more effective at scale, and we believe our analyses can serve as a foundation for future work.
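The sketch below illustrates one plausible reading of the mechanism the abstract describes: MTP-style heads produce latent predictions of future tokens, and a pooled summary of those latents is fused back into the current hidden state to refine the next-token distribution, all within a single forward pass. This is a minimal illustration, not the authors' implementation; the module names, pooling scheme, and fusion layer are assumptions.

# Minimal sketch (assumed design, not the paper's code) of latent lookahead
# refinement built on multi-token prediction.
import torch
import torch.nn as nn

class LatentLookaheadHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 3):
        super().__init__()
        # One lightweight MTP head per future offset (hypothetical choice).
        self.future_heads = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_future)]
        )
        # Fuses the pooled future latents back into the current hidden state.
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) final hidden states from a backbone.
        # 1) Predict latent representations of future tokens (MTP-style).
        future = torch.stack([head(h) for head in self.future_heads], dim=0)
        # 2) Pool the lookahead latents into one summary per position.
        lookahead = future.mean(dim=0)  # (batch, seq, d_model)
        # 3) Refine the next-token state with the future summary, within
        #    the same forward pass (no extra decoding steps).
        refined = self.fuse(torch.cat([h, lookahead], dim=-1))
        return self.lm_head(refined)  # future-informed next-token logits

# Usage, with random activations standing in for a transformer backbone:
head = LatentLookaheadHead(d_model=64, vocab_size=1000)
hidden = torch.randn(2, 16, 64)
logits = head(hidden)
print(logits.shape)  # torch.Size([2, 16, 1000])

Mean-pooling the future latents is the simplest fusion choice; attention over the lookahead latents would be a natural alternative under the same single-pass constraint.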
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 95