Can the Future Inform the Present? Investigating Latent Lookahead Refinement via Multi-Token Prediction
Track: long paper (up to 10 pages)
Keywords: latent thinking, multi-token prediction, lookahead
TL;DR: Enabling future-informed next-token prediction via latent lookahead refinement using multi-token prediction
Abstract: Most modern language models operate within the autoregressive paradigm and are trained to perform next-token prediction (NTP) conditioned only on the preceding context. However, this results in myopic generation: with no future information to guide their predictions, these models are limited in tasks that require lookahead. Recently, multi-token prediction (MTP) has shown promise as an auxiliary training objective for learning more future-aware model parameters; however, existing methods typically discard MTP at inference time or use it only for speculative decoding. In this work, we propose to additionally leverage the future-token prediction capability of MTP at inference for latent lookahead refinement, moving from implicitly future-aware to explicitly future-informed token generation. Our method introduces an internal latent thinking mechanism that operates within a single forward pass and incurs virtually no inference overhead. Whilst early results are inconclusive, our experiments suggest that the proposed method could prove more effective at scale, and we believe our analyses can serve as a foundation for future work.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Somesh_Mehra1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 95