Latent Imagination Thinking: Beyond Recursive Models for Reasoning

Published: 02 Mar 2026, Last Modified: 05 Mar 2026 · ICLR 2026 Workshop World Models · CC BY 4.0
Keywords: Latent reasoning, Test-time compute, Recursive inference, Teacher–student learning, Diffusion models, Large language models, Visual reasoning, World Model
TL;DR: We propose Latent Imagination Thinking (LIT), a teacher–student framework that trains models to perform reasoning as latent belief updates rather than recursive prediction in observation space.
Abstract: Reasoning capabilities of generative AI have recently improved by making models think through recursion: LLMs re-consume their own tokens (Chain-of-Thought), and diffusion models iteratively refine pixels or reconstruction-trained latents. While practical, this common design reduces reasoning to an *observational* space and conflates two roles: *latent reasoning* (discovering a task-appropriate internal language and maintaining a belief over solutions) and modeling the observational space. We introduce **Latent Imagination Thinking (LIT)**, a teacher–student learning paradigm that treats tokens and pixels as partial observations rather than the language of thought. To provide learning guidance for latent states, a posterior model (teacher) refines its belief using additional task-relevant observations, and a prior model (student) is trained to imagine these refinements via an imagination loss with stop-gradient targets. This turns recurrence into a latent belief update rather than repeated prediction in the observation space. We evaluate LIT on hard Sudoku puzzles in language and visual (MNIST) spaces. Increasing the number of thinking steps improves reasoning under a fixed compute budget more reliably than state-of-the-art recursive baselines acting in observation space. LIT closes the vision–language gap: our visual model reaches performance on par with the second-best state-of-the-art language model, solves the visual benchmark ($\sim 100\%$) while producing diverse solutions, and improves over the visual state of the art (51%). Finally, adding our imagination inductive bias to the best language model improves accuracy by 14.8%.
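The core training signal described in the abstract — a student prior regressing onto the teacher posterior's refined latent, with gradients stopped at the target — can be sketched in miniature. The following is a toy illustration under strong assumptions (linear teacher and student maps, squared-error imagination loss, a single manual gradient step); all names, shapes, and the learning rate are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # illustrative latent dimension

# Hypothetical linear "teacher" (posterior): refines the belief latent z
# by folding in an additional task-relevant observation.
W_teacher = rng.normal(size=(d, d)) * 0.1

def teacher_refine(z, obs):
    """Posterior belief update: latent plus an observation-driven correction."""
    return z + W_teacher @ obs

# Hypothetical linear "student" (prior): must imagine the refined latent
# without access to the extra observation.
W_student = rng.normal(size=(d, d)) * 0.1

def imagination_loss(W_s, z, target):
    """Squared error between the student's imagined latent and the target."""
    pred = W_s @ z
    return float(np.sum((pred - target) ** 2)), pred

z = rng.normal(size=d)    # current belief latent
obs = rng.normal(size=d)  # additional observation seen only by the teacher

# Stop-gradient: the teacher's refined latent is treated as a constant
# target, so only the student's parameters receive gradient.
target = teacher_refine(z, obs)

loss_before, pred = imagination_loss(W_student, z, target)

# One gradient step on the student alone: dL/dW_s = 2 (W_s z - target) z^T.
grad = 2.0 * np.outer(pred - target, z)
W_student -= 0.01 * grad

loss_after, _ = imagination_loss(W_student, z, target)
assert loss_after < loss_before  # the student's imagination improved
```

At test time the teacher is discarded and the student iterates its own latent updates, which is what makes "thinking steps" a latent belief update rather than repeated prediction in observation space.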
Supplementary Material: zip
Submission Number: 85