Iterative Computation as Anytime Forecasting: Dense Supervision for Calibrated Trajectories in Recurrent World Models
Keywords: Iterative Computation, World Models, Reasoning
Abstract: Many modern neural forecasters \emph{iterate}: world models roll out a learned transition, recursive Transformers refine a prediction over many cycles, and looped language models ``think longer'' before answering. We collect these systems under a single abstraction, \emph{Iterative Neural Computations} (INCs), and identify the standard practice of supervising only the final iterate (\emph{endpoint supervision}) as the shared cause of two failure modes that matter directly for forecasting: (i) gradients propagated through long rollout chains are noisy and corrupted in direction, which destabilizes long-horizon training; and (ii) intermediate iterates are unconstrained, which rules out anytime prediction and horizon extrapolation.
We introduce \emph{Dense Intermediate Consistency for Endpoints} (DICE), a model-agnostic training framework that supervises every iterate through a \emph{shared} readout head. The change is confined to the loss, adds less than $5\%$ training compute, and turns any INC into an anytime forecaster whose intermediate states are valid, calibrated predictions. We further derive a probability-space stability bound that links DICE to a decision-theoretic stopping rule, Adaptive Stability Halting. Across three INC families, DICE achieves near-perfect horizon extrapolation on prefix sums, improves long-horizon maze planning by $7.4$\,pp and bAbI temporal reasoning by $3.97$\,pp, and yields a $6.6\times$ inference-time speedup with no loss in accuracy.
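To make the loss-level change concrete, here is a minimal, framework-free sketch in Python, assuming a classification readout trained with cross-entropy. The names `rollout`, `endpoint_loss`, `dense_loss`, `stability_halt`, the uniform per-iterate weights, and the toy contraction `toy_step` are hypothetical illustrations, not the paper's implementation. The halting rule simply stops once consecutive readout distributions differ by less than a threshold in total variation. It is a stand-in for Adaptive Stability Halting, not a reproduction of it, since the actual criterion comes from the stability bound described above. In practice these losses would be computed inside an autodiff framework so that gradients reach every iterate.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

State = List[float]
Logits = List[float]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def softmax(logits: Sequence[float]) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    return -log_softmax(logits)[target]


def rollout(step: Callable[[State], State], h0: State, num_iters: int) -> List[State]:
    """Apply the (hypothetical) transition `step` num_iters times, keeping every iterate."""
    states, h = [], h0
    for _ in range(num_iters):
        h = step(h)
        states.append(h)
    return states


def endpoint_loss(states, readout, target) -> float:
    """Endpoint supervision: only the final iterate receives a loss."""
    return cross_entropy(readout(states[-1]), target)


def dense_loss(states, readout, target, weights: Optional[Sequence[float]] = None) -> float:
    """Dense supervision: the SAME readout head scores every iterate.

    Uniform weights are an assumption of this sketch, not the paper's schedule.
    """
    if weights is None:
        weights = [1.0 / len(states)] * len(states)
    return sum(w * cross_entropy(readout(h), target) for w, h in zip(weights, states))


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stability_halt(step, readout, h0, max_iters: int, tol: float) -> Tuple[List[float], int]:
    """Hypothetical stopping rule: halt once consecutive readout distributions
    differ by less than `tol` in total variation. Returns (probs, iterations used)."""
    h = step(h0)
    prev = softmax(readout(h))
    for t in range(2, max_iters + 1):
        h = step(h)
        cur = softmax(readout(h))
        if total_variation(prev, cur) < tol:
            return cur, t
        prev = cur
    return prev, max_iters


# Toy INC for illustration only: a contraction toward a fixed logit vector.
TARGET_LOGITS = [2.0, 0.0, -1.0]


def toy_step(h: State) -> State:
    return [0.5 * x + 0.5 * c for x, c in zip(h, TARGET_LOGITS)]


def toy_readout(h: State) -> Logits:
    return list(h)  # identity readout: the state is read directly as logits


if __name__ == "__main__":
    states = rollout(toy_step, [0.0, 0.0, 0.0], num_iters=8)
    print("endpoint:", endpoint_loss(states, toy_readout, target=0))
    print("dense:   ", dense_loss(states, toy_readout, target=0))
    probs, used = stability_halt(toy_step, toy_readout, [0.0, 0.0, 0.0], max_iters=50, tol=1e-3)
    print("halted after", used, "iterations; probs =", [round(p, 3) for p in probs])
```

Endpoint supervision scores only `states[-1]`, whereas the dense variant reuses the single `readout` on every entry of `states`. Sharing the head is what makes each intermediate iterate a usable prediction, and therefore a natural candidate for early stopping.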
Submission Number: 172