Temporal Difference Flows

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Oral · License: CC BY 4.0
Abstract: Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks, including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
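To make the idea concrete, the sketch below shows one plausible instantiation of a TD-style conditional flow-matching update for a Geometric Horizon Model: with probability 1 − γ the velocity field is regressed toward the observed next state, and with probability γ toward a sample bootstrapped from a target copy of the model, mirroring a Bellman recursion over the discounted state-occupancy (successor) measure. This is a minimal illustration under assumed PyTorch conventions; the names (`VelocityField`, `sample_from_flow`, `td_cfm_loss`), the MLP architecture, and the linear probability path are illustrative choices, not the paper's exact implementation.

```python
# Hedged sketch of a TD-style conditional flow-matching (TD-CFM) update for a
# Geometric Horizon Model. Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn


class VelocityField(nn.Module):
    """v_theta(x_t, t, s, a): predicts the flow velocity toward a future state."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + 1 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x_t, t, s, a):
        return self.net(torch.cat([x_t, t, s, a], dim=-1))


@torch.no_grad()
def sample_from_flow(v, s, a, state_dim, steps=8):
    """Draw a future-state sample by Euler-integrating the learned flow ODE."""
    x = torch.randn(s.shape[0], state_dim)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((s.shape[0], 1), k * dt)
        x = x + dt * v(x, t, s, a)
    return x


def td_cfm_loss(v_online, v_target, s, a, s_next, a_next, gamma=0.99):
    """TD flow-matching loss: with prob (1 - gamma) regress toward the observed
    next state; with prob gamma regress toward a sample bootstrapped from the
    target model at (s', a'), avoiding step-by-step model rollouts."""
    batch, state_dim = s_next.shape
    boot = (torch.rand(batch, 1) < gamma).float()
    x_boot = sample_from_flow(v_target, s_next, a_next, state_dim)
    x1 = boot * x_boot + (1.0 - boot) * s_next      # flow-matching endpoint
    x0 = torch.randn_like(x1)                       # source noise
    t = torch.rand(batch, 1)
    x_t = (1.0 - t) * x0 + t * x1                   # linear probability path
    target_vel = x1 - x0                            # conditional velocity target
    pred_vel = v_online(x_t, t, s, a)
    return ((pred_vel - target_vel) ** 2).mean()
```

The abstract attributes TD-Flow's efficacy primarily to reduced gradient variance during training, so a faithful implementation would also attend to how the bootstrapped sample and the regression path are coupled; the sketch above omits that detail for brevity.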
Lay Summary: Predicting future events accurately is crucial for intelligent systems that plan and make decisions over long periods. Typically, systems predict the future by repeatedly predicting one step at a time, but even tiny errors at each step can snowball into big mistakes, making long-term predictions very unreliable. To address this issue, we developed Temporal Difference Flows (TD-Flow), a method that directly predicts future states over extended horizons, avoiding the buildup of small errors. TD-Flow combines a new theoretical insight with advanced techniques from generative modeling, which allows it to predict accurately over much longer time horizons—up to five times longer than existing methods. We demonstrated, both theoretically and experimentally, that TD-Flow enables stable and accurate long-term predictions. When tested in different scenarios, TD-Flow consistently outperformed previous methods, improving not only prediction accuracy but also how effectively agents can assess and decide on what behaviours to commit to for extended periods of time. Ultimately, TD-Flow holds great promise for enhancing long-term decision-making capabilities across various complex systems.
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Geometric Horizon Model, Gamma-Model, Temporal Difference Learning, Successor Measure, Flow Matching
Submission Number: 12288