In‑Context Planning with Latent Temporal Abstractions

20 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Sequential Decision-Making, Monte Carlo Tree Search, Planning, Model-based Reinforcement Learning, Offline Reinforcement Learning
Abstract: Planning-based reinforcement learning for real-world control faces two coupled obstacles: planning at primitive time scales explodes both the context length and the branching factor, and the underlying dynamics are often only partially observable. We introduce the In-Context Latent Temporal Abstraction Planner (I-TAP), which unifies in-context adaptation and online planning in a learned latent temporal-abstraction space. From offline trajectories, I-TAP learns an observation-conditioned residual-quantized VAE (RQ-VAE) that discretizes observation–macro-action sequences into a coarse-to-fine stack of residual tokens, together with a residual-quantized temporal Transformer that autoregressively predicts these tokens from recent observation and macro-action histories. This sequence model serves jointly as a context-conditioned prior over abstract actions and as a latent-space dynamics model. At inference time, I-TAP plans with Monte Carlo Tree Search directly in token space, leveraging short histories to implicitly infer latent factors without any test-time fine-tuning. Across deterministic and stochastic MuJoCo locomotion tasks and high-dimensional Adroit manipulation tasks, including partially observable variants, I-TAP consistently matches or outperforms strong model-free and model-based baselines, demonstrating effective in-context planning under stochastic dynamics and partial observability.
Primary Area: reinforcement learning
Submission Number: 22749
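
The abstract describes discretizing latents into a coarse-to-fine stack of residual tokens. The sketch below is a minimal, self-contained illustration of residual quantization in general, not I-TAP's actual encoder: the codebook sizes, quantization depth, and function names are assumptions made for the example.

```python
# Minimal numpy sketch of residual quantization: a latent vector is encoded as one
# token per level by repeatedly quantizing the remaining residual against a codebook.
# All names and sizes here are illustrative, not taken from the paper.
import numpy as np

def residual_quantize(z, codebooks):
    """Encode latent z of shape (d,) into one token index per residual level.

    codebooks: list of (K, d) arrays, one codebook per level (coarse to fine).
    Returns (tokens, z_hat): the token indices and the reconstructed latent.
    """
    residual = z.copy()
    z_hat = np.zeros_like(z)
    tokens = []
    for codebook in codebooks:
        dists = np.linalg.norm(codebook - residual, axis=1)  # distance to every code
        k = int(np.argmin(dists))                            # nearest code at this level
        tokens.append(k)
        z_hat += codebook[k]                                 # accumulate the reconstruction
        residual = residual - codebook[k]                    # pass the remainder to the next level
    return tokens, z_hat

# Toy usage: 3 residual levels, 16 codes each, 8-dimensional latent.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(3)]
z = rng.normal(size=8)
tokens, z_hat = residual_quantize(z, codebooks)
print(tokens, np.linalg.norm(z - z_hat))  # deeper stacks generally shrink the reconstruction error
```

In this framing, the per-level token indices are the discrete symbols a sequence model (and, at inference, a tree search) can operate over, which is the sense in which the abstract speaks of planning "directly in token space."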