Track: full paper
Keywords: Robot Learning, Long Context, Diffusion, Computational Efficiency
Abstract: Complex robotic tasks often require spatiotemporal reasoning over long sequences of actions and observations. Yet learning long-context policies remains difficult: as context length grows, training becomes increasingly compute- and memory-intensive, and covariate shift at deployment becomes more pronounced. Recent methods typically sidestep these challenges by discarding large portions of the historical context, risking the loss of information crucial for subsequent decisions. In this paper, we propose a two-stage training approach that explicitly regularizes the information preserved in the learned representation: first, we pre-train a short-context encoder to predict a long sequence of future actions, thereby maximizing the information each frame encodes about long-range dependencies; then, given pre-computed frame embeddings, we fine-tune a long-context decoder with an auxiliary task in which the policy learns to predict past actions alongside future ones. This simple design yields two surprising benefits: it substantially reduces memory consumption during training and greatly improves the history awareness of the learned policy. Moreover, the auxiliary task provides a natural mechanism for self-verification, allowing the policy to assess its sampled predictions at test time. Experiments on manipulation tasks that require extensive historical context demonstrate that our method improves the performance of long-context policies by 3× and accelerates policy training by more than 10×.
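The sketch below is a minimal PyTorch illustration of the two-stage recipe described in the abstract: a short-context encoder pre-trained to predict a long sequence of future actions, followed by a long-context decoder trained over pre-computed frame embeddings with an auxiliary past-action prediction task. All module names, dimensions, loss weights, and the plain regression action heads are illustrative assumptions (the keywords suggest the actual method uses a diffusion action head), not the authors' implementation.

```python
# Illustrative sketch only; dimensions, architectures, and losses are assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EMB_DIM = 64, 7, 256
SHORT_CTX, LONG_CTX, FUTURE_H = 2, 32, 16  # context frames / action horizon (assumed)

class FrameEncoder(nn.Module):
    """Stage 1: short-context encoder trained to predict a long future action
    sequence, so each frame embedding is forced to carry long-range information."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(OBS_DIM * SHORT_CTX, EMB_DIM), nn.ReLU(),
            nn.Linear(EMB_DIM, EMB_DIM),
        )
        self.action_head = nn.Linear(EMB_DIM, FUTURE_H * ACT_DIM)  # pre-training head

    def forward(self, short_obs):                  # (B, SHORT_CTX, OBS_DIM)
        z = self.backbone(short_obs.flatten(1))    # (B, EMB_DIM)
        return z, self.action_head(z).view(-1, FUTURE_H, ACT_DIM)

class LongContextDecoder(nn.Module):
    """Stage 2: long-context decoder over frozen, pre-computed frame embeddings,
    predicting past actions alongside future ones (auxiliary task)."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(EMB_DIM, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.future_head = nn.Linear(EMB_DIM, FUTURE_H * ACT_DIM)
        self.past_head = nn.Linear(EMB_DIM, LONG_CTX * ACT_DIM)

    def forward(self, frame_embs):                 # (B, LONG_CTX, EMB_DIM)
        h = self.temporal(frame_embs)[:, -1]       # summary at the latest frame
        future = self.future_head(h).view(-1, FUTURE_H, ACT_DIM)
        past = self.past_head(h).view(-1, LONG_CTX, ACT_DIM)
        return future, past

def stage1_loss(encoder, short_obs, future_actions):
    _, pred = encoder(short_obs)
    return nn.functional.mse_loss(pred, future_actions)

def stage2_loss(decoder, frame_embs, future_actions, past_actions, aux_weight=0.5):
    future_pred, past_pred = decoder(frame_embs)
    return (nn.functional.mse_loss(future_pred, future_actions)
            + aux_weight * nn.functional.mse_loss(past_pred, past_actions))

if __name__ == "__main__":
    B = 4
    encoder, decoder = FrameEncoder(), LongContextDecoder()
    # Stage 1: pre-train the short-context encoder (dummy data for illustration).
    l1 = stage1_loss(encoder,
                     torch.randn(B, SHORT_CTX, OBS_DIM),
                     torch.randn(B, FUTURE_H, ACT_DIM))
    # Stage 2: embeddings are pre-computed with the frozen encoder, so only the
    # decoder is optimized over the long context, reducing training memory.
    with torch.no_grad():
        embs = torch.stack([encoder(torch.randn(B, SHORT_CTX, OBS_DIM))[0]
                            for _ in range(LONG_CTX)], dim=1)
    l2 = stage2_loss(decoder, embs,
                     torch.randn(B, FUTURE_H, ACT_DIM),
                     torch.randn(B, LONG_CTX, ACT_DIM))
    print(l1.item(), l2.item())
```

At test time, the same past-action head can serve the self-verification role mentioned in the abstract: the discrepancy between predicted past actions and the actions actually executed gives a score for ranking sampled predictions (the exact scoring rule is not specified here and would be an assumption).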
Presenter: ~Yuejiang_Liu1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 48