Short-to-Long Distillation: Learning Long-Context Policies from Short-Context Supervision

Published: 16 Oct 2025, Last Modified: 10 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: Robot Learning, Sequential Reasoning, Policy Distillation, Data Curation
Abstract: Consistency and reactivity are two essential properties for robotic policies. Yet, recent methods often trade one for the other: using long action chunks and many denoising steps to improve intra-chunk consistency, at the cost of lower inference frequency and slower inference. In this paper, we first revisit the necessity of these design choices through the lens of data scaling. We find that with sufficient training data, extending the history action context can substitute for future action chunks without compromising performance; moreover, conditioning on longer contexts reduces action ambiguity, lessening the need for iterative denoising. Motivated by these observations, we introduce Short-to-Long Distillation, a policy distillation approach that learns a long-context few-step student policy from synthetic data generated by a short-context many-step teacher policy. Central to our approach are two data curation strategies: (i) on-policy noise injection to broaden the coverage of action contexts, and (ii) mode-seeking chunk optimization to sharpen the distribution of action labels. Empirically, our method achieves strong results on diffusion policies across Push-T and RoboMimic tasks. Notably, using only 1k distilled sequences, the student policies match their teachers in static settings and surpass them by up to 40% in stochastic environments. Our results suggest the promise of synthetic data as a scalable alternative to inductive biases for robot learning.
Submission Number: 158
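To make the two data curation strategies in the abstract concrete, the sketch below mocks them up in NumPy: a rollout loop that injects noise into the executed actions (broadening the action contexts the long-context student will see) and a simple mode-seeking selection among chunks sampled from a short-context teacher (sharpening the action labels). All names (`teacher_sample_chunks`, `CONTEXT_LEN`, `NOISE_STD`, the toy teacher and environment) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two data-curation steps, assuming a NumPy-style interface.
# Function and constant names are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM, CONTEXT_LEN, CHUNK_LEN = 2, 4, 8       # assumed sizes
NOISE_STD, NUM_TEACHER_SAMPLES = 0.05, 16          # assumed hyperparameters


def teacher_sample_chunks(obs, action_context, n):
    """Stand-in for a short-context, many-step teacher: returns n sampled
    action chunks of shape (n, CHUNK_LEN, ACTION_DIM)."""
    mean = obs[:ACTION_DIM] + action_context[-1]    # toy conditional mean
    return mean + 0.2 * rng.standard_normal((n, CHUNK_LEN, ACTION_DIM))


def mode_seeking_chunk(chunks):
    """Pick the sampled chunk closest on average to the other samples --
    a simple surrogate for choosing a high-density mode rather than the mean."""
    flat = chunks.reshape(len(chunks), -1)
    dists = np.linalg.norm(flat[:, None] - flat[None, :], axis=-1).mean(axis=1)
    return chunks[np.argmin(dists)]


def collect_distillation_sequence(env_step, obs, steps=32):
    """Roll out the teacher while (i) injecting noise into executed actions so
    the recorded action contexts cover a broader region, and (ii) labeling
    each state with a mode-seeking teacher chunk."""
    context = np.zeros((CONTEXT_LEN, ACTION_DIM))
    dataset = []
    for _ in range(steps):
        chunks = teacher_sample_chunks(obs, context, NUM_TEACHER_SAMPLES)
        label_chunk = mode_seeking_chunk(chunks)            # sharpened action label
        dataset.append((obs.copy(), context.copy(), label_chunk))
        executed = label_chunk[0] + NOISE_STD * rng.standard_normal(ACTION_DIM)
        context = np.roll(context, -1, axis=0)
        context[-1] = executed                              # on-policy noisy context
        obs = env_step(obs, executed)
    return dataset


# Toy usage with a linear "environment" just to exercise the pipeline.
dataset = collect_distillation_sequence(
    env_step=lambda o, a: o + 0.1 * np.pad(a, (0, len(o) - len(a))),
    obs=np.zeros(4),
)
print(len(dataset), dataset[0][2].shape)   # 32 labeled (obs, context, chunk) tuples
```

The student policy would then be trained on these (observation, long action context, chunk label) tuples; the noise injected at execution time is what exposes it to contexts a clean teacher rollout would never visit.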