MAVEN-T: Breaking the Imitation Ceiling in Trajectory Prediction with Reinforced Distillation

ICLR 2026 Conference Submission 16150 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: autonomous driving, trajectory prediction, knowledge distillation, reinforcement learning
Abstract: Knowledge distillation is fundamentally constrained by an "imitation ceiling," where a student model can only replicate a teacher's behavior, including its inherent suboptimalities. This limitation is particularly critical in dynamic, interactive domains where optimal decision-making is paramount. This work introduces a reinforcement-augmented distillation framework that allows a student to transcend its teacher. The student actively interacts with its environment, using feedback to verify, refine, and ultimately correct the teacher's distilled knowledge. This framework is instantiated in a system for the challenging task of multi-agent trajectory prediction. A teacher model with extensive reasoning capacity guides a lightweight, deployment-optimized student via a progressive distillation scheme. Critically, the student's learning is not confined to imitation; it is fine-tuned through reinforcement learning to directly optimize for task-specific objectives such as safety and efficiency. Experiments on real-world driving datasets show the student achieves 6.2x parameter compression and 3.7x inference speedup while maintaining state-of-the-art accuracy. The results further validate that the student can develop policies more robust than the teacher it learned from. This research establishes a new path for deploying complex models, shifting the goal from simple imitation to transcendence. The principle of enabling a student to surpass its teacher holds broad applicability for robotics, game AI, and other interactive learning domains.
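The two-stage recipe described in the abstract, distill first, then fine-tune with reinforcement learning on a task reward, can be illustrated with a minimal toy sketch. This is not the paper's implementation: the loss functions, the 1-D Gaussian "trajectory" policy, and the quadratic reward standing in for safety/efficiency are all hypothetical simplifications chosen to show why RL fine-tuning can push the student past the teacher's imitation ceiling.

```python
import numpy as np

def distillation_loss(student_traj, teacher_traj):
    """Stage 1 (imitation): mean squared error between the student's and
    teacher's predicted waypoints. Minimizing this can at best match the
    teacher, including its suboptimalities."""
    return float(np.mean((np.asarray(student_traj) - np.asarray(teacher_traj)) ** 2))

def reinforce_update(theta, reward_fn, rng, lr=0.05, n_samples=64):
    """Stage 2 (transcendence): one REINFORCE step. The student samples
    trajectories from a unit-variance Gaussian policy centered at `theta`
    and follows the score-function gradient of the expected task reward."""
    grad = 0.0
    for _ in range(n_samples):
        eps = rng.normal()
        traj = theta + eps
        # grad of log N(traj; theta, 1) w.r.t. theta is (traj - theta) = eps
        grad += reward_fn(traj) * eps
    return theta + lr * grad / n_samples

# Toy setup: the teacher proposes 0.5, but the (hypothetical) task reward
# peaks at 1.0, so pure imitation leaves reward on the table.
teacher_proposal = 0.5
theta = teacher_proposal              # student after distillation toward the teacher
reward = lambda t: -(t - 1.0) ** 2    # stand-in for a safety/efficiency objective
rng = np.random.default_rng(0)
for _ in range(200):
    theta = reinforce_update(theta, reward, rng)
# theta ends up near 1.0, beyond the teacher's 0.5 proposal.
```

The point of the sketch is the division of labor: the distillation loss transfers the teacher's knowledge cheaply, while the environment-reward gradient supplies the correction signal that imitation alone cannot, which is the mechanism the abstract credits for the student surpassing its teacher.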
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 16150