Large Trajectory Models are Scalable Motion Predictors and Planners

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Motion Prediction, Motion Planning, Autonomous Driving
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A unified Transformer architecture surpasses previous state-of-the-art methods in both motion prediction and motion planning, adheres to scaling laws, and exhibits substantial generalization to unseen maps.
Abstract: Motion prediction and planning are vital challenges in autonomous driving. Recent works serialize observations, states (or actions), and rewards into a sequence and approach motion planning as a sequence modeling problem with compact and versatile Transformers. However, the efficacy of these methods in stochastic, interactive environments without simulators remains to be investigated. Learning from large real-world autonomous driving datasets additionally challenges models to interpret heterogeneous behaviors and policies from ambiguous demonstrations, understand diverse road topologies, reason over a longer horizon of up to 8 seconds, and generate outputs in a large continuous state space. In this research, we reformulate motion prediction and motion planning by arranging all elements into a sequence modeling task and propose the State Transformer (STR). Under comparable test settings, STR consistently outperforms the benchmarks on both motion planning and motion prediction tasks. Remarkably, our experimental results reveal that large trajectory models (LTMs), such as STR, adhere to scaling laws, exhibiting outstanding adaptability and learning efficiency when trained with larger Transformer backbones. Qualitative analysis illustrates that LTMs are capable of generating plausible predictions in scenarios that diverge significantly from the training distribution. LTMs also learn to perform complex reasoning for long-term planning, extending beyond the 8-second horizon, without explicit loss designs or costly high-level annotations.
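The abstract's core idea is to arrange all scene elements into one sequence so that a causal Transformer can model trajectories like text. The sketch below is a minimal, hypothetical illustration of that serialization step only (the function name, token layout, and feature dimension are assumptions; the paper's actual STR encoding may differ):

```python
import numpy as np

def serialize_scene(map_feats, past_states, future_traj):
    """Arrange scene elements into one flat token sequence for
    causal sequence modeling (hypothetical layout, not the paper's exact one).

    map_feats:   (M, d)        map/context tokens
    past_states: (T_past, d)   observed agent states
    future_traj: (T_future, d) trajectory the model learns to generate
    """
    tokens = np.concatenate([map_feats, past_states, future_traj], axis=0)
    # Next-token objective: each position predicts its successor,
    # so targets are the inputs shifted by one step.
    inputs, targets = tokens[:-1], tokens[1:]
    return inputs, targets

# Toy example with d = 2 (x, y coordinates)
map_feats = np.zeros((3, 2))
past_states = np.ones((4, 2))
future_traj = np.full((5, 2), 2.0)
inputs, targets = serialize_scene(map_feats, past_states, future_traj)
```

A Transformer backbone trained on `(inputs, targets)` pairs with a causal attention mask would then generate the future trajectory autoregressively, conditioned on the map and past-state prefix.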
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5908