Keywords: robotics, motion prediction, autonomy, tokenization, generative models
TL;DR: We learn highly compressed representations of trajectories from driving scenarios and use a flexible latent search approach to generate new behaviors according to arbitrary objectives.
Abstract: Deep learning methods have vastly expanded the capabilities of motion planning in robotics applications, as learning priors from large-scale data has proven essential for capturing the highly complex behavior required to solve tasks such as manipulation or navigation for autonomous vehicles. At the same time, model-based planning algorithms based on search or optimization remain an essential tool due to their flexibility, efficiency, and ability to incorporate domain knowledge via expert-designed algorithms and objective functions. We propose a simple framework to unify these two paradigms. First, we learn an autoencoder with a high compression ratio and a latent space of causally ordered, discrete-valued tokens. Leveraging both the dimensionality reduction and the causal structure learned by this autoencoder, we then perform motion planning by directly searching in the latent space of tokens. Notably, this search can optimize arbitrary user-specified objective functions without requiring the training of any additional neural networks, providing a large degree of flexibility at test time while maintaining efficiency and producing feasible and realistic solutions by relying on the generative capabilities of the highly compressed autoencoder. We evaluate our method on the Waymo Open Motion Dataset, showing how a simple latent space search can be used for motion prediction. Beyond prediction, we demonstrate the inclusion of simple objectives for guided behavior generation. Finally, we investigate the application of our method for multi-agent interaction modeling, enabling flexible scenario design and understanding.
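The core idea of the abstract — searching over a short sequence of causally ordered discrete tokens, decoding candidates into trajectories, and scoring them with an arbitrary user objective — can be illustrated with a toy sketch. Everything here is a stand-in assumption, not the paper's implementation: the learned decoder is replaced by fixed random "motion primitives", and the objective, vocabulary size, sequence length, and `beam_search` helper are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the paper's learned components: a tiny token
# vocabulary, a short causally ordered token sequence, and a "decoder"
# that maps tokens to a 2D trajectory via fixed displacement primitives.
VOCAB_SIZE = 4
SEQ_LEN = 3
HORIZON = 5

rng = np.random.default_rng(0)
PRIMITIVES = rng.normal(size=(VOCAB_SIZE, 2))  # one displacement per token

def decode(tokens):
    """Toy decoder: roll out each token's displacement, filling the horizon."""
    repeats = HORIZON // SEQ_LEN + 1
    steps = [PRIMITIVES[t] for t in tokens for _ in range(repeats)]
    return np.cumsum(np.array(steps[:HORIZON]), axis=0)

def objective(traj, goal=np.array([2.0, 2.0])):
    """Arbitrary user-specified objective: negative final distance to a goal."""
    return -np.linalg.norm(traj[-1] - goal)

def beam_search(beam_width=2):
    """Search the causal token space, scoring decoded (partial) rollouts.

    No extra networks are trained: the objective is evaluated directly on
    decoded trajectories, mirroring the test-time flexibility described above.
    """
    beams = [((), 0.0)]
    for _ in range(SEQ_LEN):
        candidates = []
        for tokens, _ in beams:
            for tok in range(VOCAB_SIZE):
                seq = tokens + (tok,)
                candidates.append((seq, objective(decode(seq))))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

best_tokens, best_score = beam_search()
```

Because the token space is discrete and causally ordered, any sequential search (beam search here, but sampling or exhaustive enumeration equally well) can swap in a different `objective` at test time without retraining anything.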
Primary Area: applications to robotics, autonomy, planning
Submission Number: 15103