Imitation from Observations with Trajectory-Level Generative Embeddings

Published: 02 Mar 2026, Last Modified: 29 Mar 2026 · ReALM-GEN 2026 (ICLR 2026 Workshop) · CC BY 4.0
Keywords: Offline Imitation Learning, Learning from Observations, Diffusion Models, Representation Learning
Abstract: We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and contain only state observations, and the available offline data comes from a suboptimal policy far from expert behavior. In this regime, many existing imitation learning approaches struggle to extract useful information from imperfect data because they impose strict support constraints and rely on brittle one-step models. To tackle this challenge, we propose **T**rajectory-level **G**enerative **E**mbedding (TGE) for offline LfO. TGE constructs a dense, smooth surrogate reward by using particle-based entropy estimation to maximize the log-likelihood of expert trajectories in the latent space of a temporal diffusion model trained on offline suboptimal data. By leveraging the structured geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap under severe support mismatch, ensuring a robust learning signal even when the offline data is distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
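To make the abstract's key ingredient concrete, below is a minimal illustrative sketch of a particle-based (k-nearest-neighbor, Kozachenko–Leonenko-style) log-density estimate used as a dense surrogate reward over latent embeddings. This is an assumption about the general technique the abstract names, not the paper's actual implementation; all function names, the choice of `k`, and the Euclidean metric are hypothetical.

```python
import math
import numpy as np

def knn_log_density(query, particles, k=5):
    """Particle-based log-density estimate of `query` under the empirical
    distribution of `particles` (k-NN / Kozachenko-Leonenko style).
    Illustrative only; names and constants are not from the paper."""
    n, d = particles.shape
    dists = np.linalg.norm(particles - query, axis=1)
    r_k = np.partition(dists, k - 1)[k - 1]  # distance to k-th nearest particle
    # log p(x) ~ log k - log n - log(volume of the d-ball of radius r_k)
    log_ball = d * np.log(r_k + 1e-12) + (d / 2) * np.log(np.pi) - math.lgamma(d / 2 + 1)
    return np.log(k) - np.log(n) - log_ball

def surrogate_reward(latent, expert_latents, k=5):
    """Dense reward: higher where the latent lies in high-density regions of
    the expert trajectory embeddings (here, raw vectors stand in for the
    temporal diffusion model's latent space)."""
    return knn_log_density(latent, expert_latents, k=k)

# Hypothetical usage: embeddings near the expert cluster score higher.
rng = np.random.default_rng(0)
expert_latents = rng.normal(0.0, 0.1, size=(200, 4))
r_near = surrogate_reward(np.zeros(4), expert_latents)
r_far = surrogate_reward(np.full(4, 5.0), expert_latents)
```

Because the estimate is a smooth function of the k-th neighbor distance rather than a hard support constraint, it yields a finite, informative reward even for states far from the expert distribution, which is the property the abstract highlights under severe support mismatch.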
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 57