Action-Free Offline RL via Demonstrator Diversity

Published: 25 May 2026, Last Modified: 09 Jun 2026 | DEMO 2026 Poster | CC BY 4.0
Keywords: Offline reinforcement learning, latent actions, identifiability, action-free data, demonstrator diversity
Abstract: A central bottleneck in transitioning from offline representation learning to online decision-making is the “action gap”: passive datasets (e.g., online videos) often lack the action labels required to ground latent representations in environment dynamics. We propose to bridge this gap by exploiting demonstrator diversity. Even when actions are unobserved, systematic variation in transitions across demonstrators can help disentangle latent action choice from environment stochasticity. We formalize this as a statewise column-stochastic non-negative matrix factorization (NMF) problem, where demonstrator-specific policies act as mixtures over shared latent transition kernels. Under “sufficiently scattered” policy diversity and rank conditions, we prove that latent actions and dynamics are identifiable up to a permutation. We extend these results to continuous observation spaces via a Gram-determinant minimum-volume criterion and prove that spatial continuity ensures a globally consistent action labeling. Our framework shows how heterogeneity in passive data can substitute for missing action labels, leaving limited interaction to resolve only the final action-label alignment.
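The statewise factorization described in the abstract can be illustrated with a minimal sketch for a single fixed state. The matrix names `T` (latent transition kernels), `PI` (demonstrator policies), and `M` (observed per-demonstrator next-state distributions), as well as all numbers, are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows only the forward model and the permutation ambiguity, not the paper's identification procedure.

```python
# Toy illustration for one fixed state s (all values are assumed, not from the paper).

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def column_sums(M):
    """Return the sum of each column of a matrix stored as a list of rows."""
    return [sum(col) for col in zip(*M)]

# Columns of T: next-state distributions P(s' | s, a) for 2 latent actions, 3 next states.
T = [[0.8, 0.1],
     [0.1, 0.2],
     [0.1, 0.7]]

# Columns of PI: action distributions pi_d(a | s) for 3 demonstrators.
PI = [[1.0, 0.0, 0.5],
      [0.0, 1.0, 0.5]]

# Observable without action labels: each demonstrator's next-state distribution.
# Column d of M is a mixture of the columns of T weighted by demonstrator d's policy.
M = matmul(T, PI)

# Permutation ambiguity: relabeling the latent actions (swap the columns of T and
# the matching rows of PI) yields the same observable matrix.
T_perm = [row[::-1] for row in T]
PI_perm = PI[::-1]
M_perm = matmul(T_perm, PI_perm)

if __name__ == "__main__":
    print("column sums of M:", column_sums(M))
    print("M      =", M)
    print("M_perm =", M_perm)
```

Because both factors are column-stochastic, every column of `M` also sums to one, and since `M` and `M_perm` agree, passive data alone cannot decide which latent action carries which label. This is the alignment that the abstract leaves to limited interaction.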
Submission Number: 92