Imitation Beyond Expectation Using Pluralistic Stochastic Dominance

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 spotlightEveryoneRevisionsBibTeXCC BY 4.0
Keywords: imitation learning
TL;DR: This paper develops stochastic dominance as an objective for imitation learning to provide stronger guarantees for demonstrators with differing preferences.
Abstract: Imitation learning seeks policies reflecting the values of demonstrated behaviors. Prevalent approaches learn to match or exceed the demonstrator's performance in expectation without knowing the demonstrator’s reward function. Unfortunately, this does not induce pluralistic imitators that learn to support qualitatively distinct demonstrations. We reformulate imitation learning using stochastic dominance over the demonstrations' reward distribution across a range of reward functions as our foundational aim. Our approach matches imitator policy samples (or support) with demonstrations using optimal transport theory to define an imitation learning objective over trajectory pairs. We demonstrate the benefits of pluralistic stochastic dominance (PSD) for imitation in both theory and practice.
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 3969
Loading