PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning

ICLR 2026 Conference Submission 22711 Authors

20 Sept 2025 (modified: 25 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Imitation Learning, Diffusion Policy, Flow-matching Policy, Implicit Maximum Likelihood Estimation, Manipulation, Quadruped, Locomotion
Abstract: Robotic imitation learning typically requires models that capture multimodal action distributions while operating at real-time control rates and accommodating multiple sensing modalities. Although recent generative approaches such as diffusion models, flow matching, and Implicit Maximum Likelihood Estimation (IMLE) have achieved promising results in this domain, each satisfies only a subset of these requirements. To satisfy all of them, we introduce PRISM, based on a batch-global rejection-sampling variant of IMLE. PRISM is a single-pass policy that couples a temporal multisensory encoder (e.g., RGB, depth, tactile, audio, proprioception) with a linear-attention generator using a Performer architecture. We validate on MetaWorld, CALVIN, Robomimic, and a real hardware suite using a Unitree Go2 with a 7-DoF arm, wrist and shoulder RGB, tactile, audio, and proprioception sensors. PRISM matches or outperforms diffusion, flow-matching, and prior IMLE policies in terms of task success rates, robustness, and sample efficiency. In CALVIN with 10\% of the data, PRISM improves the success rate by $\sim$10\% over IMLE, $\sim$20\% over flow matching, and $\sim$25\% over diffusion, while reducing jerk by about $20\times$. On MetaWorld, PRISM gains 5--12\% on the Hard/Very-Hard splits over diffusion and flow baselines. In real-world loco-manipulation, PRISM achieves 10--25\% higher success while maintaining faster inference than diffusion policy. These results position PRISM as a fast, accurate, and multisensory imitation policy that retains multimodal action coverage without iterative sampling.
Supplementary Material: pdf
Primary Area: generative models
Submission Number: 22711
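
The abstract does not spell out the batch-global variant, so the following is only a minimal sketch of the per-sample selection step in rejection-sampling IMLE as it is commonly described: among several generated candidates, those already within a radius of the demonstrated action are rejected, and the nearest remaining candidate is pulled toward the target. The function name `rs_imle_select`, the radius `eps`, and the use of NumPy in place of an autodiff framework are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def rs_imle_select(candidates, target, eps):
    """Hypothetical helper: pick the candidate that an RS-IMLE-style loss would train on.

    candidates: (m, d) array of action samples generated in a single pass.
    target:     (d,) demonstrated action.
    eps:        rejection radius (assumed hyperparameter).
    Returns (squared distance to minimise, index of chosen candidate),
    or (0.0, -1) when every candidate is already within eps of the target.
    """
    dists = np.linalg.norm(candidates - target, axis=1)
    keep = dists >= eps                  # reject candidates already close to the target
    if not keep.any():
        return 0.0, -1                   # target already covered, nothing to train on
    idx = np.flatnonzero(keep)
    best = idx[np.argmin(dists[idx])]    # nearest candidate among those not rejected
    return float(dists[best] ** 2), int(best)

candidates = np.array([[0.05, 0.0], [1.0, 0.0], [3.0, 0.0]])
target = np.zeros(2)
loss, chosen = rs_imle_select(candidates, target, eps=0.1)
```

In this toy call the first candidate lies inside the rejection radius, so the second one is selected. In a real training loop the squared distance would be computed on differentiable tensors so that gradients reach the generator.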