Unsupervised Mode Discovery for Fine-tuning Multimodal Action Distributions

ICLR 2026 Conference Submission 19312 Authors

19 Sept 2025 (modified: 08 Oct 2025)
Keywords: Mode Discovery, Fine-tuning Diffusion Policies, Robotic Manipulation, Reinforcement Learning
TL;DR: We introduce an unsupervised mode discovery framework that regularizes reinforcement learning fine-tuning of generative policies, improving task success in robotic manipulation while preserving multimodal behaviors.
Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning while preserving the multimodality of their action distributions. Current methods for fine-tuning generative policies (e.g., diffusion policies) with reinforcement learning improve task performance but tend to collapse diverse behaviors into a single reward-maximizing mode. To overcome this, we propose MD-MAD, an unsupervised mode discovery framework that uncovers latent behaviors in generative policies, together with a conditional mutual information metric that quantifies multimodality. The discovered modes allow mutual information to be used as an intrinsic reward, regularizing reinforcement learning fine-tuning to improve success rates while maintaining diverse strategies. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning, achieving high task success while preserving richer multimodal action distributions.
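The abstract's core mechanism is using a mutual-information signal between discovered modes and actions as an intrinsic reward added to the task reward. The following is a minimal illustrative sketch, not the authors' implementation: it uses a simplified unconditional plug-in MI estimator over discretized mode/action samples (the paper's metric is a conditional mutual information), and the helper names `mi_estimate`, `shaped_reward`, and the weight `beta` are hypothetical.

```python
# Hypothetical sketch of MI-based reward shaping for RL fine-tuning.
# Assumes discrete mode labels and discretized actions; a plug-in
# (empirical-frequency) estimator stands in for the paper's conditional MI.
import math
from collections import Counter

def mi_estimate(modes, actions):
    """Plug-in estimate of I(mode; action) from paired discrete samples."""
    n = len(modes)
    count_m = Counter(modes)
    count_a = Counter(actions)
    count_ma = Counter(zip(modes, actions))
    mi = 0.0
    for (m, a), c in count_ma.items():
        p_joint = c / n
        # log [ p(m, a) / (p(m) p(a)) ], accumulated with weight p(m, a)
        mi += p_joint * math.log(p_joint * n * n / (count_m[m] * count_a[a]))
    return mi

def shaped_reward(task_reward, modes, actions, beta=0.1):
    """Task reward plus an intrinsic MI bonus rewarding mode-distinct behavior."""
    return task_reward + beta * mi_estimate(modes, actions)
```

When each mode maps to a distinct action (two balanced modes, two actions), the bonus is log 2 nats; when modes and actions are independent, the bonus vanishes, so the shaped reward penalizes collapsing all modes onto one action.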
Primary Area: reinforcement learning
Submission Number: 19312