Unsupervised Mode Discovery for Fine-tuning Multimodal Action Distributions

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Mode Discovery, Fine-tuning Diffusion Policies, Robotic Manipulation, Reinforcement Learning
TL;DR: We introduce an unsupervised mode discovery framework that regularizes reinforcement learning fine-tuning of generative policies, improving task success in robotic manipulation while preserving multimodal behaviors.
Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning while preserving the multimodality of their action distributions. Current methods for fine-tuning generative policies (e.g., diffusion policies) with reinforcement learning improve task performance but tend to collapse diverse behaviors into a single reward-maximizing mode. To overcome this, we propose MD-MAD, an unsupervised mode discovery framework that uncovers latent behaviors in generative policies, together with a mutual information metric that quantifies multimodality. The discovered modes allow mutual information to be used as an intrinsic reward, regularizing reinforcement learning fine-tuning to improve success rates while maintaining diverse strategies. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning, achieving high task success while preserving richer multimodal action distributions.
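To illustrate the core idea of the abstract, the following is a minimal sketch of how a mutual information bonus between discovered mode labels and (discretized) actions could regularize a reward signal. This is an illustrative assumption, not the paper's actual estimator: the function names, the empirical discrete MI estimate, and the additive bonus with weight `beta` are all hypothetical simplifications.

```python
import numpy as np

def mutual_information(modes, actions_binned):
    """Empirical mutual information (in nats) between discrete mode
    labels and binned actions; high MI means modes map to distinct
    action clusters, i.e. multimodality is preserved."""
    joint = np.zeros((modes.max() + 1, actions_binned.max() + 1))
    for z, a in zip(modes, actions_binned):
        joint[z, a] += 1.0
    joint /= joint.sum()
    p_mode = joint.sum(axis=1, keepdims=True)    # marginal over modes
    p_action = joint.sum(axis=0, keepdims=True)  # marginal over actions
    nonzero = joint > 0
    return float((joint[nonzero]
                  * np.log(joint[nonzero] / (p_mode @ p_action)[nonzero])).sum())

def shaped_reward(task_reward, modes, actions_binned, beta=0.1):
    """Task reward plus an intrinsic MI bonus; a collapsed policy
    (all modes produce the same actions) gets zero bonus."""
    return task_reward + beta * mutual_information(modes, actions_binned)
```

In this toy form, a batch where each mode label deterministically picks a distinct action bin attains the maximum bonus (log of the number of modes), while a mode-collapsed batch attains zero, so gradient-free reward shaping alone already penalizes collapse.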
Primary Area: reinforcement learning
Submission Number: 19312