Keywords: Robot Imitation Learning, Diffusion Models, Mixture of Experts
TL;DR: CoRDE distills monolithic diffusion policies into a parameter-efficient Mixture of Experts using semantic priors and Low-Rank Adaptation. It improves multi-task success and inference speed while preserving behavioral diversity.
Abstract: Diffusion models excel at capturing multi-modal action distributions in robot imitation learning. In multi-task and long-horizon scenarios, however, monolithic architectures generalize poorly across task structure and suffer from gradient conflicts between semantically distinct sub-stages. Purely data-driven Mixture-of-Experts (MoE) methods introduce a division of labor, but they frequently suffer from routing collapse, and instantiating full-scale experts inflates the parameter count and makes adding experts costly. To address these issues, we propose Concept-prior Routed Diffusion Experts (CoRDE), a structure-guided variational distillation framework. CoRDE extracts semantic distributions from a frozen concept encoder and uses them, through a learnable soft mapping matrix, to guide the variational posterior responsibilities over experts. This mechanism enforces a dual-entropy dynamics conservation: it minimizes routing entropy to guarantee macroscopic cognitive certainty, while preserving the full-rank variance of the stochastic diffusion term to maintain behavioral diversity. To avoid parameter inflation, CoRDE builds a parameter-efficient expert pool by applying Low-Rank Adaptation (LoRA) to a shared frozen backbone. We prove that the mixture score field of the low-rank experts approximates the teacher distribution in the least-squares sense, which avoids the diversity loss induced by rank deficiency and ensures high-fidelity generation. Empirical evaluations confirm that, compared to existing baselines, CoRDE consistently reduces routing collapse and forms robust, semantically aligned expert allocations, while achieving superior action quality and incremental learning efficiency.
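The routing and expert-pool ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function and variable names (`routing_weights`, `mapping_logits`, `lora_mixture_forward`, and so on) are hypothetical, the soft mapping matrix is assumed to be row-normalized with a softmax, the experts are assumed to be combined as a responsibility-weighted sum of LoRA updates, and a single linear layer stands in for the diffusion denoiser.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def routing_weights(concept_probs, mapping_logits):
    """Map concept probabilities to expert responsibilities.

    concept_probs:  (batch, n_concepts), rows sum to 1 (output of a frozen
                    concept encoder, assumed here).
    mapping_logits: (n_concepts, n_experts), hypothetical learnable parameters
                    behind the soft mapping matrix.
    """
    mapping = softmax(mapping_logits, axis=1)  # each concept row sums to 1
    return concept_probs @ mapping             # (batch, n_experts), rows sum to 1


def routing_entropy(weights, eps=1e-12):
    """Shannon entropy of each routing distribution; lower means more certain."""
    return -np.sum(weights * np.log(weights + eps), axis=-1)


def lora_mixture_forward(x, w_frozen, lora_a, lora_b, weights):
    """Frozen linear layer plus a responsibility-weighted sum of LoRA updates.

    x:        (batch, d_in)
    w_frozen: (d_in, d_out), shared backbone weight, never updated
    lora_a:   (n_experts, d_in, rank)
    lora_b:   (n_experts, rank, d_out)
    weights:  (batch, n_experts)
    """
    base = x @ w_frozen
    low = np.einsum("bi,eir->ber", x, lora_a)      # project to rank-r space
    delta = np.einsum("ber,ero->beo", low, lora_b)  # per-expert update
    return base + np.einsum("be,beo->bo", weights, delta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, n_concepts, n_experts, d_in, d_out, rank = 4, 5, 3, 8, 6, 2

    concept_probs = softmax(rng.normal(size=(batch, n_concepts)))
    mapping_logits = rng.normal(size=(n_concepts, n_experts))
    weights = routing_weights(concept_probs, mapping_logits)

    x = rng.normal(size=(batch, d_in))
    w_frozen = rng.normal(size=(d_in, d_out))
    lora_a = rng.normal(size=(n_experts, d_in, rank)) * 0.01
    lora_b = np.zeros((n_experts, rank, d_out))  # common LoRA init: zero update

    out = lora_mixture_forward(x, w_frozen, lora_a, lora_b, weights)
    print("routing entropy per sample:", routing_entropy(weights))
    print("output shape:", out.shape)
```

In this sketch each expert adds only `rank * (d_in + d_out)` parameters on top of the shared weight, which is the sense in which a LoRA expert pool is parameter-efficient; the mean of `routing_entropy` could serve as the entropy term that a training objective minimizes.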
Submission Number: 11