Efficient and Scalable Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Published: 29 Oct 2024, Last Modified: 03 Nov 2024 · CoRL 2024 Workshop MRM-D Poster · CC BY 4.0
Keywords: Diffusion Policy, Mixture-of-Experts, Imitation Learning
TL;DR: A scalable and more efficient Diffusion Transformer architecture that uses a Mixture of Experts for efficient denoising.
Abstract: Diffusion Policies have become widely used in Goal-Conditioned Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models grow larger to capture more complex capabilities, their computational demands increase, as shown by recent scaling laws. Continuing with current architectures will therefore present a computational roadblock. To address this, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for guided behavior generation. MoDE achieves performance competitive with current state-of-the-art dense transformer-based Diffusion Policies while requiring fewer active parameters, reducing inference cost significantly. To achieve this, MoDE introduces a novel routing strategy that conditions expert selection on the current noise level of the diffusion denoising process. MoDE achieves competitive or state-of-the-art performance on four established imitation learning benchmarks, including CALVIN and LIBERO. In addition, we perform thorough ablations on the various components in MoDE.
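To make the noise-conditioned routing idea concrete, below is a minimal sketch (not the authors' implementation) of a Mixture-of-Experts feed-forward layer whose router depends only on the diffusion noise level, as described in the abstract. The class name, dimensions, top-k value, and expert structure are all illustrative assumptions.

```python
# Sketch of noise-level-conditioned expert routing, assuming PyTorch.
# All names and hyperparameters are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoiseConditionedMoE(nn.Module):
    def __init__(self, d_model=256, d_hidden=1024, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router sees only an embedding of the noise level,
        # not the token features themselves.
        self.noise_embed = nn.Sequential(nn.Linear(1, d_model), nn.SiLU())
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x, sigma):
        # x: (batch, tokens, d_model); sigma: (batch,) current noise level.
        logits = self.router(self.noise_embed(sigma[:, None]))  # (batch, n_experts)
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Each sample is processed by its top-k experts, weighted by the router.
        for k in range(self.top_k):
            for b in range(x.shape[0]):
                out[b] += weights[b, k] * self.experts[int(idx[b, k])](x[b])
        return out


# Usage: route a batch of action-token features based on the denoising step.
x = torch.randn(2, 10, 256)
sigma = torch.tensor([0.9, 0.1])
y = NoiseConditionedMoE()(x, sigma)
print(y.shape)  # torch.Size([2, 10, 256])
```

Because the routing depends only on the noise level, all tokens in a sample at a given denoising step share the same experts, which is what allows pre-selecting and caching experts per step to reduce active parameters at inference.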
Submission Number: 49