Abstract: Diffusion models have recently gained significant
attention in robotics due to their ability to generate multimodal distributions of system states and behaviors. However,
a key challenge remains: ensuring precise control over the
generated outcomes without compromising realism. This is
crucial for applications such as motion planning or trajectory
forecasting, where adherence to physical constraints and task-specific objectives is essential. We propose a novel framework
that enhances controllability in diffusion models by leveraging
multimodal prior distributions and enforcing strong modal
coupling. This allows us to initiate the denoising process
directly from distinct prior modes that correspond to different
possible system behaviors, ensuring that samples remain aligned with the
training distribution. We evaluate our approach on motion
prediction using the Waymo dataset and multi-task control
in Maze2D environments. Experimental results show that our
framework outperforms both guidance-based techniques and
conditioned models with unimodal priors, achieving superior
fidelity, diversity, and controllability, even in the absence of
explicit conditioning. Overall, our approach provides a more
reliable and scalable solution for controllable motion generation
in robotics.
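
As a rough illustration of the core idea (not the paper's actual implementation), the sketch below initializes the reverse diffusion process from a chosen mode of a Gaussian-mixture prior rather than from a standard normal, so that the chosen mode steers which behavior is generated. All names here (TinyDenoiser, denoise_from_mode, the schedule values, and the mode parameters) are hypothetical placeholders.

```python
import torch

# Hypothetical sketch: start DDPM-style denoising from one mode of a
# Gaussian-mixture prior instead of an isotropic standard normal.
# The network, noise schedule, and mode parameters are illustrative
# stand-ins, not the authors' actual method.

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class TinyDenoiser(torch.nn.Module):
    """Toy noise-prediction network (stand-in for the real model)."""
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, dim),
        )

    def forward(self, x, t):
        t_feat = t.float().unsqueeze(-1) / T  # normalized timestep feature
        return self.net(torch.cat([x, t_feat], dim=-1))

@torch.no_grad()
def denoise_from_mode(model, mode_mean, mode_std, n, dim):
    """Draw x_T from a chosen prior mode, then run the reverse process."""
    x = mode_mean + mode_std * torch.randn(n, dim)  # mode-specific init
    for t in reversed(range(T)):
        t_batch = torch.full((n,), t, dtype=torch.long)
        eps = model(x, t_batch)
        # Standard DDPM posterior mean for x_{t-1}
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

# Usage: pick the prior mode associated with a desired behavior,
# e.g. a "turn left" trajectory mode in a 2D toy state space.
model = TinyDenoiser(dim=2)
left_mode = torch.tensor([-2.0, 0.0])
samples = denoise_from_mode(model, left_mode, mode_std=1.0, n=16, dim=2)
```

Under this reading, controllability comes from selecting the prior mode at sampling time, while realism is preserved because each mode is coupled to the corresponding region of the training distribution during training.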