Keywords: Dance to Music Generation, Part-Wise, Slow-Fast Motion, Diffusion
Abstract: Dance-to-music generation aims to compose music that is rhythmically aligned with human dance movements. While recent diffusion-based approaches have achieved promising results, they treat the dancer's body as a holistic unit when extracting motion features, thereby overlooking the fine-grained rhythmic contributions of individual body parts and the heterogeneous temporal dynamics manifested in both slow and fast motion patterns. In this work, we approach dance-to-music generation from a fresh conditioning perspective: part-wise motion energy decomposition is combined with a hierarchical slow-fast encoder to produce the conditioning signal for a music latent diffusion model. Through comprehensive subjective and objective evaluations of rhythm synchronization and generated music quality, experimental results on the AIST++ and TikTok benchmarks confirm that our framework consistently outperforms existing state-of-the-art approaches for dance-to-music generation.
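The two conditioning ideas the abstract names, part-wise motion energy and slow-fast temporal pooling, can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions: the function names, the body-part grouping, and the pooling stride are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def part_energy(joints, parts):
    """Per-part motion energy from joint positions.

    joints: array of shape (T, J, 3), joint positions over T frames.
    parts:  dict mapping a part name to a list of joint indices
            (a hypothetical grouping; the paper's partition may differ).
    Returns a dict mapping each part name to a (T-1,) energy curve.
    """
    vel = np.diff(joints, axis=0)            # (T-1, J, 3) frame-to-frame velocity
    speed = np.linalg.norm(vel, axis=-1)     # (T-1, J) per-joint speed
    return {name: speed[:, idx].mean(axis=1) for name, idx in parts.items()}

def slow_fast(energy, slow_stride=4):
    """Split one energy curve into a fast (full-rate) stream and a
    slow (temporally pooled) stream; stride 4 is an assumed value."""
    fast = energy
    T = len(energy) // slow_stride * slow_stride
    slow = energy[:T].reshape(-1, slow_stride).mean(axis=1)
    return slow, fast
```

In a full pipeline, the per-part slow and fast curves would be projected and fed to the diffusion model as its conditioning; that encoder is beyond this sketch.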
Primary Area: generative models
Submission Number: 20003