MYMV: A Music Video Generation System with User-preferred Interaction

Kyungjune Lee, Mingyu Jang, Jungwoo Huh, Jeonghaeng Lee, Seokkeun Choi, Sanghoon Lee

Published: 2024, Last Modified: 17 Apr 2025, APSIPA 2024, CC BY-SA 4.0

Abstract: Advances in computer vision and graphics have increased the demand for multi-modal content with dynamic 3D faces, such as singing or talking face generation. However, because each modality requires specialized knowledge, creating or reproducing such content demands considerable manual effort. To address this gap, we present Make Your Music Video (MYMV), a system that enables users to easily produce multi-modal content. Our system features 1) song generation, 2) facial motion generation, and 3) virtual background generation. Once all modalities are generated, they are combined into a 3D facial music video. Through our user-preferred interface, users can directly edit songs and backgrounds, which makes it easy to participate in the music video production process. To evaluate our system, we construct a music video dataset from the results of MYMV. The evaluation results indicate that our system generates sufficiently natural 3D facial music videos. The demonstration video is available at https://github.com/jmg1002/MYMV.