Video2Music: Suitable music generation from videos using an Affective Multimodal Transformer model

Jaeyong Kang, Soujanya Poria, Dorien Herremans

Published: 2024, Last Modified: 07 Apr 2025. Expert Syst. Appl. 2024. License: CC BY-SA 4.0

Abstract: Highlights
• Pioneering generative music AI model with video emotion matching.
• New MuVi-Sync dataset with matched video and music features.
• Video2Music framework with Affective Multimodal Transformer.
• Post-processing to adjust music dynamics to sync with video.
• Outperforms baseline in terms of Music Quality and Video Matching.