Video-to-Music Generation for Film Production: A Dataset and Framework

Published: 23 Sept 2025, Last Modified: 08 Nov 2025, AI4Music, CC BY 4.0
Keywords: music generation
Abstract: Despite growing interest in video-to-music generation systems, their application in film production remains limited, primarily because large-scale datasets of aligned movie clips and soundtracks are lacking. Prior work has attempted to construct such a dataset, but it comprises only 36.5 hours of data, which is insufficient for training robust models. In this paper, we present the CineScore Dataset, a novel dataset of film video clips paired with their corresponding soundtracks. It is curated with a novel methodology that automatically identifies and extracts soundtrack segments from video clips, and it comprises 552.70 hours of data across 76,408 video clips sourced from both public-domain movies and commercial movies drawn from a publicly available dataset. Our comprehensive objective evaluation shows the usefulness of the dataset for building soundtrack generation models for film production.
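The scale figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only: it assumes the 552.70 hours is the summed duration of the 76,408 clips, and the function and variable names are hypothetical, not taken from the paper. Under that assumption the mean clip length comes out at roughly 26 seconds, and the dataset is about 15 times larger than the 36.5-hour prior dataset.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
TOTAL_HOURS = 552.70   # total duration of the CineScore Dataset
NUM_CLIPS = 76_408     # number of video clips
PRIOR_HOURS = 36.5     # size of the prior dataset mentioned in the abstract


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip length in seconds, assuming total_hours covers exactly these clips."""
    return total_hours * 3600.0 / num_clips


def scale_factor(total_hours: float, prior_hours: float) -> float:
    """How many times larger one dataset is than another, measured in hours."""
    return total_hours / prior_hours


mean_s = mean_clip_seconds(TOTAL_HOURS, NUM_CLIPS)
ratio = scale_factor(TOTAL_HOURS, PRIOR_HOURS)

print(f"mean clip length: {mean_s:.1f} s")
print(f"size relative to prior dataset: {ratio:.1f}x")
```

The mean is only a summary statistic; the abstract does not report how clip lengths are distributed.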
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 24