Music to Video Matching Based on Beats and Tempo

Published: 23 Sept 2025, Last Modified: 08 Nov 2025 · AI4Music · CC BY 4.0
Keywords: Unsupervised learning, Deep Learning, Music Matching, Digital Signal Processing
TL;DR: We developed a simple, unsupervised rhythm-based metric that effectively matches music to videos by aligning musical beats with on-screen movement, outperforming more complex methods.
Abstract: The growing popularity of video-sharing platforms has intensified the need to pair video content with appropriate music. For creators, this pairing often demands considerable manual effort and repeated experimentation. This paper presents an automated solution that streamlines music selection by analyzing the raw content of both video and audio. Our approach characterizes a video's visual dynamics by measuring the distribution of movement over time via optical flow. Concurrently, for each candidate music track, the system extracts the onset envelope to track how its rhythmic tempo evolves. From these signals we derive novel similarity metrics that assess the suitability of a music-video pair. The method requires only the raw sources and no supplemental metadata, which distinguishes it from complex deep multi-modal fusion strategies. Experimental results, using both a continuous cross-correlation metric and a discrete beat-distance metric, demonstrate the model's efficacy across diverse video and music genres: evaluations show it identifies the original video-music pairing with higher accuracy than alternative pairings.
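
Since the abstract only outlines the pipeline, the following is a minimal sketch of the described steps, assuming OpenCV, librosa, NumPy, and SciPy. The function names, the common resampling rate, and the exact form of the beat-distance metric are illustrative assumptions on our part, not the authors' implementation.

```python
# Illustrative sketch of the rhythm-matching pipeline: motion envelope from
# optical flow, onset envelope from audio, then two similarity metrics.
# Function names and parameter choices are hypothetical.
import cv2
import librosa
import numpy as np
from scipy.signal import correlate

def video_motion_envelope(path):
    """Per-frame motion magnitude from dense (Farneback) optical flow."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    _, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    motion = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Mean flow magnitude summarizes how much the scene moves this frame.
        motion.append(np.linalg.norm(flow, axis=2).mean())
        prev = gray
    cap.release()
    return np.asarray(motion), fps

def audio_onset_envelope(path):
    """Onset-strength envelope and beat times extracted from raw audio."""
    y, sr = librosa.load(path)
    onset = librosa.onset.onset_strength(y=y, sr=sr)
    env_rate = sr / 512  # librosa's default hop length is 512 samples
    _, beats = librosa.beat.beat_track(onset_envelope=onset, sr=sr)
    beat_times = librosa.frames_to_time(beats, sr=sr)
    return onset, env_rate, beat_times

def resample(x, src_rate, dst_rate):
    """Linearly resample an envelope onto a common time grid."""
    t_src = np.arange(len(x)) / src_rate
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_rate)
    return np.interp(t_dst, t_src, x)

def cross_correlation_score(motion, fps, onset, env_rate, rate=30.0):
    """Continuous metric: peak normalized cross-correlation of the
    z-scored motion and onset envelopes (higher is better)."""
    a = resample(motion, fps, rate)
    b = resample(onset, env_rate, rate)
    n = min(len(a), len(b))
    a = (a[:n] - a[:n].mean()) / (a[:n].std() + 1e-8)
    b = (b[:n] - b[:n].mean()) / (b[:n].std() + 1e-8)
    return correlate(a, b, mode="full").max() / n

def beat_distance_score(motion, fps, beat_times):
    """Discrete metric: mean gap between each musical beat and the nearest
    local peak of the motion envelope (lower is better)."""
    peaks = np.where((motion[1:-1] > motion[:-2]) &
                     (motion[1:-1] > motion[2:]))[0] + 1
    peak_times = peaks / fps
    return np.mean([np.abs(peak_times - t).min() for t in beat_times])
```

Under this reading, a matcher would compute both scores for every candidate track against a given video and rank tracks by cross-correlation (descending) or beat distance (ascending); the original pairing should then rank above the alternatives.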
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 43