MCLEMCD: multimodal collaborative learning encoder for enhanced music classification from dances

Published: 01 Jan 2024 · Last Modified: 25 Sept 2025 · Multimedia Systems (2024) · CC BY-SA 4.0
Abstract: Music classification is widely used in the automatic organization of music archives and in intelligent music interfaces. Music is frequently accompanied by other media, such as image sequences. Combining different types of media across tasks is natural for humans but remains extremely difficult for machines. In this work, we propose a collaborative learning method that combines dancing motions and music cues for music classification, and we apply it to music recommendation from dancing motions. Dancing motions, represented as 3D joint positions, contain cyclic movements synchronized with music beats, and a collaborative autoencoder is designed to fuse music cues into the dancing-motion feature extraction module. The proposed method achieves a classification accuracy of \(98.07\%\) on the MusicToDance dataset and \(65.29\%\) on the AIST++ dataset. The code to run all experiments is available at https://github.com/wenjgong/musicmotion.
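To make the idea of a collaborative autoencoder concrete, the sketch below shows one plausible way to fuse a music branch into a motion feature extractor and train both a reconstruction and a classification objective jointly. This is a minimal illustration, not the paper's architecture: all module names, layer sizes, the averaging fusion, and the loss weighting are assumptions; the authors' actual implementation is at the GitHub link above.

```python
# Hypothetical sketch of a collaborative autoencoder that fuses music cues
# into a dance-motion feature extractor. All names, dimensions, and the
# fusion scheme are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeAutoencoder(nn.Module):
    def __init__(self, motion_dim=72, music_dim=128, latent_dim=64, num_classes=10):
        super().__init__()
        # Motion branch: encodes flattened 3D joint positions per frame
        # (e.g., 24 joints x 3 coordinates = 72 values; assumed layout).
        self.motion_encoder = nn.Sequential(
            nn.Linear(motion_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Music branch: encodes per-frame audio features (assumed input).
        self.music_encoder = nn.Sequential(
            nn.Linear(music_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder reconstructs the motion from the fused latent, so music
        # cues must land in a space the motion branch can exploit.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, motion_dim),
        )
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, motion, music=None):
        z = self.motion_encoder(motion)
        if music is not None:
            # Simple average fusion of the two latents (an assumption).
            z = 0.5 * (z + self.music_encoder(music))
        return self.decoder(z), self.classifier(z)

# Joint objective: reconstruction keeps the latent faithful to the motion,
# cross-entropy drives the music classification.
model = CollaborativeAutoencoder()
motion = torch.randn(8, 72)                    # batch of 8 motion frames
music = torch.randn(8, 128)                    # matching music features
labels = torch.randint(0, 10, (8,))            # dummy class labels
recon, logits = model(motion, music)
loss = F.mse_loss(recon, motion) + F.cross_entropy(logits, labels)
loss.backward()
```

At inference time for recommendation from dance alone, the music input could be omitted (`model(motion)`), so the motion encoder must have absorbed the music cues during training; that is the intuition behind the collaborative design, under the assumptions stated above.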