Keywords: AI, Music, Symbolic music, Machine learning, JEPA, Self-supervised learning, Representation learning
TL;DR: We apply JEPA to symbolic music and achieve competitive results on some downstream tasks.
Abstract: Despite the growing success of Joint Embedding Predictive Architectures (JEPA) in vision and speech, their potential for symbolic music representation learning remains unexplored. In this work, we adapt JEPA to the symbolic music domain by introducing music-specific masking strategies and combining several regularization techniques. The model achieves competitive results on several downstream tasks with substantially less training compute, while falling short on others; we attribute this shortfall mainly to a bias towards positional information in the learned representations.
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guidelines and anonymized my submission.
Submission Number: 70