Keywords: AI, Music, Symbolic music, Machine learning, JEPA, Self-supervised learning, Representation learning
TL;DR: We apply JEPA to symbolic music and achieve competitive results on some downstream tasks.
Abstract: Despite the growing success of Joint Embedding Predictive Architectures (JEPA) in vision and speech, their potential for symbolic music representation learning remains unexplored. In this work, we adapt JEPA to the symbolic music domain by introducing music-specific masking strategies and combining several regularization techniques. The model achieves competitive results on several downstream tasks with substantially less training compute, while falling short on others; we attribute this shortfall mainly to a bias towards positional information in the learned representations.
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guidelines and anonymized my submission.
Submission Number: 70