Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies
Keywords: Melodic harmonization; Non-autoregressive transformer; Encoder-only architecture; Attention dynamics
TL;DR: Symbolic melodic harmonization with encoder-only models, comparing several ablations and discussing the self- and cross-attention dynamics that emerge.
Abstract: This paper addresses melodic harmonization, the automatic generation of a harmonic accompaniment that complements a given melody, using non-autoregressive, encoder-only transformer models that operate on a synchronized melody–harmony time grid. The proposed framework supports flexible conditioning, such as fixing chords at specific positions, while maintaining high generative quality. Comparative experiments show that single-encoder models outperform dual-encoder architectures despite having fewer parameters. Interestingly, harmony-related attention patterns emerge even when harmony tokens remain fully masked during training, and models that use only cross-attention achieve comparable results, suggesting that harmony–harmony relations are modeled implicitly. Finally, the choice of unmasking strategy at inference time has notable effects on harmonic structure and coherence.
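The abstract describes iterative unmasking over a melody–harmony grid with some chords fixed in advance. The sketch below illustrates that general idea only and is not the authors' implementation. It makes several assumptions. A hypothetical `toy_predict` function stands in for the trained encoder: it is a hand-written scorer that rewards chords containing the melody pitch class and chords matching already-decided neighbours. `CHORDS`, `harmonize`, and the three strategy names are also hypothetical. The sketch unmasks exactly one position per step.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

MASK = None  # hypothetical placeholder for a masked harmony token

# Hypothetical chord vocabulary: chord symbol -> pitch classes (0 = C).
CHORDS: Dict[str, set] = {
    "C": {0, 4, 7},
    "F": {5, 9, 0},
    "G": {7, 11, 2},
    "Am": {9, 0, 4},
}

Dist = Dict[str, float]
Predictor = Callable[[Sequence[int], Sequence[Optional[str]]], List[Dist]]


def toy_predict(melody: Sequence[int], harmony: Sequence[Optional[str]]) -> List[Dist]:
    """Hypothetical stand-in for the encoder: one chord distribution per grid step.

    A chord scores +2 if it contains the melody pitch class and +1 for each
    adjacent position already holding the same chord; scores are softmaxed.
    """
    dists = []
    for t, pitch in enumerate(melody):
        neighbours = [harmony[i] for i in (t - 1, t + 1) if 0 <= i < len(harmony)]
        scores = {
            name: 2.0 * (pitch in tones) + 1.0 * neighbours.count(name)
            for name, tones in CHORDS.items()
        }
        z = sum(math.exp(s) for s in scores.values())
        dists.append({name: math.exp(s) / z for name, s in scores.items()})
    return dists


def harmonize(
    melody: Sequence[int],
    fixed: Dict[int, str],
    strategy: str = "confidence",
    predict: Predictor = toy_predict,
    seed: int = 0,
) -> List[str]:
    """Fill every masked harmony slot, one position per step, keeping `fixed` chords."""
    harmony: List[Optional[str]] = [fixed.get(t, MASK) for t in range(len(melody))]
    rng = random.Random(seed)
    while any(h is MASK for h in harmony):
        dists = predict(melody, harmony)  # re-predict with the current partial harmony
        masked = [t for t, h in enumerate(harmony) if h is MASK]
        if strategy == "confidence":  # most confident masked position first
            t = max(masked, key=lambda i: max(dists[i].values()))
        elif strategy == "left_to_right":  # earliest masked position first
            t = masked[0]
        elif strategy == "random":  # uniformly random masked position
            t = rng.choice(masked)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        harmony[t] = max(dists[t], key=dists[t].get)  # greedy (argmax) chord choice
    return harmony  # type: ignore[return-value]
```

As a toy usage example, take the melody `[0, 5, 7, 0]` with the last chord fixed to `"C"`. Left-to-right decoding gives `["C", "F", "C", "C"]`, and confidence-based decoding gives `["F", "F", "C", "C"]`. This shows that unmasking order alone can change the output, even with the same predictor.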
Submission Number: 7