Keywords: music emotion recognition, hyperbolic embeddings, coarse-to-fine classification, structure-aware modeling, representation learning
TL;DR: We introduce a structure-aware hyperbolic model for coarse-to-fine emotion classification in lyrics, achieving interpretable representations and superior performance via contrastive structure alignment.
Abstract: Song lyrics possess a natural hierarchical structure that remains unexploited in computational models, leaving a significant gap in emotion recognition systems. We introduce StructFormer, a framework that leverages the symbolic structure of lyrics and the hierarchical nature of emotions by encoding both in hyperbolic space, whose volume grows exponentially with radius and which can therefore represent tree-like hierarchies more compactly than Euclidean space. Our approach incorporates paragraph-line structure as an inductive bias and employs a multi-level supervision strategy with both fine-grained and coarse-grained labels. It has three theoretically grounded components: (1) a structure-aware embedding module that fuses semantic and structural information through computationally efficient gated alignment, (2) a hyperbolic projection that captures hierarchical relationships among emotion labels with mathematical guarantees, and (3) geometric consistency losses that enforce coherence between structural segmentation and emotional representation. Experiments on a curated dataset of lyric lines show that, compared with state-of-the-art baselines, StructFormer achieves substantially improved embedding coherence while maintaining competitive classification performance across diverse emotion categories. Beyond lyrics analysis, the approach offers a template for hierarchical representation learning in other domains with nested structure, addressing limitations in how emotional hierarchies are modeled computationally.
Archival Status: Non-archival (not included in proceedings)
Submission Number: 56