RNInfer: A Large Language Model Approach to Functional Harmonic Reasoning in Symbolic Music

Published: 08 Sept 2025, Last Modified: 19 Sept 2025, LLM4Music @ ISMIR 2025 Oral, CC BY 4.0
Keywords: Roman Numeral Analysis, Symbolic Music Analysis, Reasoning Model
Abstract: Existing machine learning models for Roman Numeral Analysis (RNA) treat the task as a classification problem, producing labels without the explanatory reasoning that is central to music theory. This "black box" approach is at odds with the goal of harmonic analysis, which is to deepen musical understanding. To address this, we introduce RNInfer, a novel framework that bridges a pre-trained symbolic music encoder with a Large Language Model (LLM) to perform interpretable RNA. A lightweight projector aligns musical features with the LLM's embedding space, enabling the LLM to reason about harmonic content. We also propose Octuple+, an enhanced tokenization scheme that supplies crucial enharmonic spelling information to the music encoder. The model is trained in two stages: supervised fine-tuning to learn the analysis task, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to generate human-readable reasoning traces without requiring annotated reasoning examples. Our experiments show that RNInfer achieves competitive accuracy on the primary analysis task, and we demonstrate that it can generate structured explanations for its predictions, marking a critical step toward more transparent and pedagogically useful models for computational musicology.
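The abstract does not spell out the GRPO update. The defining step of GRPO, as originally published, is the group-relative advantage. For each prompt, a group of completions is sampled and scored. Each completion's advantage is then its reward minus the group mean, divided by the group standard deviation, which removes the need for a learned value model. The sketch below shows only that step. The exact-match reward `toy_reward` and the sample labels are hypothetical illustrations and are not RNInfer's actual reward design. Using the population standard deviation is also an assumption, since implementations differ on this detail. The clipped policy-ratio objective and KL penalty that complete GRPO are omitted.

```python
from statistics import mean, pstdev


def toy_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the predicted Roman numeral matches exactly.

    This is an illustrative stand-in, not the reward used by RNInfer.
    """
    return 1.0 if predicted == reference else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Compute GRPO-style advantages for one group of sampled completions.

    Each advantage is (r_i - mean(r)) / (std(r) + eps). The population
    standard deviation is used here, which is an assumption. If every
    reward in the group is identical, all advantages are 0.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of four sampled labels for one chord, reference "V7".
    samples = ["V7", "V7", "viio7", "V7"]
    rewards = [toy_reward(s, "V7") for s in samples]
    for label, adv in zip(samples, group_relative_advantages(rewards)):
        print(f"{label:>6}  advantage={adv:+.3f}")
```

Because advantages are centered within each group, a correct completion is reinforced only relative to its siblings. A group in which every sample is equally right or equally wrong therefore contributes no gradient signal.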
Submission Number: 25