PQG-A2SA: Performance Quantification Guided Audio-to-Score Alignment for Orchestral Music

Zhicheng Lian, Haonan Cheng, Jiawan Zhang

Published: 01 Jan 2023, Last Modified: 11 Apr 2025. IEEE ACM Trans. Audio Speech Lang. Process. 2023. License: CC BY-SA 4.0

Abstract: Audio-to-score alignment is a multi-modal task that aims to generate an accurate mapping between symbolic and signal-level representations of music, which is important for music performance analysis and retrieval. Among music genres, orchestral music has particularly complex performance characteristics, such as multiple instruments, non-percussive instruments, and musical expressiveness. However, previous methods do not take sufficient account of these characteristics, which limits their alignment accuracy on orchestral music. To solve this problem, we present a performance quantification guided audio-to-score alignment (PQG-A2SA) method that achieves high note-level alignment accuracy for orchestral music. Specifically, PQG-A2SA contains two parts: an Inter Onset Interval (IOI) guided conditionally-constrained Dynamic Time Warping (DTW) module and an articulation-guided onset and offset detection module. Unlike previous work, the IOI-guided conditionally-constrained DTW is designed to achieve a preliminary mapping between symbolic and chord-level representations of musical signals. In the second module, onset and offset detection models are established for different musical articulations, thus refining the alignment results. We provide extensive experimental validation and analysis of our method. Compared with state-of-the-art methods, PQG-A2SA improves the onset align rate by up to 9.0% and the offset align rate by up to 17.5%.
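
For readers unfamiliar with DTW, the following is a minimal sketch of the textbook algorithm that the first module builds on. It is not the IOI-guided conditionally-constrained variant described in the abstract, whose constraints are not specified here. The function name `dtw_path`, the absolute-difference distance, and the use of MIDI pitch numbers as toy features are all assumptions made for illustration.

```python
from math import inf

def dtw_path(score_feats, audio_feats, dist=lambda a, b: abs(a - b)):
    """Textbook DTW (illustrative only, not the paper's constrained variant).

    Returns the total alignment cost and the warping path as a list of
    (score_index, audio_index) pairs.
    """
    n, m = len(score_feats), len(audio_feats)
    # acc[i][j]: minimal accumulated cost of aligning the first i score
    # events with the first j audio frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_feats[i - 1], audio_feats[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1][j], i - 1, j),          # advance score only
            (acc[i][j - 1], i, j - 1),          # advance audio only
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n][m], path

# Toy data (assumed): three score notes, six audio frames with a pitch estimate each.
score = [60, 62, 64]
audio = [60, 60, 62, 62, 62, 64]
cost, path = dtw_path(score, audio)
```

In this toy case each score note maps to the run of audio frames carrying the same pitch. A constrained variant would additionally restrict which steps are allowed, for example based on the inter-onset intervals in the score.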