Activity Detection for Sound Events in Orchestral Music Recordings (Aktivitätserkennung von Klangereignissen in Orchestermusikaufnahmen)

Published: 01 Jan 2023, Last Modified: 15 Aug 2024. License: CC BY-SA 4.0
Abstract: Composers of music can express emotions and communicate with their audience in a multitude of ways. They decide on which voices or instruments to use, arrange notes into melodies, and develop recurring musical patterns. When a composition is performed and turned into sound, their decisions are realized acoustically as sound events. While such musical sound events are easily understood by human listeners, teaching a machine to perceive and process them can be a challenging task. This thesis studies computational techniques for detecting the activity of sound events in a music recording, i.e., identifying the exact moments in time when a certain event occurs. We focus on orchestral and opera music, which are rarely considered in music processing research and particularly complex due to their high degree of polyphony. In this context, we cover four different types of musical sound events, namely singing, instrumental sounds, different pitches, and leitmotifs (special kinds of musical patterns used for storytelling in opera). To detect the activity of these events within a recording, we design, implement, and evaluate deep learning systems. In addition, we explore a range of techniques including hierarchical classification, differentiable sequence alignments, and representation learning. Beyond evaluating the accuracy of our detection systems, we aim for a deeper understanding of our models with regard to their robustness and sensitivity to confounding effects. The main contributions of this thesis can be summarized as follows: First, we investigate signal processing and deep learning methods for detecting singing activity in opera recordings. Second, we extend this scenario towards simultaneously detecting singer gender and voice type. We compare several techniques for utilizing the hierarchical relationships between these classes and propose a novel loss formulation for ensuring consistency of detection results across different hierarchy levels.
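The idea of enforcing consistency across hierarchy levels can be illustrated with a small sketch. This is not the thesis's actual loss formulation; the class hierarchy, indices, and hinge-style penalty below are hypothetical, assuming per-class sigmoid probabilities where an active child class (e.g., a voice type) implies an active parent class (e.g., singing):

```python
# Hypothetical class hierarchy for singing-voice detection:
# index 0 = singing (parent), 1 = female voice, 2 = male voice (children).
PARENT = {1: 0, 2: 0}

def hierarchy_penalty(probs):
    """Hinge penalty that is zero exactly when every parent class is
    predicted at least as strongly as each of its child classes."""
    return sum(max(0.0, probs[child] - probs[parent])
               for child, parent in PARENT.items())

consistent = [0.9, 0.8, 0.1]    # "singing" >= both voice-type probabilities
inconsistent = [0.2, 0.8, 0.1]  # "female voice" more likely than "singing"

print(hierarchy_penalty(consistent))    # 0.0
print(hierarchy_penalty(inconsistent))  # positive (about 0.6)
```

In training, such a penalty would typically be added to the standard per-class detection loss, weighted by a hyperparameter.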
Third, we apply such a hierarchical technique to instrument activity detection. For this task, research progress is often limited by the cost of obtaining manually annotated audio examples for training. To address this issue, we demonstrate that hierarchical information reduces the need for fine-grained instrument annotations during training of our detection models. Fourth, we show how the structure of certain orchestral music datasets can be exploited to learn representations related to instrumentation, without requiring any instrument annotations at all. Fifth, we consider the problem of detecting pitch activity and show how differentiable sequence alignments can be used for learning from weak annotations. Finally, we perform classification and detection of leitmotifs. We present deep learning systems that successfully detect leitmotif activity and provide a detailed analysis of their generalization ability.
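For the differentiable sequence alignments mentioned above, one well-known construction (a sketch under assumptions, not the thesis's implementation) replaces the hard minimum in the classic dynamic-time-warping recursion with a smooth soft-minimum, so that the alignment cost between frame-wise predictions and a weak, score-like target becomes differentiable end to end. All shapes and values below are illustrative:

```python
import numpy as np

def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))).
    Approaches the hard minimum as gamma -> 0."""
    values = np.asarray(values, dtype=float)
    return -gamma * np.log(np.sum(np.exp(-values / gamma)))

def soft_dtw(cost, gamma=0.1):
    """Soft-DTW-style forward pass over an (n x m) pairwise cost matrix."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)  # accumulated soft alignment costs
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

# Toy example: 3 predicted activation frames vs. 2 weak target frames.
pred = np.array([[0.0], [0.2], [1.0]])
target = np.array([[0.0], [1.0]])
cost = np.abs(pred - target.T)  # (3, 2) pairwise distances

print(soft_dtw(cost, gamma=0.01))  # close to the hard-DTW optimum of 0.2
```

Because every step of the recursion is smooth, gradients can flow from the alignment cost back into the network producing the frame-wise activations, which is what makes learning from weak (unaligned) annotations possible.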