Weakly Labeled Sound Event Detection using Attention Mechanism with Teacher-Student Model

Published: 01 Jan 2023, Last Modified: 21 Apr 2024, ISM 2023, CC BY-SA 4.0
Abstract: Sound Event Detection (SED) identifies and categorizes sound events within audio signals. In this study, we investigate how self-attention and multi-head attention mechanisms enhance SED performance when combined with a teacher-student learning model in a knowledge distillation setting. The study contributes to SED methodology by focusing on detecting events and their temporal boundaries (onset and offset) in weakly labeled and unlabeled audio. The attention mechanism allows the model to focus on different parts of the audio sequence depending on context, making it robust for tasks where temporal relationships and context matter. Specifically, we present extensive experiments using low- and high-level audio features, including Mel Frequency Cepstral Coefficients (MFCC), Log Mel-Spectrogram (log-Mel), Bidirectional Encoder representation from Audio Transformers (BEATs), Audio Spectrogram Transformer (AST), and Pretrained Audio Neural Networks (PANNs), to assess the performance of each feature with different attention mechanisms. We evaluate the attention mechanisms and the low- and high-level features against the baseline teacher-student model of the Sound Event Detection with Weak Labels and Synthetic Soundscapes Challenge. Our experiments on the performance dataset show that the proposed attention-based model improves the F1 scores for all features.
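As a rough illustration of the two ingredients named in the abstract, the following minimal sketch shows scaled dot-product self-attention over a sequence of audio frames and a weight update for a teacher model. It is not the authors' implementation: the function names `self_attention` and `ema_update` are hypothetical, the attention uses identity projections instead of learned query/key/value matrices, and the exponential-moving-average teacher update is an assumption about how the teacher-student baseline is trained, which the abstract does not specify.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over frame-level features.

    `frames` is a list of T feature vectors (e.g. log-Mel frames), each of
    dimension d. Illustrative only: queries, keys and values are the frames
    themselves (identity projections), not learned projections.
    """
    d = len(frames[0])
    outputs, weights = [], []
    for q in frames:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        w = softmax(scores)
        weights.append(w)
        # Context vector: attention-weighted average of all frames.
        outputs.append(
            [sum(w[t] * frames[t][j] for t in range(len(frames))) for j in range(d)]
        )
    return outputs, weights


def ema_update(teacher, student, alpha=0.999):
    """Assumed teacher update: exponential moving average of student weights."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # Three toy 2-dimensional frames; frames 0 and 2 are identical.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    outputs, weights = self_attention(frames)
    print(weights[0])  # frame 0 attends more to frames 0 and 2 than to frame 1
    print(ema_update([1.0, 1.0], [0.0, 0.0], alpha=0.9))
```

In this toy setup each row of `weights` sums to one, so every output frame is a context-dependent average of the whole sequence, which is the sense in which attention lets a model weigh different parts of the audio.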