Enhancing respiratory sound detection through integrated learning and dynamic convolutional recurrent neural networks
Abstract: This study introduces an integrated learning approach to respiratory sound classification and event detection, leveraging Mel-frequency cepstral coefficients (MFCC) to improve the accuracy and efficiency of sound event detection. The method characterizes the filtered sound signal with MFCC and combines dynamic frequency convolution with a Dynamic Triple Center Loss (DTCL) for feature extraction and classification. Experiments on the SPRSound database demonstrate that MFCC feature extraction improves model performance. In particular, the Dynamic Convolutional Recurrent Neural Network (DCRNN) achieves the highest F-score and the lowest error rate (ER) when trained with DTCL. These findings show that the proposed method substantially improves respiratory sound detection, paving the way for advances in audio signal processing and sound event detection.
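As a concrete illustration of the MFCC front end described in the abstract, the sketch below extracts a per-frame MFCC matrix from a respiratory recording. The paper does not specify an implementation; the use of librosa, the sample rate, the coefficient count, and the file name are all illustrative assumptions.

```python
# Minimal sketch of an MFCC front end, assuming librosa;
# sample rate, FFT size, and coefficient count are illustrative
# choices, not values taken from the paper.
import numpy as np
import librosa

def extract_mfcc(path: str, sr: int = 8000, n_mfcc: int = 40,
                 n_fft: int = 1024, hop_length: int = 512) -> np.ndarray:
    """Return an (n_mfcc, n_frames) MFCC matrix for one recording."""
    signal, sr = librosa.load(path, sr=sr)  # load and resample to a fixed rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    # Normalize each coefficient across time so features are
    # comparable across recordings of different loudness.
    mfcc -= mfcc.mean(axis=1, keepdims=True)
    mfcc /= mfcc.std(axis=1, keepdims=True) + 1e-8
    return mfcc

# Usage (hypothetical file name):
# features = extract_mfcc("sprsound_clip.wav")
```

The resulting time-frequency matrix is the kind of input a convolutional recurrent model such as the DCRNN consumes, with convolutional layers operating over the coefficient axis and recurrent layers over the frame axis.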