Infant Cry Classification Based-On Feature Fusion and Mel-Spectrogram Decomposition with CNNs

Published: 01 Jan 2022, Last Modified: 13 Nov 2024 | AIMS 2022 | CC BY-SA 4.0
Abstract: We propose a novel method that uses feature fusion and model fusion to improve infant cry classification performance. Spectrogram features extracted by a transfer-learning convolutional neural network (CNN) model and mel-spectrogram features extracted by a mel-spectrogram decomposition model are fused and fed into a multilayer perceptron for better classification accuracy. The mel-spectrogram decomposition method feeds band-wise crops of the mel-spectrograms into multiple CNNs, followed by a merged global classifier, to capture more discriminative features. Feature fusion combines higher-dimensional detailed information with characteristics that better match human auditory perception, which improves CNN performance. The approach is evaluated on the Baby Chillanto database and the Baby2020 database. On the Baby Chillanto database, our approach reduces the absolute classification error rate by 4.72% compared with a CNN model using single mel-spectrogram images, and our testing accuracy reaches 99.26%, which outperforms all other methods on this five-category classification task. The gender classification experiment on the Baby2020 database also shows a 3.87% accuracy improvement over the CNN model using single spectrograms.
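The two data-handling ideas in the abstract, band-wise cropping of a mel-spectrogram and fusion of feature vectors by concatenation, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `split_into_bands` and `fuse_features`, the 128 mel bins, the 4 bands, the 512-dimensional transfer-learning vector, and the random input data. In particular, the time-average used per band is only a placeholder for the per-band CNNs, and concatenation is one common way to fuse features, not necessarily the one the authors use.

```python
import numpy as np


def split_into_bands(mel_spec, n_bands):
    """Split a (n_mels, n_frames) mel-spectrogram into contiguous frequency bands."""
    if mel_spec.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_mels, n_frames)")
    # array_split also works when n_mels is not divisible by n_bands
    return np.array_split(mel_spec, n_bands, axis=0)


def fuse_features(*feature_vectors):
    """Fuse feature vectors by flattening each one and concatenating them."""
    return np.concatenate([np.ravel(v) for v in feature_vectors])


# Stand-in data (assumption): a random "mel-spectrogram" with 128 mel bins, 200 frames
rng = np.random.default_rng(0)
mel_spec = rng.random((128, 200))

# Band-wise crops; in the described method each crop would go to its own CNN
bands = split_into_bands(mel_spec, n_bands=4)

# Placeholder for the per-band CNNs (assumption): average each band over time
band_features = [band.mean(axis=1) for band in bands]
decomposition_features = fuse_features(*band_features)

# Placeholder for features from a transfer-learning CNN (assumption: 512 dims)
transfer_features = rng.random(512)

# Fused vector that would be fed to a multilayer perceptron classifier
fused = fuse_features(transfer_features, decomposition_features)
print(len(bands), bands[0].shape, fused.shape)
```

In a full pipeline the placeholders would be replaced by trained networks, and the fused vector would be passed to the classifier; the sketch only shows how the array shapes fit together.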