How do deep convolutional neural networks learn from raw audio waveforms?

15 Feb 2018 (modified: 10 Feb 2022) | ICLR 2018 Conference Blind Submission | Readers: Everyone
Abstract: Prior work on speech and audio processing has shown that convolutional neural networks (CNNs) can achieve excellent performance when learning directly from raw audio waveforms. However, the inner workings of such CNNs remain unclear, which hinders further development and improvement in this direction. In this paper, we theoretically analyze and explain how deep CNNs learn from raw audio waveforms, and we identify potential limitations of existing network structures. Based on this analysis, we propose a new network architecture, called SimpleNet, which offers a simple, concise structure and high model interpretability.
Keywords: Convolutional neural networks, Audio processing, Speech processing
Data: [IEMOCAP](https://paperswithcode.com/dataset/iemocap)