Abstract: Residual and skip connections play an important role in many current generative models. Although their theoretical and numerical advantages are understood, their role in speech enhancement systems has not yet been investigated. When performing spectral speech enhancement, residual connections are very similar in nature to spectral subtraction, one of the most commonly employed speech enhancement approaches. Highway networks, on the other hand, can be seen as a combination of spectral masking and spectral subtraction. However, when using deep neural networks, such operations would normally happen in a transformed spectral domain, whereas in traditional speech enhancement all operations are often performed directly on the spectrum. In this paper, we investigate the role of residual and highway connections in deep neural networks for speech enhancement, and verify whether they operate similarly to their traditional, digital signal processing counterparts. We visualize the outputs of such connections, projected back to the spectral domain, in models trained for speech denoising, and show that while skip connections do not necessarily improve performance relative to the number of parameters, they make speech enhancement models more interpretable.
TL;DR: We show that skip connections can make speech enhancement models more interpretable, as they lead the models to use mechanisms similar to those explored in the DSP literature.
Keywords: speech enhancement, residual networks, highway networks, skip connections, interpretability
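The analogy stated in the abstract can be made concrete with a minimal numerical sketch. The spectra, noise estimate, and gate values below are illustrative placeholders, not from the paper; the point is only that a residual connection `x + f(x)` reduces to spectral subtraction when the learned branch outputs a negative noise estimate, and that a highway connection `g * f(x) + (1 - g) * x` blends a mask-like gate with a transform branch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitude spectra (illustrative values only).
noisy = rng.uniform(0.5, 2.0, size=8)   # noisy speech magnitude spectrum
noise_est = 0.3 * np.ones(8)            # assumed noise-floor estimate

# Spectral subtraction: remove the noise estimate, floor at zero.
subtracted = np.maximum(noisy - noise_est, 0.0)

# Residual connection: y = x + f(x). Spectral subtraction is the
# special case where the learned branch f(x) outputs -noise_est.
correction = -noise_est                 # stands in for a network's output
residual_out = noisy + correction

# Spectral masking: multiply the spectrum by a gain in [0, 1].
mask = np.clip(1.0 - noise_est / np.maximum(noisy, 1e-8), 0.0, 1.0)
masked = noisy * mask

# Highway connection: y = g * f(x) + (1 - g) * x. The gate g acts
# like a soft mask; the transform branch plays the subtraction role.
gate = 0.5 * np.ones(8)                 # placeholder gate activations
highway_out = gate * subtracted + (1.0 - gate) * noisy
```

With these placeholder values the noisy spectrum stays above the noise floor, so the residual output coincides exactly with the spectral-subtraction output, illustrating the correspondence the paper examines in a transformed spectral domain.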