Abstract: Residual and skip connections play an important role in many current
generative models. Although their theoretical and numerical advantages
are understood, their role in speech enhancement systems has not yet been
investigated. In spectral speech enhancement, residual connections closely
resemble spectral subtraction, one of the most commonly employed speech
enhancement approaches.
Highway networks, on the other hand, can be seen as a combination of spectral
masking and spectral subtraction. In deep neural networks, however, such
operations typically take place in a transformed spectral domain, whereas
traditional speech enhancement applies them directly to the spectrum.
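To make the analogy concrete, here is a rough sketch in generic notation (the symbols below are illustrative and not taken from the paper). With noisy spectrum $Y$, noise estimate $\hat{N}$, mask $M$, learned transformation $F(\cdot)$, and learned gate $T(\cdot)$:

\begin{align*}
\hat{S} &= Y - \hat{N} & \text{(spectral subtraction)} \\
\hat{S} &= M \odot Y & \text{(spectral masking)} \\
\hat{S} &= Y + F(Y) & \text{(residual connection, with } F(Y) \text{ playing the role of } -\hat{N}\text{)} \\
\hat{S} &= T(Y) \odot F(Y) + \bigl(1 - T(Y)\bigr) \odot Y & \text{(highway connection)}
\end{align*}

In a deep network these operations would act on learned representations rather than directly on the magnitude spectrum, which is the distinction drawn above.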
In this paper, we aim to investigate the role of residual and highway
connections in deep neural networks for speech enhancement, and verify whether
or not they operate similarly to their traditional, digital signal processing
counterparts. We visualize the outputs of such connections, projected back to
the spectral domain, in models trained for speech denoising, and show that while
skip connections do not necessarily improve performance with respect to the
number of parameters, they make speech enhancement models more interpretable.
TL;DR: We show that using skip connections can make speech enhancement models more interpretable, as they lead the models to use mechanisms similar to those explored in the DSP literature.
Keywords: speech enhancement, residual networks, highway networks, skip connections, interpretability