Abstract: Characterization of the representations learned in intermediate layers of deep networks can provide valuable insight into the nature of a task and can guide the development of well-tailored learning strategies. Here we study convolutional neural network-based acoustic models in the context of automatic speech recognition. Adapting a method proposed by Yosinski et al. , we measure the transferability of each layer between German and English to assess the their language-specifity. We observe three distinct regions of transferability: (1) the first two layers are entirely transferable between languages, (2) layers 2–8 are also highly transferable but we find evidence of some language specificity, (3) the subsequent fully connected layers are more language specific but can be successfully finetuned to the target language. To further probe the effect of weight freezing, we performed follow-up experiments using freeze-training [Raghu et al., 2017]. Our results are consistent with the observation that CCNs converge 'bottom up' during training and demonstrate the benefit of freeze training, especially for transfer learning.
TL;DR: All but the first two layers of our CNNs based acoustic models demonstrated some degree of language-specificity but freeze training enabled successful transfer between languages.
Keywords: CNNs, acoustic modeling, interpretability, transfer learning, language-specificity, freeze training