Abstract: Adaptation of deep neural network (DNN) based language identification models remains a challenging area of research. Recent state-of-the-art approaches to the short-duration language identification task use bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) models. Although these models capture sequential information effectively, significant mismatch between training and testing data still exists, caused by differing conditions such as speaker, channel, duration and background noise. Adapting BLSTM systems can help to reduce this mismatch. In this paper, a transformation of the existing BLSTM layer is proposed that learns a second-order factorization matrix, called a compensation layer. The condition-dependent parameters of the factorization matrix are estimated to adapt the BLSTM layer weights. Experiments on the AP17-OLR database show that utterance-level adaptation achieves a relative improvement of 28% in terms of Cavg over a traditional BLSTM for utterances of 1 s duration.
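
As a reading aid for the reported figure, the sketch below shows how a relative improvement is typically computed for a lower-is-better cost such as Cavg. The function name `relative_improvement` and both Cavg values are hypothetical, chosen only so the arithmetic comes out at 28%; they are not results from the paper, and the abstract does not state the absolute Cavg values.

```python
def relative_improvement(baseline_cost: float, adapted_cost: float) -> float:
    """Fractional reduction of a lower-is-better cost, relative to the baseline."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - adapted_cost) / baseline_cost


# Hypothetical Cavg values for illustration only (NOT taken from the paper).
baseline_cavg = 0.100  # assumed Cavg of an unadapted BLSTM
adapted_cavg = 0.072   # assumed Cavg after utterance-level adaptation

improvement = relative_improvement(baseline_cavg, adapted_cavg)
print(f"Relative improvement: {improvement:.0%}")
```

Under this convention, a 28% relative improvement means the adapted system's cost is 72% of the baseline's, not that the cost dropped by 28 absolute percentage points.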