Abstract: Training deep neural networks is often challenging in terms of training stability, typically requiring careful hyperparameter tuning or a pretraining scheme to converge. Layer normalization (LN) has been shown to be a crucial ingredient in training deep encoder-decoder models. We explore several LN variants of long short-term memory (LSTM) recurrent neural networks (RNNs) by applying LN to different parts of the internal recurrence of the LSTM. No previous work has investigated this.
We carry out experiments on the Switchboard 300h task for both hybrid and end-to-end ASR models and show that LN improves the final word error rate (WER) and training stability, allows training even deeper models, requires less hyperparameter tuning, and works well even without pretraining. We find that applying LN to both forward and recurrent inputs globally, which we denote as the Global Joined Norm variant, gives a 10% relative improvement in WER.
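
As a rough illustration of the Global Joined Norm idea, the sketch below applies a single LayerNorm jointly over the summed forward-input and recurrent gate pre-activations inside an LSTM cell. This is only a minimal PyTorch sketch under our own assumptions; the class name `GlobalJoinedNormLSTMCell`, the exact normalization placement, and all sizes are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn


class GlobalJoinedNormLSTMCell(nn.Module):
    """Minimal sketch: one LayerNorm applied jointly ("globally") to the
    combined forward-input and recurrent gate pre-activations.
    Placement details are assumptions, not specifics from the abstract."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        self.W = nn.Linear(input_size, 4 * hidden_size, bias=False)   # forward input projection
        self.U = nn.Linear(hidden_size, 4 * hidden_size, bias=True)   # recurrent projection
        self.ln = nn.LayerNorm(4 * hidden_size)                       # single norm over all four gates jointly

    def forward(self, x, state):
        h, c = state
        # Normalize the summed forward and recurrent contributions with one LN
        gates = self.ln(self.W(x) + self.U(h))
        i, f, g, o = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


if __name__ == "__main__":
    cell = GlobalJoinedNormLSTMCell(input_size=40, hidden_size=512)
    x = torch.randn(8, 40)                                            # (batch, input features)
    state = (torch.zeros(8, 512), torch.zeros(8, 512))
    out, state = cell(x, state)
    print(out.shape)                                                  # torch.Size([8, 512])
```

Other placements, such as separate norms for the forward and recurrent parts, would only change where the normalization is applied in `forward`; the abstract does not spell out the exact set of variants.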