Deep Recurrent Neural Network Layers with Layerwise Loss

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference · Desk Rejected Submission
Abstract: Deep learning techniques have brought significant performance improvements to various areas of machine learning. In computer vision in particular, very deep networks such as ResNet have shown notable gains. In speech recognition and language processing, however, such very deep networks have not been extensively employed. In this paper, we propose a very deep LSTM structure and a training strategy for it. In our training strategy, we first train a conventional model with several LSTM layers. One notable difference is that, for the top LSTM layer of this initial model, the Connectionist Temporal Classification (CTC) loss is applied to both the input and the output of that layer. Once this initial model is sufficiently trained, the top layer is copied to construct a very deep LSTM stack. For this newly constructed stack, the CTC loss is applied to the output of every LSTM layer as well as to the top of the stack. Experimental results show that this deep LSTM structure performs significantly better than a conventional model with 5-6 layers and a comparable number of parameters.
One-sentence Summary: In this paper, we propose a very deep recurrent neural network trained with a layerwise loss.
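To make the layerwise-loss idea concrete, below is a minimal sketch of a deep LSTM stack with a CTC loss attached to every layer's output, assuming a PyTorch implementation. The class name, layer count, hidden size, per-layer linear projections, and the simple sum of losses are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DeepLSTMWithLayerwiseCTC(nn.Module):
    """Hypothetical sketch: a stack of LSTM layers where a CTC loss is
    computed on every layer's output and summed into one training loss."""

    def __init__(self, num_layers=12, input_dim=80, hidden_dim=512, num_labels=32):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(input_dim if i == 0 else hidden_dim, hidden_dim)
            for i in range(num_layers)
        )
        # One projection per layer maps LSTM outputs to label posteriors for CTC.
        self.projections = nn.ModuleList(
            nn.Linear(hidden_dim, num_labels) for _ in range(num_layers)
        )
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, targets, target_lens):
        # feats: (T, B, input_dim); returns the sum of CTC losses over all layers.
        x, total_loss = feats, 0.0
        for lstm, proj in zip(self.layers, self.projections):
            x, _ = lstm(x)
            log_probs = proj(x).log_softmax(dim=-1)  # (T, B, num_labels)
            total_loss = total_loss + self.ctc(log_probs, targets, feat_lens, target_lens)
        return total_loss

# Illustrative usage with random features and label sequences.
model = DeepLSTMWithLayerwiseCTC()
feats = torch.randn(100, 4, 80)                       # (T, B, feature_dim)
feat_lens = torch.full((4,), 100, dtype=torch.long)
targets = torch.randint(1, 32, (4, 20))               # label index 0 reserved for blank
target_lens = torch.full((4,), 20, dtype=torch.long)
loss = model(feats, feat_lens, targets, target_lens)
loss.backward()
```

In this sketch every layer gets its own projection to the label space so that intermediate losses do not force the recurrent states themselves to be label posteriors; whether the paper shares or separates these projections is not stated in the abstract.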