Parallelizable Simple Recurrent Units with Hierarchical Memory

Published: 2023 · Last Modified: 22 Jan 2026 · ICONIP (15) 2023 · CC BY-SA 4.0
Abstract: Recurrent neural networks and their many variants have been widely used in language modeling, text generation, machine translation, speech recognition, and so forth, owing to their excellent ability to process sequential data. However, these networks are built by stacking multiple layers, which causes memory of the distant past to decay continuously. To this end, this paper proposes a parallelizable simple recurrent unit with hierarchical memory (PSRU-HM) that preserves more long-term historical information for inference. This is achieved with a nested SRU structure, in which connections between the inner and outer layers allow information to flow between the inner and outer memory cells. The depth of the network can be adjusted dynamically according to task complexity. Meanwhile, a skip connection that combines high-level and low-level features is added to the outermost layer, maximizing the use of the effective input information. To accelerate training and inference, the weights of PSRU-HM are reorganized to enable parallel deployment under the CUDA framework. Extensive experiments on several public datasets, covering text classification, language modeling, and question answering, verify the proposed method. The results show that PSRU-HM outperforms traditional methods and achieves a 2× speed-up over the cuDNN-optimized LSTM.
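The abstract does not spell out the recurrence, but the idea can be pictured from the standard SRU formulation (Lei et al.). The NumPy sketch below is only an illustration under that assumption: the names `SRUCell` and `PSRUHMCellSketch`, the `extra` term that feeds the inner memory into the outer cell update, and the additive input skip connection are hypothetical choices, not the paper's actual implementation, which additionally reorganizes the weight matrices for CUDA-level parallel execution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRUCell:
    """Standard SRU recurrence: gates depend only on the current input and
    the previous memory cell, so the W @ x products for all time steps can
    be batched into one matrix multiplication (the parallelizable part)."""
    def __init__(self, d, rng):
        self.W = rng.standard_normal((3 * d, d)) * 0.1  # [candidate; forget; reset]
        self.v_f = np.zeros(d); self.v_r = np.zeros(d)
        self.b_f = np.zeros(d); self.b_r = np.zeros(d)

    def step(self, x, c_prev, extra=0.0):
        u, uf, ur = np.split(self.W @ x, 3)
        f = sigmoid(uf + self.v_f * c_prev + self.b_f)   # forget gate
        r = sigmoid(ur + self.v_r * c_prev + self.b_r)   # reset gate
        c = f * c_prev + (1.0 - f) * (u + extra)         # memory cell update
        h = r * np.tanh(c) + (1.0 - r) * x               # highway-style output
        return h, c

class PSRUHMCellSketch:
    """Hypothetical nested cell: an inner SRU keeps a second memory whose
    hidden state is injected into the outer cell update, so distant context
    decays more slowly than in a plain stacked layer."""
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.outer = SRUCell(d, rng)
        self.inner = SRUCell(d, rng)

    def forward(self, xs):
        d = xs.shape[1]
        c_out = np.zeros(d); c_in = np.zeros(d)
        hs = []
        for x in xs:                                   # the only sequential loop
            h_in, c_in = self.inner.step(x, c_in)      # inner (long-term) memory
            h_out, c_out = self.outer.step(x, c_out, extra=h_in)
            hs.append(h_out + x)                       # skip connection with the raw input
        return np.stack(hs)

seq = np.random.default_rng(1).standard_normal((12, 16))
out = PSRUHMCellSketch(16).forward(seq)
print(out.shape)  # (12, 16)
```

Because the gates depend only on the current input and an elementwise function of the previous cell, the heavy matrix products can be precomputed for the whole sequence at once; only the lightweight elementwise recurrence remains sequential, which is what makes a CUDA fused-kernel deployment and the reported speed-up over the cuDNN LSTM plausible.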