Subtraction Gates: Another Way to Learn Long-Term Dependencies in Recurrent Neural Networks

Published: 01 Jan 2022. Last Modified: 15 May 2023. IEEE Trans. Neural Networks Learn. Syst., 2022.
Abstract: Recurrent neural networks (RNNs) can remember temporal contextual information over various time steps. The well-known vanishing/exploding gradient problem restricts the ability of RNNs to learn long-term dependencies. The gate mechanism is a well-developed method for learning long-term dependencies in long short-term memory (LSTM) models and their variants. These models usually use multiplication terms as gates to control the input and output of the RNN during forward computation and to ensure a constant error flow during training. In this article, we propose the use of subtraction terms as another type of gate for learning long-term dependencies. Specifically, the multiplication gates are replaced by subtraction gates, and the activations of the RNN input and output are directly controlled by subtracting subtrahend terms. The error flow remains constant, as the linear identity connection is retained during training. The proposed subtraction gates allow more flexible choices of internal activation function than the multiplication gates of LSTM. Experimental results show that the proposed Subtraction RNN (SRNN) performs comparably to LSTM and the gated recurrent unit on the Embedded Reber Grammar, Penn Tree Bank, and Pixel-by-Pixel MNIST tasks, while requiring approximately three-quarters of the parameters used by LSTM. We also show that a hybrid model combining multiplication forget gates and subtraction gates can achieve good performance.
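The abstract does not give the cell equations, so the following is only an illustrative sketch of the idea: learned subtrahend terms are subtracted from the candidate input and the cell output (in place of LSTM's multiplicative gating), while the linear identity connection from the previous cell state is kept so gradients can flow unattenuated. All names (`srnn_cell_step`, the parameter keys) are assumptions, not the paper's notation.

```python
import numpy as np

def srnn_cell_step(x, h_prev, c_prev, params):
    """One step of a hypothetical subtraction-gate RNN cell (sketch only).

    Instead of multiplying by sigmoid gates as in LSTM, the candidate
    input and the cell output are modulated by subtracting learned
    subtrahend terms. The identity connection c_prev -> c is retained.
    """
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]  # candidate update
    Ws, Us, bs = params["Ws"], params["Us"], params["bs"]  # input subtrahend
    Wo, Uo, bo = params["Wo"], params["Uo"], params["bo"]  # output subtrahend

    z = np.tanh(Wz @ x + Uz @ h_prev + bz)         # candidate cell input
    s_in = np.tanh(Ws @ x + Us @ h_prev + bs)      # subtrahend gating the input
    c = c_prev + (z - s_in)                        # linear identity connection kept
    s_out = np.tanh(Wo @ x + Uo @ h_prev + bo)     # subtrahend gating the output
    h = np.tanh(c) - s_out                         # subtractive output control
    return h, c
```

Note that this sketch has three weight groups where an LSTM cell has four (input, forget, output gates plus the candidate), which is consistent with the abstract's claim that the SRNN needs roughly three-quarters of LSTM's parameters; whether the paper's actual cell is structured this way is an assumption here.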