- Abstract: The Tensor-Train factorization (TTF) is an efficient way to compress large weight matrices of fully-connected layers and recurrent layers in recurrent neural networks (RNNs). However, high Tensor-Train ranks for all the core tensors of parameters need to be element-wise fixed, which results in an unnecessary redundancy of model parameters. This work applies Riemannian stochastic gradient descent (RSGD) to train core tensors of parameters in the Riemannian Manifold before finding vectors of lower Tensor-Train ranks for parameters. The paper first presents the RSGD algorithm with a convergence analysis and then tests it on more advanced Tensor-Train RNNs such as bi-directional GRU/LSTM and Encoder-Decoder RNNs with a Tensor-Train attention model. The experiments on digit recognition and machine translation tasks suggest the effectiveness of the RSGD algorithm for Tensor-Train RNNs.
- Keywords: Riemannian Stochastic Gradient Descent, Tensor-Train, Recurrent Neural Networks
- TL;DR: Applying the Riemannian SGD (RSGD) algorithm for training Tensor-Train RNNs to further reduce model parameters.