Towards Extremely Compact RNNs for Video Recognition with Fully Decomposed Hierarchical Tucker Structure
Abstract: Recurrent Neural Networks (RNNs) have been widely used in sequence analysis and modeling. However, when processing high-dimensional data, RNNs typically require very large model sizes, which creates a series of deployment challenges. Although various prior works have been proposed to reduce RNN model sizes, executing RNN models in resource-restricted environments remains a very challenging problem. In this paper, we propose to develop extremely compact RNN models with a fully decomposed hierarchical Tucker (FDHT) structure. The HT decomposition not only provides a much higher storage cost reduction than other tensor decomposition approaches, but also brings a better accuracy improvement for the compact RNN models. Meanwhile, unlike existing tensor decomposition-based methods, which can only decompose the input-to-hidden layer of RNNs, our proposed full decomposition approach enables comprehensive compression of the entire RNN model while maintaining very high accuracy. Our experimental results on several popular video recognition datasets show that our proposed fully decomposed hierarchical Tucker-based LSTM (FDHT-LSTM) is extremely compact and highly efficient. To the best of our knowledge, FDHT-LSTM is the first to consistently achieve very high accuracy with only a few thousand parameters (3,132 to 8,808) on different datasets. Compared with state-of-the-art compressed RNN models, such as TT-LSTM, TR-LSTM and BT-LSTM, our FDHT-LSTM simultaneously enjoys both orders-of-magnitude (3,985× to 10,711×) fewer parameters and significant accuracy improvement (0.6% to 12.7%).
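
To give a feel for where compression ratios of this magnitude can come from, the following is a minimal sketch that compares the parameter count of a dense weight matrix with that of a hierarchical Tucker (HT) factorization. It assumes a binary dimension tree with one uniform rank, and it assumes each tensor mode pairs one input factor with one output factor. The function names, layer shapes, and rank are hypothetical choices made for illustration. They are not taken from the paper, and the sketch does not reproduce the reported FDHT-LSTM parameter counts.

```python
from typing import Sequence


def dense_lstm_param_count(input_dim: int, hidden_dim: int) -> int:
    """Parameters of a standard LSTM layer: 4 gates, each with an
    input-to-hidden matrix, a hidden-to-hidden matrix, and one bias vector."""
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)


def ht_param_count(mode_sizes: Sequence[int], rank: int) -> int:
    """Parameters of an HT-format tensor on a binary dimension tree,
    assuming every non-root node has the same rank (an illustrative choice).

    - each of the d leaves stores a frame matrix of shape (n_i, rank)
    - each of the d - 2 non-root interior nodes stores a transfer
      tensor of shape (rank, rank, rank)
    - the root stores a (rank, rank) matrix, since its own rank is 1
    """
    d = len(mode_sizes)
    if d < 2:
        raise ValueError("an HT dimension tree needs at least two modes")
    leaves = sum(n * rank for n in mode_sizes)
    interior = (d - 2) * rank ** 3
    root = rank ** 2
    return leaves + interior + root


if __name__ == "__main__":
    # Hypothetical factorization of one input-to-hidden matrix:
    # 57,600 inputs = 8 * 20 * 20 * 18, 256 hidden units = 4 * 4 * 4 * 4.
    input_factors = [8, 20, 20, 18]
    hidden_factors = [4, 4, 4, 4]
    # Assumed tensorization: mode i has size input_factors[i] * hidden_factors[i].
    modes = [m * n for m, n in zip(input_factors, hidden_factors)]

    dense = 57_600 * 256
    compact = ht_param_count(modes, rank=4)
    print("dense matrix parameters:", dense)
    print("HT-format parameters:   ", compact)
    print("compression ratio:      ", dense / compact)
    print("full dense LSTM layer:  ", dense_lstm_param_count(57_600, 256))
```

The HT count grows with the sum of the mode sizes rather than their product, which is why a factorized layer can be several orders of magnitude smaller than its dense counterpart. In a dense LSTM the input-to-hidden matrices dominate when the input is high-dimensional, but the hidden-to-hidden matrices still contribute `4 * hidden_dim ** 2` parameters, so leaving them uncompressed puts a floor on the achievable model size.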