Fei Tian

Feb 15, 2018 (modified: Feb 15, 2018) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Deep neural networks based sequence-to-sequence learning has achieved remarkable progress in applications like machine translation and text summarization. However, sequence-to-sequence models suffer from severe inefficiency in training process, requiring huge amount of training time as well as memory usage. In this work, inspired by densely connected layers in modern convolutional neural network, we introduce densely connected sequence-to-sequence learning mechanism to tackle this challenge. In this mechanism, multiple layers of representations from stacked recurrent neural networks are concatenated to enhance feature reuse. Furthermore, a densely connected attention model is elaborately leveraged to improve information flow with more efficient parameter usage via multi-branch structure and local sparsity. We show that such a densely connected mechanism significantly reduces training time and memory usage for sequence-to-sequence learning. In particular, in WMT-14 English-French translation task with a subset of 12M training data, it takes half of training time and model parameters to achieve similar BLEU as typical stacked LSTM models.