Keywords: Temporal Convolutional Network, Sequence Modeling, Deep Learning
TL;DR: Convolutional networks should be considered a natural starting point for sequence modeling tasks.
Abstract: Although both convolutional and recurrent architectures have a long history in sequence prediction, the current "default" mindset in much of the deep learning community is that generic sequence modeling is best handled using recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should a practitioner use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. In particular, the models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We further show that thepotential "infinite memory" advantage that RNNs have over TCNs is largely absent in practice: TCNs indeed exhibit longer effective history sizes than their recurrent counterparts. As a whole, we argue that it may be time to (re)consider ConvNets as the default ``go to'' architecture for sequence modeling.
Data: [LAMBADA](https://paperswithcode.com/dataset/lambada), [MNIST](https://paperswithcode.com/dataset/mnist), [Penn Treebank](https://paperswithcode.com/dataset/penn-treebank), [WikiText-103](https://paperswithcode.com/dataset/wikitext-103), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)