Keywords: Temporal Convolutional Network, Sequence Modeling, Deep Learning
TL;DR: Convolutional networks should be considered a natural starting point for sequence modeling tasks.
Abstract: Although both convolutional and recurrent architectures have a
long history in sequence prediction, the current "default" mindset in much of
the deep learning community is that generic sequence modeling is best handled
using recurrent networks. Yet recent results indicate that convolutional architectures
can outperform recurrent networks on tasks such as audio synthesis and machine
translation. Given a new sequence modeling task or dataset, which architecture
should a practitioner use? We conduct a systematic evaluation of generic
convolutional and recurrent architectures for sequence modeling.
In particular, the models are evaluated across a broad range of standard tasks that are
commonly used to benchmark recurrent networks. Our results indicate that a
simple convolutional architecture outperforms canonical recurrent networks
such as LSTMs across a diverse range of tasks and datasets, while demonstrating
longer effective memory. We further show that the potential "infinite memory" advantage
that RNNs have over TCNs is largely absent in practice: TCNs indeed exhibit longer
effective history sizes than their recurrent counterparts. As a whole, we argue that
it may be time to (re)consider ConvNets as the default "go-to" architecture for sequence
modeling.
Data: [LAMBADA](https://paperswithcode.com/dataset/lambada), [MNIST](https://paperswithcode.com/dataset/mnist), [Penn Treebank](https://paperswithcode.com/dataset/penn-treebank), [WikiText-103](https://paperswithcode.com/dataset/wikitext-103), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)
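
For concreteness, below is a minimal sketch of the kind of generic convolutional (TCN-style) architecture the abstract refers to: causal dilated 1-D convolutions with residual connections, whose receptive field grows exponentially with depth. The module names, channel widths, and dilation schedule here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only sees past time steps (left-only padding)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so output length == input length
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad the past only, keeping causality
        return self.conv(x)

class TCNBlock(nn.Module):
    """Residual block: two causal dilated convs with ReLU, plus a skip connection."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(),
        )
        # 1x1 conv matches channel counts on the residual path when they differ
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.downsample(x)

# Stacking blocks with dilations 1, 2, 4, ... makes the effective history size
# grow exponentially with depth -- the "long effective memory" the abstract claims.
tcn = nn.Sequential(
    TCNBlock(1, 32, dilation=1),
    TCNBlock(32, 32, dilation=2),
    TCNBlock(32, 32, dilation=4),
)
y = tcn(torch.randn(8, 1, 100))  # (batch=8, channels=1, time=100) -> (8, 32, 100)
```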