Convolutional Sequence Modeling Revisited
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Feb 12, 2018 (modified: Feb 15, 2018) · ICLR 2018 Workshop Submission
Abstract: Although both convolutional and recurrent architectures have a
long history in sequence prediction, the current "default" mindset in much of
the deep learning community is that generic sequence modeling is best handled
using recurrent networks. Yet recent results indicate that convolutional architectures
can outperform recurrent networks on tasks such as audio synthesis and machine
translation. Given a new sequence modeling task or dataset, which architecture
should a practitioner use? We conduct a systematic evaluation of generic
convolutional and recurrent architectures for sequence modeling.
In particular, the models are evaluated across a broad range of standard tasks that are
commonly used to benchmark recurrent networks. Our results indicate that a
simple convolutional architecture outperforms canonical recurrent networks
such as LSTMs across a diverse range of tasks and datasets, while demonstrating
longer effective memory. We further show that the potential "infinite memory" advantage
that RNNs have over TCNs is largely absent in practice: TCNs indeed exhibit longer
effective history sizes than their recurrent counterparts. As a whole, we argue that
it may be time to (re)consider ConvNets as the default "go to" architecture for sequence modeling.
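Below is a minimal sketch of the kind of generic convolutional sequence model (a temporal convolutional network, TCN) the abstract refers to, assuming the dilated causal convolutions and residual connections described in the paper; the channel sizes, kernel size, and dilation schedule here are illustrative placeholders, not the authors' exact configuration.

```python
# Sketch of a TCN-style model: causal dilated 1-D convolutions with residual
# connections. Hyperparameters below are illustrative, not the paper's setup.
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """1-D convolution whose output at time t depends only on inputs <= t."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        # Left-pad by (k-1)*d so the convolution never sees future steps.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv(x)


class TCNBlock(nn.Module):
    """Residual block: two causal convolutions with ReLU, plus a skip path."""

    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(),
        )
        # 1x1 convolution matches channel counts for the residual addition.
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.skip(x)


# Stacking blocks with exponentially growing dilations (1, 2, 4, ...) gives a
# receptive field that grows exponentially with depth -- the "long effective
# memory" the abstract contrasts with recurrent networks.
tcn = nn.Sequential(
    TCNBlock(1, 32, dilation=1),
    TCNBlock(32, 32, dilation=2),
    TCNBlock(32, 32, dilation=4),
)
y = tcn(torch.randn(8, 1, 100))  # (batch=8, channels=1, length=100)
print(y.shape)                   # torch.Size([8, 32, 100])
```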
TL;DR: Convolutional networks should be considered a natural starting point for sequence modeling tasks.
Keywords: Temporal Convolutional Network, Sequence Modeling, Deep Learning