Abstract: We show that several standard attention-based architectures widely used in Neural Machine Translation, as well as a pointer-based variant, achieve results on some of the compositional SCAN tasks that are far superior to those reported in Lake & Baroni (2018). We next show that test accuracy varies widely across both random initializations and training durations. Ensembling can exploit this variance and improve results, but for many tasks a large gap remains between ensemble performance and the performance of an oracle-selected single best model. Based on these insights, we suggest possible directions for future research, emphasizing model selection and regularization over the need for more compositional architectures.
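To illustrate the ensembling idea concretely, below is a minimal sketch of per-example majority voting over output sequences from independently trained models. The model interface and function names here are hypothetical assumptions for illustration, not the paper's implementation; SCAN's exact-match evaluation is assumed.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: each trained model maps an input command
# (e.g. "jump twice") to a predicted action sequence string.
Model = Callable[[str], str]

def majority_vote_ensemble(models: List[Model], command: str) -> str:
    """Return the output sequence predicted by the most models.

    Ties are broken arbitrarily by Counter.most_common; under SCAN's
    exact-match evaluation, the ensemble is correct only when a
    plurality of runs agrees on the full output string.
    """
    predictions = [model(command) for model in models]
    return Counter(predictions).most_common(1)[0][0]

def ensemble_accuracy(models: List[Model],
                      test_set: Sequence[Tuple[str, str]]) -> float:
    """Exact-match accuracy of the majority-vote ensemble."""
    correct = sum(majority_vote_ensemble(models, cmd) == target
                  for cmd, target in test_set)
    return correct / len(test_set)
```

Under this scheme, the ensemble can beat the average single run by filtering out idiosyncratic errors of individual initializations, but it cannot exceed an oracle-selected best model whenever most runs fail on the same examples.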
TL;DR: We show NMT models do better than claimed on the SCAN tasks, but the high variance will require new techniques.
Keywords: sequence-to-sequence recurrent networks, pointer networks, compositionality, systematicity, generalization