Large Scale Multi-Domain Multi-Task Learning with MultiModel
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
Feb 15, 2018 (modified: Feb 15, 2018), ICLR 2018 Conference Blind Submission
Abstract: Deep learning yields great results across many fields,
from speech recognition and image classification to translation.
But for each problem, getting a deep model to work well involves
research into the architecture and a long period of tuning.
We present a single model that yields good results on a number
of problems spanning multiple domains. In particular, this single model
is trained concurrently on ImageNet, multiple translation tasks,
image captioning (COCO dataset), a speech recognition corpus,
and an English parsing task.
Our model architecture incorporates building blocks from multiple
domains. It contains convolutional layers, an attention mechanism,
and sparsely-gated layers.
Each of these computational blocks is crucial for a subset of
the tasks we train on. Interestingly, even if a block is not
crucial for a task, we observe that adding it never hurts performance
and in most cases improves it on all tasks.
We also show that tasks with less data benefit greatly from joint
training with other tasks, while performance on large tasks degrades
only slightly, if at all.
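The abstract names three computational blocks: convolutional layers, an attention mechanism, and sparsely-gated layers. Below is a minimal PyTorch-style sketch of how such a mixed block could be assembled. It is an illustration only, not the authors' implementation; the `SparselyGatedMoE` class, the top-k gating, and all layer sizes are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparselyGatedMoE(nn.Module):
    """Toy sparsely-gated layer: a gate scores a small set of expert
    feed-forward networks and only the top-k experts contribute per position.
    (A real implementation dispatches only the routed tokens to each expert.)"""
    def __init__(self, d_model, n_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                          # x: (batch, length, d_model)
        top_scores, top_idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)    # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (top_idx[..., slot] == e).unsqueeze(-1).float()
                out = out + mask * w * expert(x)
        return out

class MixedBlock(nn.Module):
    """One body block combining the three ingredients named in the abstract:
    a convolution, self-attention, and a sparsely-gated expert layer."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = SparselyGatedMoE(d_model)
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x):                          # x: (batch, length, d_model)
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norms[0](x + F.relu(conv_out))
        attn_out, _ = self.attn(x, x, x)
        x = self.norms[1](x + attn_out)
        return self.norms[2](x + self.moe(x))
```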
TL;DR: Large-scale multi-task architecture solves ImageNet and translation together and shows transfer learning.
Keywords: multi-task learning, transfer learning
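The abstract also describes training one model concurrently on ImageNet, translation, captioning, speech, and parsing, and reports that tasks with less data benefit from the joint training. The toy loop below sketches one way such task mixing could look: a shared body with a per-task output head, and a task sampled at each training step. The task names, dimensions, and random stand-in batches are placeholders, not the paper's actual configuration.

```python
import random
import torch
import torch.nn as nn

# Illustrative task set: name -> number of output classes. These sizes and the
# random stand-in batches are placeholders, not the datasets used in the paper.
TASKS = {"imagenet": 1000, "wmt_en_de": 32000, "parsing": 128}
D = 64          # width of the shared body

class SharedMultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared body (in the paper this is the deep convolutional /
        # attentional / sparsely-gated stack; here it is just a small MLP).
        self.body = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))
        # One lightweight output head per task.
        self.heads = nn.ModuleDict({t: nn.Linear(D, n) for t, n in TASKS.items()})

    def forward(self, x, task):
        return self.heads[task](self.body(x))

model = SharedMultiTaskModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    task = random.choice(list(TASKS))       # mix tasks within a single run
    x = torch.randn(8, D)                   # stand-in batch for this task
    y = torch.randint(TASKS[task], (8,))    # stand-in labels
    loss = loss_fn(model(x, task), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```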