- Abstract: The machine learning and computer vision community is witnessing an unprecedented rate of new tasks being proposed and addressed, thanks to the power of deep convolutional networks to find complex mappings from X to Y. The advent of each task often accompanies the release of a large-scale human-labeled dataset, for supervised training of the deep network. However, it is expensive and time-consuming to manually label sufficient amount of training data. Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled dataset to learn useful knowledge for the target task. While previous works mostly focus on transfer learning from a single source, we study multi-source transfer across domains and tasks (MS-DTT), in a semi-supervised setting. We propose GradMix, a model-agnostic method applicable to any model trained with gradient-based learning rule. GradMix transfers knowledge via gradient descent, by weighting and mixing the gradients from all sources during training. Our method follows a meta-learning objective, by assigning layer-wise weights to the source gradients, such that the combined gradient follows the direction that can minimize the loss for a small set of samples from the target dataset. In addition, we propose to adaptively adjust the learning rate for each mini-batch based on its importance to the target task, and a pseudo-labeling method to leverage the unlabeled samples in the target domain. We perform experiments on two MS-DTT tasks: digit recognition and action recognition, and demonstrate the advantageous performance of the proposed method against multiple baselines.
- Keywords: Transfer Learning, Domain Adaptation, Multi-source Learning
- TL;DR: We propose a gradient-based method to transfer knowledge from multiple sources across different domains and tasks.