Keywords: Data Augmentation, Distribution Shift, Multi-Task Learning
Abstract: Multi-Task Learning describes training on multiple tasks simultaneously to leverage the shared information between tasks. Tasks are typically defined as alternative ways to label data. Given an image of a face, a model could classify either the presence of sunglasses or the presence of facial hair. This example highlights how labeling the same input image can be posed as two separate binary classification problems. We present Multi-Task Distribution Learning, highlighting the similarities between Multi-Task Learning and preparing for Distribution Shift. Even with rapid advances in large-scale models, a Multi-Task Learner trained on object detection will outperform zero-shot inference on object detection. Similarly, we show how training on a data distribution improves performance on that distribution. We begin our experiments with pairs of distribution tasks. We then show that this scales to optimizing 10 distribution tasks simultaneously. We further perform a task grouping analysis to see which augmentations train well together and which do not. Multi-Task Distribution Learning highlights the similarities between Distribution Shift and Zero-Shot task inference. These experiments will continue to improve with advances in generative modeling that enable simulating more interesting distribution shifts beyond standard augmentations. In addition, we discuss how the WILDS benchmark of Domain Generalization and Subpopulation Shift will aid future work. Utilizing the prior knowledge of data augmentation and understanding multi-task interference is a promising direction toward understanding the phenomenon of Distribution Shift. To facilitate reproduction, we will open-source code, leaderboards, and experimental data upon publication.
One-sentence Summary: Multi-Task Learning of different Data Distributions, simulated with Data Augmentation.
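To make the setup concrete, below is a minimal sketch of how augmentation-defined "distribution tasks" could be trained jointly. This is an illustrative assumption of the approach described in the abstract, not the authors' released code: the `MultiTaskDistributionLearner` class, the specific augmentation list, the shared-encoder/per-task-head split, and the equal loss weighting are all hypothetical choices for demonstration.

```python
# Sketch: Multi-Task Distribution Learning, where each data augmentation
# simulates one data distribution and is treated as a separate task.
# Assumes PyTorch/torchvision; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torchvision.transforms as T

# Each augmentation simulates one data distribution (one "distribution task").
distribution_tasks = {
    "identity": T.Compose([]),                                   # original distribution
    "blur": T.GaussianBlur(kernel_size=5),
    "grayscale": T.Grayscale(num_output_channels=3),
    "jitter": T.ColorJitter(brightness=0.5, contrast=0.5),
}

class MultiTaskDistributionLearner(nn.Module):
    def __init__(self, num_classes, task_names):
        super().__init__()
        # Shared feature extractor used by every distribution task.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One lightweight classification head per distribution task.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(64, num_classes) for name in task_names}
        )

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

def multitask_step(model, images, labels, criterion):
    # Average the loss over every augmentation-defined view of the same batch,
    # so all distribution tasks are optimized simultaneously.
    total = 0.0
    for name, aug in distribution_tasks.items():
        total = total + criterion(model(aug(images), name), labels)
    return total / len(distribution_tasks)

model = MultiTaskDistributionLearner(num_classes=2, task_names=distribution_tasks.keys())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 64, 64)        # stand-in image batch
labels = torch.randint(0, 2, (8,))       # e.g., sunglasses vs. no sunglasses
loss = multitask_step(model, images, labels, criterion)
loss.backward()
optimizer.step()
```

In this sketch, scaling from a pair of distribution tasks to 10 amounts to extending `distribution_tasks`, and a task grouping analysis corresponds to comparing performance when subsets of those augmentations are trained together versus separately.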