- Keywords: meta-learning, memorization, regularization, overfitting, mutually-exclusive
- TL;DR: We identify and formalize the memorization problem in meta-learning and solve this problem with novel meta-regularization method, which greatly expand the domain that meta-learning can be applicable to and effective on.
- Abstract: The ability to learn new concepts with small amounts of data is a critical aspect of intelligence that has proven challenging for deep learning methods. Meta-learning has emerged as a promising technique for leveraging data from previous tasks to enable efficient learning of new tasks. However, most meta-learning algorithms implicitly require that the meta-training tasks be "mutually-exclusive", such that no single model can solve all of the tasks at once. For example when creating tasks for a meta-learned N-way image classifier, we typically randomize the assignment of the image classes to N-way classification labels for each task. If this is not done, the meta-learner can ignore the task training data and learn a single model that performs all of the meta-training tasks zero-shot, but does not adapt effectively to new image classes. This requirement means that the user must take great care in designing the tasks, for example by shuffling labels or removing task identifying information from the inputs. In some domains, this makes meta-learning entirely inapplicable. In this paper, we address this challenge by designing an information-theoretic meta-regularization objective that places precedence on data-driven adaptation. This causes the meta-learner to decide what should be learned from data and what must be inferred from the input. By doing so, our algorithm can successfully use data from "non-mutually-exclusive" tasks to efficiently adapt to novel tasks. We demonstrate its applicability to both contextual and gradient-based meta-learning algorithms, and apply it in practical settings where standard meta-learning has been difficult to apply. Our approach substantially outperforms standard meta-learning algorithms in these settings.