The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical EquivalenceDownload PDF

Published: 21 Oct 2022, Last Modified: 05 May 2023NeurIPS 2022 Workshop MetaLearn PosterReaders: Everyone
Keywords: meta-learning, diversity, transfer learning
TL;DR: when the task diversity of few-shot learning benchmarks is low and comparison is fair, MAML and transfer learning perform the same -- opposite of claims that transfer learning is better
Abstract: Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by 1. proposing a novel metric -- the {\it diversity coefficient} -- to measure the diversity of tasks in a few-shot learning benchmark and 2. by comparing Model-Agnostic Meta-Learning (MAML) and transfer learning under fair conditions (same architecture, same optimizer, and all models trained to convergence). Using the diversity coefficient, we show that the popular MiniImageNet and CIFAR-FS few-shot learning benchmarks have low diversity. This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison. Specifically, we empirically find that a low diversity coefficient correlates with a high similarity between transfer learning and MAML learned solutions in terms of accuracy at meta-test time and classification layer similarity (using feature based distance metrics like SVCCA, PWCCA, CKA, and OPD). To further support our claim, we find this meta-test accuracy holds even as the model size changes. Therefore, we conclude that in the low diversity regime, MAML and transfer learning have equivalent meta-test performance when both are compared fairly. We also hope our work inspires more thoughtful constructions and quantitative evaluations of meta-learning benchmarks in the future.
0 Replies