Keywords: Meta-learning, Few-Shot Learning, Rademacher Complexity, Statistical Learning Theory
TL;DR: We provide a statistical analysis of fine-tuning-based meta-learning, and establish a sample complexity gap from a standard baseline.
Abstract: Representation learning has served as a key tool for meta-learning, enabling rapid learning of new tasks. Recent works like MAML learn task-specific representations by finding an initial representation requiring minimal per-task adaptation (i.e. a fine-tuning-based objective). We present a theoretical framework for analyzing a MAML-like algorithm, assuming all available tasks require approximately the same representation. We then provide risk bounds on predictors found by fine-tuning via gradient descent, demonstrating that the method provably leverages the shared structure. We illustrate these bounds in the logistic regression and neural network settings. In contrast, we establish settings where learning one representation for all tasks (i.e. using a "frozen representation" objective) fails. Notably, any such algorithm cannot outperform directly learning the target task with no other information, in the worst case. This separation underscores the benefit of fine-tuning-based over “frozen representation” objectives in few-shot learning.
Supplementary Material: zip
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.