Statistically and Computationally Efficient Linear Meta-representation Learning

NeurIPS 2021 Poster
21 May 2021, 20:47 (modified: 26 Oct 2021, 14:49)
Keywords: meta-learning, alternating minimization, few-shot learning, representation learning, spectral methods, matrix factorization
TL;DR: We analyze practical meta-representation learning algorithms and prove that they can reduce the number of samples needed to learn many related tasks.
Abstract: In typical few-shot learning, no single task comes with enough data to be learned in isolation. To cope with such data scarcity, meta-representation learning methods train across many related tasks to find a shared (lower-dimensional) representation of the data on which all tasks can be solved accurately. It is hypothesized that any newly arriving task can then be learned rapidly on this low-dimensional representation using only a few samples. Despite the practical successes of this approach, its statistical and computational properties are less well understood. Moreover, the algorithms prescribed in existing theoretical studies either bear little resemblance to those used in practice or are computationally intractable. To understand and explain the success of popular meta-representation learning approaches such as ANIL, MetaOptNet, R2D2, and OML, we study alternating gradient-descent minimization (AltMinGD), the procedure underlying these methods, together with its variant alternating minimization (AltMin). For a simple but canonical setting of shared linear representations, we show that AltMinGD achieves nearly-optimal estimation error while requiring only $\Omega(\mathrm{polylog}\,d)$ samples per task, where $d$ is the data dimension. This agrees with the observed efficacy of this algorithm in practical few-shot learning scenarios.
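The abstract describes AltMinGD only at a high level. As an illustration, here is a minimal NumPy sketch of one plausible reading in the linear setting, assuming the common model $y_{ij} = x_{ij}^\top U^* v_i^* + \text{noise}$, where $U^* \in \mathbb{R}^{d \times k}$ is a shared orthonormal representation and $v_i^* \in \mathbb{R}^k$ is the head of task $i$. Each iteration solves a small least-squares problem for every task head given the current $U$, then takes one gradient step on $U$ and re-orthonormalizes it with a QR decomposition. The method-of-moments spectral initialization, the step size `eta`, the QR step, the absence of sample splitting, the synthetic data, and all function names (`make_tasks`, `spectral_init`, `solve_heads`, `alt_min_gd`, `subspace_dist`) are assumptions made for this example, not the paper's exact algorithm or code.

```python
import numpy as np


def make_tasks(n_tasks, m_per_task, d, k, noise_std, rng):
    """Synthetic data under an assumed linear model:
    y_ij = x_ij^T U_star v_i + noise, with U_star a d x k orthonormal basis."""
    U_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    V_star = rng.standard_normal((n_tasks, k))             # task-specific heads
    X = rng.standard_normal((n_tasks, m_per_task, d))      # few samples per task
    y = np.einsum("imd,dk,ik->im", X, U_star, V_star)
    y += noise_std * rng.standard_normal(y.shape)
    return X, y, U_star


def subspace_dist(U, U_star):
    """Sine of the largest principal angle: ||(I - U U^T) U_star||_2."""
    return np.linalg.norm(U_star - U @ (U.T @ U_star), 2)


def spectral_init(X, y, k):
    """Assumed method-of-moments start: top-k eigenvectors of
    (1/N) * sum_ij y_ij^2 x_ij x_ij^T, pooled over all tasks."""
    Xf = X.reshape(-1, X.shape[-1])
    yf = y.reshape(-1)
    M = (Xf * yf[:, None] ** 2).T @ Xf / len(yf)
    _, eigvecs = np.linalg.eigh(M)                         # eigenvalues in ascending order
    return eigvecs[:, -k:]


def solve_heads(X, y, U):
    """Per-task least squares for v_i given the shared representation U."""
    A = X @ U                                               # (n, m, k) projected features
    AtA = np.einsum("imk,iml->ikl", A, A)                   # batched normal matrices
    Aty = np.einsum("imk,im->ik", A, y)
    return np.linalg.solve(AtA, Aty[..., None])[..., 0]     # (n, k)


def alt_min_gd(X, y, k, steps=100, eta=0.5):
    n, m, _ = X.shape
    U = spectral_init(X, y, k)
    for _ in range(steps):
        V = solve_heads(X, y, U)                            # exact minimization over heads
        resid = np.einsum("imd,dk,ik->im", X, U, V) - y     # (n, m) residuals
        # gradient of (1/(2nm)) * sum_i ||X_i U v_i - y_i||^2 with respect to U
        grad = np.einsum("imd,im,ik->dk", X, resid, V) / (n * m)
        U, _ = np.linalg.qr(U - eta * grad)                 # gradient step, then re-orthonormalize
    return U


# Usage: 1000 tasks, each with only 8 samples in 20 dimensions.
rng = np.random.default_rng(0)
X, y, U_star = make_tasks(1000, 8, 20, 2, 0.01, rng)
U0 = spectral_init(X, y, 2)
U_hat = alt_min_gd(X, y, 2)
print(f"init dist {subspace_dist(U0, U_star):.3f}, final dist {subspace_dist(U_hat, U_star):.4f}")
```

With `m = 8` samples in `d = 20` dimensions, no single task can be solved on its own, yet pooling across tasks recovers the shared subspace. Replacing the single gradient step on `U` with an exact least-squares solve over `U` would give an AltMin-style variant; the gradient step keeps each update of `U` cheap.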
Supplementary Material: pdf
Code: zip