First-order ANIL provably learns representations despite overparametrisation

Published: 07 Nov 2023, Last Modified: 13 Dec 2023 · M3L 2023 Poster
Keywords: transfer learning, meta-learning, lifelong learning
TL;DR: In the limit of an infinite number of tasks, first-order ANIL with a linear two-layer network architecture provably learns linear shared representations despite having a width larger than the dimension of the shared representations.
Abstract: Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods learn shared representations during pretraining, there is limited theoretical evidence of such behaviour. In this direction, this work shows that, in the limit of an infinite number of tasks, first-order ANIL with a linear two-layer network successfully learns linear shared representations. This result holds even under _overparametrisation_: having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution.
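
For intuition, here is a minimal sketch of first-order ANIL in the setting the abstract describes, a two-layer linear model f(x) = wᵀBx where only the head w is adapted in the inner loop. This is not the paper's implementation: the squared loss, the `tasks` callable that samples support/query data, and all function and parameter names are illustrative assumptions.

```python
import numpy as np

def loss_grads(B, w, X, y):
    """Squared-loss gradients for the linear two-layer model f(x) = w^T B x."""
    resid = X @ B.T @ w - y                      # (n,) residuals
    grad_w = B @ X.T @ resid / len(y)            # gradient w.r.t. the head, shape (k,)
    grad_B = np.outer(w, X.T @ resid) / len(y)   # gradient w.r.t. the representation, shape (k, d)
    return grad_B, grad_w

def first_order_anil(tasks, d, k, inner_lr=0.01, outer_lr=0.01,
                     inner_steps=1, outer_steps=1000, seed=0):
    """Sketch of first-order ANIL: the inner loop adapts only the head w on the
    support set; the outer loop updates the representation B (and the head
    initialisation) with gradients evaluated at the adapted head, ignoring
    second-order terms. The width k may exceed the true representation dimension."""
    rng = np.random.default_rng(seed)
    B = rng.normal(scale=1.0 / np.sqrt(d), size=(k, d))  # possibly overparametrised
    w0 = np.zeros(k)                                     # meta-initialisation of the head
    for _ in range(outer_steps):
        X_s, y_s, X_q, y_q = tasks(rng)  # assumed sampler: support and query sets
        # Inner loop: adapt only the head on the support set.
        w = w0.copy()
        for _ in range(inner_steps):
            _, g_w = loss_grads(B, w, X_s, y_s)
            w = w - inner_lr * g_w
        # Outer loop: first-order meta-gradient on the query set at the adapted head.
        g_B, g_w = loss_grads(B, w, X_q, y_q)
        B -= outer_lr * g_B
        w0 -= outer_lr * g_w
    return B, w0
```

In this sketch, `tasks` could, for instance, draw a task-specific head in a fixed low-dimensional subspace and return noisy linear-regression data; the claim of the paper is that, in the many-task limit, the learned B aligns with that shared subspace and is asymptotically low-rank.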
Submission Number: 17