Enforcing zero-Hessian in meta-learning

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: meta-learning, gradient-based meta-learning, GBML, kernel gradient descent, metric-based learning, optimization-based meta-learning
Abstract: Gradient-Based Meta-Learning (GBML) lets us obtain task-specific parameters from only a few labeled data points in an inner loop. However, it has not yet been explained how GBML can adapt to a new task within a few optimization steps at a very large inner-loop learning rate. We find that the gradient does not change from the beginning to the end of the inner loop, meaning that the model behaves like a linear model during adaptation. In this paper, we argue that this characteristic is an essential key to understanding the convergence of inner loops with very large learning rates. We also show that gradient-based meta-learning can be interpreted as metric-based meta-learning under our hypothesis that inner-loop linearity is what makes GBML work. To empirically validate and exploit this hypothesis, we propose a regularization-based algorithm, enforcing Linearity in the Inner Loop (LIL), which can be applied to any baseline that has the form of GBML. LIL demonstrates its potential by boosting performance not only on top of standard baselines across various architectures, but also on adverse or Hessian-free baselines. Qualitative experiments are also conducted to explain the performance of LIL.
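
The abstract does not spell out the form of LIL's regularizer, so below is a minimal sketch of one plausible reading in PyTorch (>= 2.0, for torch.func.functional_call): penalize how much the task gradient changes across a MAML-style inner loop, a quantity that vanishes exactly when the loss is linear in the parameters (zero Hessian) along the adaptation trajectory. The function name adapt_with_lil, the penalty weight, and the hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a MAML-style inner loop with a
# penalty on the change of the task gradient, a proxy for a zero Hessian.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0


def adapt_with_lil(net, x_s, y_s, inner_lr=0.5, inner_steps=5):
    """Run gradient-descent adaptation on (x_s, y_s) and return the adapted
    parameters plus a linearity penalty: the squared distance between the
    task gradient at the start and at the end of the inner loop."""
    # Fast weights; clones stay in the autograd graph (second-order MAML).
    params = {k: v.clone() for k, v in net.named_parameters()}

    def task_grads(p):
        loss = F.cross_entropy(functional_call(net, p, (x_s,)), y_s)
        return torch.autograd.grad(loss, list(p.values()), create_graph=True)

    grads = task_grads(params)
    grad_start = grads
    for _ in range(inner_steps):
        params = {k: w - inner_lr * g
                  for (k, w), g in zip(params.items(), grads)}
        grads = task_grads(params)
    grad_end = grads

    # Zero iff the gradient did not change over the inner loop.
    penalty = sum(((g1 - g0) ** 2).sum()
                  for g0, g1 in zip(grad_start, grad_end))
    return params, penalty


# Usage sketch: meta-objective = query loss after adaptation + penalty.
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 5))
x_s, y_s = torch.randn(10, 4), torch.randint(0, 5, (10,))
x_q, y_q = torch.randn(10, 4), torch.randint(0, 5, (10,))
adapted, pen = adapt_with_lil(net, x_s, y_s)
meta_loss = F.cross_entropy(functional_call(net, adapted, (x_q,)), y_q)
(meta_loss + 0.1 * pen).backward()  # gradients reach net.parameters()
```

Because the clones of the parameters remain in the autograd graph, the penalty and the query loss both backpropagate to the meta-parameters, matching second-order GBML training; a first-order variant would detach the inner-loop gradients.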
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
TL;DR: This paper argues that linearity in the inner loop is the key to gradient-based meta-learning, and accordingly proposes an algorithm that exploits this prior.
Supplementary Material: zip