Role of two learning rates in convergence of model-agnostic meta-learning

Shiro Takagi; Yoshihiro Nagano; Yuki Yoshida; Masato Okada

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

TL;DR: We analyze how the two learning rates in model-agnostic meta-learning affect its convergence.

Abstract: Model-agnostic meta-learning (MAML) is a powerful meta-learning method, but it is notoriously hard to train because it has two learning rates. In this paper, we derive, under some simplifying assumptions, the conditions that the inner learning rate $\alpha$ and the meta-learning rate $\beta$ must satisfy for MAML to converge to minima. We find that the upper bound of $\beta$ depends on $\alpha$, in contrast to ordinary gradient descent. Moreover, we show that the threshold of $\beta$ increases as $\alpha$ approaches its own upper bound. We verify this result with experiments on various few-shot tasks and architectures: sinusoid regression and classification on the Omniglot and MiniImagenet datasets, using a multilayer perceptron and a convolutional neural network. Based on this outcome, we present a guideline for choosing the learning rates: first, search for the largest possible $\alpha$; next, tune $\beta$ based on the chosen value of $\alpha$.
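The dependence of the $\beta$ threshold on $\alpha$ can be illustrated with a toy sketch. This is not the paper's derivation or experimental code. It assumes one-dimensional quadratic task losses $L_i(\theta) = \frac{1}{2} h (\theta - c_i)^2$ with a shared curvature $h$ and a single inner gradient step; the symbols $h$ and $c_i$ and the function names `maml_step`, `beta_threshold`, and `converges` are hypothetical, introduced only for this illustration. Under these assumptions the meta-update contracts toward the mean of the $c_i$ exactly when $\beta < 2 / (h (1 - \alpha h)^2)$, so the threshold grows as $\alpha$ approaches $1/h$.

```python
def maml_step(theta, centers, h, alpha, beta):
    """One exact MAML meta-update for the toy losses
    L_i(theta) = 0.5 * h * (theta - c_i)**2 (an assumption of this sketch)."""
    grad = 0.0
    for c in centers:
        # Inner loop: one gradient step on task i with learning rate alpha.
        adapted = theta - alpha * h * (theta - c)
        # Outer gradient of L_i(adapted) with respect to theta (chain rule):
        # d(adapted)/d(theta) = 1 - alpha * h.
        grad += (1.0 - alpha * h) * h * (adapted - c)
    # Outer loop: gradient step on the task-averaged loss with rate beta.
    return theta - beta * grad / len(centers)


def beta_threshold(h, alpha):
    """Largest beta (exclusive) for which the toy meta-update contracts."""
    shrink = (1.0 - alpha * h) ** 2
    if shrink == 0.0:
        # The meta-gradient vanishes, so no finite beta causes divergence.
        return float("inf")
    return 2.0 / (h * shrink)


def converges(h, alpha, beta, steps=200, theta0=5.0, centers=(-1.0, 1.0)):
    """Run the meta-update and check whether theta reaches the mean center."""
    theta = theta0
    for _ in range(steps):
        theta = maml_step(theta, centers, h, alpha, beta)
    optimum = sum(centers) / len(centers)
    return abs(theta - optimum) < 1e-6


if __name__ == "__main__":
    h, beta = 1.0, 3.0
    for alpha in (0.0, 0.5):
        print(alpha, beta_threshold(h, alpha), converges(h, alpha, beta))
```

In this toy setting, $\beta = 3$ diverges when $\alpha = 0$ (where the threshold is 2) but converges when $\alpha = 0.5$ (where the threshold is 8), which matches the qualitative claim that a larger $\alpha$ admits a larger $\beta$.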

Code: https://drive.google.com/file/d/1Seej9xI03F7_2wh4deDTBk_4aFyb2otb/view?usp=sharing

Keywords: meta-learning, convergence
