TL;DR: We analyze the role of the two learning rates in the convergence of model-agnostic meta-learning.
Abstract: Model-agnostic meta-learning (MAML) is a powerful meta-learning method, but it is notoriously hard to train because it involves two learning rates. In this paper, we derive, under some simplifications, the conditions that the inner learning rate $\alpha$ and the meta-learning rate $\beta$ must satisfy for MAML to converge to a minimum. We find that the upper bound of $\beta$ depends on $\alpha$, in contrast to the case of ordinary gradient descent. Moreover, we show that the threshold of $\beta$ increases as $\alpha$ approaches its own upper bound. This result is verified by experiments on various few-shot tasks and architectures; specifically, we perform sinusoid regression and classification on the Omniglot and MiniImagenet datasets with a multilayer perceptron and a convolutional neural network. Based on this outcome, we present a guideline for determining the learning rates: first, search for the largest possible $\alpha$; next, tune $\beta$ based on the chosen value of $\alpha$.
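To make the role of the two learning rates concrete, below is a minimal first-order MAML sketch in NumPy showing where the inner learning rate $\alpha$ and the meta-learning rate $\beta$ enter the update. This is an illustrative toy setup (1-D sinusoid regression with a linear-in-features model, first-order meta-gradient, placeholder values for $\alpha$ and $\beta$), not the implementation or the recommended settings from the paper.

```python
# Minimal first-order MAML sketch (NumPy). Hypothetical toy setup:
# 1-D sinusoid regression with a model that is linear in its parameters.
# alpha and beta values below are placeholders, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    # fixed feature map so the model is linear in its parameters
    return np.stack([np.ones_like(x), x, x**2, np.sin(x)], axis=1)

def loss_and_grad(w, x, y):
    phi = features(x)
    err = phi @ w - y
    return 0.5 * np.mean(err**2), phi.T @ err / len(x)

def sample_task():
    # each task: y = A * sin(x + p), as in common few-shot sinusoid benchmarks
    A, p = rng.uniform(0.1, 5.0), rng.uniform(0.0, np.pi)
    def draw(k):
        x = rng.uniform(-5.0, 5.0, k)
        return x, A * np.sin(x + p)
    return draw

alpha, beta = 0.01, 0.001   # inner learning rate and meta-learning rate
w = np.zeros(4)             # meta-parameters

for step in range(1000):
    draw = sample_task()
    x_s, y_s = draw(10)                      # support set
    x_q, y_q = draw(10)                      # query set
    _, g_inner = loss_and_grad(w, x_s, y_s)
    w_task = w - alpha * g_inner             # inner-loop adaptation uses alpha
    _, g_outer = loss_and_grad(w_task, x_q, y_q)
    w = w - beta * g_outer                   # (first-order) meta-update uses beta
```

In this sketch, the paper's guideline would correspond to first searching for the largest $\alpha$ for which the inner adaptation remains stable, and then tuning $\beta$ given that choice.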
Code: https://drive.google.com/file/d/1Seej9xI03F7_2wh4deDTBk_4aFyb2otb/view?usp=sharing
Keywords: meta-learning, convergence