Keywords: meta-learning, few-shot classification, batch normalization
TL;DR: Counteracting batch normalization's implicit learning rate decay increases inner-loop adaptation in meta-learning models.
Abstract: Meta-learning for few-shot classification has been challenged both on its effectiveness compared to simpler pretraining methods and on the validity of its claim to "learn to learn". Recent work suggests that MAML-based models do not perform "rapid learning" in the inner loop but instead reuse features, adapting only the final linear layer. Separately, BatchNorm, a near-ubiquitous component of model architectures, has been shown to exert an implicit learning rate decay on the layers that precede it. We study the impact of BatchNorm's implicit learning rate decay on feature reuse in meta-learning methods and find that counteracting it increases the change in intermediate layers during adaptation. We also find that counteracting this learning rate decay sometimes improves performance on few-shot classification tasks.
Contribution Process Agreement: Yes
Author Revision Details:
* We considered and made references to related papers suggested by Reviewer WLRm.
* Clarified why we believe rapid learning is preferable to feature reuse in meta-learning models.
* Re-ran hyperparameter tuning, allowing us to draw stronger conclusions in our discussion of results.
Process Comment: --
Poster Session Selection: Poster session #3 (16:50 UTC)