Keywords: meta-learning, few-shot classification, batch normalization
TL;DR: Counteracting batch normalization's implicit learning rate decay increases inner-loop adaptation in meta-learning models.
Abstract: Meta-learning for few-shot classification has been challenged both on its effectiveness compared to simpler pretraining methods and on the validity of its claim to "learn to learn". Recent work suggests that MAML-based models do not perform "rapid learning" in the inner loop but instead reuse features, adapting only the final linear layer. Separately, BatchNorm, a near-ubiquitous component of model architectures, has been shown to exert an implicit learning rate decay on the layers that precede it. We study the impact of BatchNorm's implicit learning rate decay on feature reuse in meta-learning methods and find that counteracting it increases the change in intermediate layers during adaptation. We also find that counteracting this learning rate decay sometimes improves performance on few-shot classification tasks.
Contribution Process Agreement: Yes
Author Revision Details:
* We considered and made references to related papers suggested by Reviewer WLRm.
* Clarified why we believe rapid learning is preferable to feature reuse in meta-learning models.
* Re-ran hyperparameter tuning, allowing us to draw stronger conclusions in our discussion of results.
Process Comment: --
Poster Session Selection: Poster session #3 (16:50 UTC)