Shift and Scale is Detrimental To Few-Shot Transfer

Moslem Yazdanpanah; Aamer Abdul Rahman; Christian Desrosiers; Mohammad Havaei; Eugene Belilovsky; Samira Ebrahimi Kahou

Published: 02 Dec 2021, Last Modified: 05 May 2023

NeurIPS 2021 Workshop DistShift Poster

Readers: Everyone

Keywords: Few Shot Learning, Normalization, Cross Domain, Domain Shift, Batch Normalization.

TL;DR: We demonstrate that removing the affine parameters of batch normalization yields large gains when transferring to distant few-shot learning tasks.

Abstract: Batch normalization is a common component in computer vision models, including ones typically used for few-shot learning. Batch normalization applied in convolutional networks consists of a normalization step, followed by the application of per-channel trainable affine parameters which shift and scale the normalized features. The use of these affine parameters can speed up model convergence on a source task. However, we demonstrate in this work that, on common few-shot learning benchmarks, training a model on a source task using these affine parameters is detrimental to downstream transfer performance. We study this effect for several methods on well-known benchmarks such as the cross-domain few-shot learning (CD-FSL) benchmark and few-shot image classification on miniImageNet. We find consistent performance gains, particularly in settings with more distant transfer tasks. Improvements from applying this low-cost and easy-to-implement modification are shown to rival gains obtained by more sophisticated and costly methods.
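To make the two steps described in the abstract concrete, the following is a minimal sketch of batch normalization with and without the per-channel affine parameters. It is an illustration only, not the authors' implementation: the function name `batch_norm`, the list-of-lists data layout, and the example values of `gamma` and `beta` are all assumptions chosen for readability, and spatial dimensions are omitted.

```python
import math


def batch_norm(batch, gamma=None, beta=None, eps=1e-5):
    """Normalize each channel over the batch, then optionally shift and scale.

    batch: list of samples, each a list of per-channel floats (hypothetical layout).
    gamma, beta: per-channel scale and shift; None means no affine parameters.
    """
    n = len(batch)
    channels = len(batch[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        values = [sample[c] for sample in batch]
        mean = sum(values) / n
        # Biased variance over the batch dimension.
        var = sum((v - mean) ** 2 for v in values) / n
        scale = 1.0 if gamma is None else gamma[c]
        shift = 0.0 if beta is None else beta[c]
        for i, v in enumerate(values):
            x_hat = (v - mean) / math.sqrt(var + eps)  # normalization step
            out[i][c] = scale * x_hat + shift          # affine step (identity if omitted)
    return out


# Two samples, two channels (illustrative values only).
batch = [[1.0, 10.0], [3.0, 30.0]]
plain = batch_norm(batch)                                        # no shift and scale
affine = batch_norm(batch, gamma=[2.0, 0.5], beta=[1.0, -1.0])  # with shift and scale
```

Without `gamma` and `beta`, every channel of the output has zero mean and approximately unit variance over the batch. In PyTorch, the analogous switch is the `affine=False` argument of `torch.nn.BatchNorm2d`; whether the paper's experiments were configured exactly this way is not stated in the abstract.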
