Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning

CVPR 2022 (modified: 16 Nov 2022)
Abstract: Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch normalization layers in convolutional neural networks are composed of a normalization step, followed by a shift and scale of the normalized features applied via the per-channel trainable affine parameters $\gamma$ and $\beta$. These affine parameters were introduced to maintain the expressive power of the model following normalization. While this hypothesis holds true for classification within the same domain, this work illustrates that these parameters are detrimental to downstream performance on common few-shot transfer tasks. This effect is studied with multiple methods on well-known benchmarks such as few-shot classification on miniImageNet, cross-domain few-shot learning (CD-FSL), and META-DATASET. Experiments reveal consistent performance improvements for CNNs whose batch normalization layers omit the affine parameters, particularly in few-shot transfer settings with large domain shift. As opposed to the common practice in few-shot transfer learning of keeping the affine parameters fixed during the adaptation phase, we show that fine-tuning them can lead to improved performance.
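As a concrete illustration of the two settings discussed above, the sketch below contrasts a standard batch normalization layer with an affine-free variant and shows one way to fine-tune only the affine parameters during adaptation. It assumes PyTorch's torch.nn.BatchNorm2d, where affine=False drops the learnable $\gamma$ (weight) and $\beta$ (bias); the channel counts and the small model are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

# Standard batch norm: normalize per channel, then scale and shift with
# learnable gamma (weight) and beta (bias).
bn_affine = nn.BatchNorm2d(64, affine=True)

# Affine-free variant: normalization step only, no gamma/beta parameters.
bn_no_affine = nn.BatchNorm2d(64, affine=False)

x = torch.randn(8, 64, 32, 32)      # (batch, channels, height, width)
print(bn_affine.weight.shape)       # torch.Size([64]) -> per-channel gamma
print(bn_no_affine.weight)          # None: no affine parameters exist
y = bn_no_affine(x)                 # per-channel zero mean, unit variance

# During adaptation, fine-tuning only the BN affine parameters (hypothetical
# model; all other parameters are frozen):
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
for module in model.modules():
    for p in module.parameters(recurse=False):
        p.requires_grad = isinstance(module, nn.BatchNorm2d)
```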