Non-Parametric Domain Adaptation Layer

TMLR Paper14 Authors

30 Mar 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: Normalization methods spurred the development of increasingly deep and efficient architectures, as they reduce distribution shift during optimization and thereby allow efficient training. However, most normalization methods cannot account for test-time distribution changes, increasing the vulnerability of the network to noise and input corruptions. As noise is ubiquitous and diverse in many applications, machine learning systems often fail drastically because they cannot cope with mismatches between training- and test-time activation distributions. The most common normalization method, batch normalization, is agnostic to changes in the input distribution at test time, which makes it prone to performance degradation whenever noise is present. Parametric correction schemes can only adjust for linear transformations of the activation distribution, not for changes in the distribution's shape; this leaves the network vulnerable to distribution changes that cannot be captured by the normalization parameters. We propose an unsupervised, non-parametric distribution correction layer that adapts the activation distribution and reduces the mismatch between the training- and test-time distributions by minimizing the Wasserstein distance at each layer. We empirically show that the proposed method effectively improves classification performance without retraining or fine-tuning the model; on ImageNet-C it achieves up to 11% improvement in Top-1 accuracy.
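To make the idea concrete: in one dimension, the Wasserstein-minimizing map between two distributions is the monotone quantile-matching (optimal transport) map. The sketch below illustrates per-channel quantile matching of test-time activations to stored training-time quantiles, under the assumption that this is roughly what such a correction layer does; the class name `QuantileAlign` and all details are hypothetical and not taken from the paper.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# per-channel non-parametric alignment of test activations to training
# activations via 1D quantile matching, the closed-form minimizer of
# the 1D Wasserstein distance.
import numpy as np

class QuantileAlign:
    """Stores per-channel activation quantiles from training and maps
    test-time activations onto them with a monotone transport map."""

    def __init__(self, n_quantiles: int = 100):
        self.qs = np.linspace(0.0, 1.0, n_quantiles)
        self.train_quantiles = None  # shape: (channels, n_quantiles)

    def fit(self, train_acts: np.ndarray) -> None:
        # train_acts: (samples, channels) activations collected on training data
        self.train_quantiles = np.quantile(train_acts, self.qs, axis=0).T

    def transform(self, test_acts: np.ndarray) -> np.ndarray:
        # Push each channel's test values through its empirical CDF,
        # then through the inverse CDF of the training distribution.
        out = np.empty_like(test_acts)
        for c in range(test_acts.shape[1]):
            test_q = np.quantile(test_acts[:, c], self.qs)
            out[:, c] = np.interp(test_acts[:, c], test_q, self.train_quantiles[c])
        return out
```

Because the map is non-parametric, it can correct changes in the shape of the activation distribution, not only the affine shifts that batch normalization statistics (or parametric correction schemes) can absorb.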
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Dumoulin1
Submission Number: 14