Are Vision Transformers Always More Robust Than Convolutional Neural Networks?

09 Oct 2021, 14:49 (modified: 01 Dec 2021, 23:38) — NeurIPS 2021 Workshop DistShift Poster
Keywords: visual transformers, big transfer, transfer learning, data-shift, out-of-distribution detection, calibration
TL;DR: Evidence that convolutional inductive biases suffice for robustness to data-shift and for good OOD detection performance. We compare the effects of fine-tuning on uncertainty properties and robustness.
Abstract: Since Transformer architectures were popularised in Computer Vision, several papers have analysed their properties in terms of calibration, out-of-distribution detection and data-shift robustness. Most of these papers conclude that Transformers outperform Convolutional Neural Networks (CNNs) due to some intrinsic properties, presumably the lack of restrictive inductive biases and the computationally intensive self-attention mechanism. In this paper, we question this conclusion: in some relevant cases, CNNs pre-trained and fine-tuned with a procedure similar to the one used for Transformers exhibit competitive robustness. Our evidence suggests that, to fully understand this behaviour, researchers should focus on the interaction between pre-training, fine-tuning and the considered architectures rather than on intrinsic properties of Transformers. To this end, we present some preliminary analyses that shed light on the impact of pre-training and fine-tuning on out-of-distribution detection and data-shift robustness.