Generalization to translation shifts in object detection: a study in architectures and augmentations
Keywords: OOD generalization, beyond accuracy, empirical study
TL;DR: Data augmentations and architecture are complementary ways of incorporating inductive bias about desired robustness/invariances
Abstract: We provide a detailed evaluation of data augmentations and model architectures (convolutional, vision transformer, and fully connected MLP networks) on generalization to large translation shifts in image data. We make the following observations: (a) In the absence of data augmentation, all architectures, including convolutional networks, suffer degradation in performance when evaluated on spatially translated test datasets. Understandably, both the in-distribution accuracy and the degradation under shifts are significantly worse for non-convolutional architectures. (b) Across all architectures, even a minimal random crop augmentation (e.g., at most $4$ pixels in CIFAR and TINYIMAGENET datasets) improves the robustness of model performance to much larger shifts of up to $1/4$ of the image size ($8$-$16$ pixels) in the test data, suggesting a form of meta generalization from augmentation. For non-convolutional architectures, while the absolute accuracy is still low, we see dramatic improvements in relative robustness to large translation shifts. We further observe that the robustness gains are maintained with an even more minimal $1$-$2$ pixel random crop augmentation. (c) With a sufficiently advanced augmentation pipeline (RandomCrop+RandFlip+RandAugmentation+Erasing+MixUp), all architectures can be trained to have competitive performance, in terms of both absolute in-distribution accuracy and relative generalization to large translation shifts.
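To make the setup in (b) concrete, the following is a minimal NumPy sketch of a small random crop augmentation at training time and a larger fixed translation at test time. It is an illustration only, not the submission's code: the function names, the zero fill for vacated pixels, the height-width-channel image layout, and the `predict` callable are all assumptions, since the abstract does not specify these details.

```python
import numpy as np


def random_crop(image, pad, rng):
    """Zero-pad an HxWxC image by `pad` pixels per side, then crop back to
    HxW at a random offset. Equivalent to a random shift of at most `pad`
    pixels in each direction (zero fill is an assumption)."""
    h, w = image.shape[:2]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]


def translate(image, dy, dx):
    """Shift image content down by `dy` and right by `dx` pixels (negative
    values shift up/left), filling vacated pixels with zeros."""
    h, w = image.shape[:2]
    assert abs(dy) < h and abs(dx) < w, "shift must be smaller than the image"
    out = np.zeros_like(image)
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        image[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out


def accuracy_under_shift(predict, images, labels, dy, dx):
    """Accuracy of a hypothetical `predict(batch) -> class ids` callable on a
    test set in which every image is translated by the same (dy, dx)."""
    shifted = np.stack([translate(img, dy, dx) for img in images])
    return float(np.mean(predict(shifted) == labels))
```

Under these assumptions, training would call `random_crop(img, pad=4, rng=rng)` on each $32 \times 32$ image, while evaluation would sweep `accuracy_under_shift` over offsets such as `dy, dx` in $\{-8, 0, 8\}$, which is the regime where the abstract reports robustness beyond the augmentation range.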
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning