Learning the essential in less than 2k additional weights - a simple approach to improve image classification stability under corruptions

TMLR Paper 2285 Authors

23 Feb 2024 (modified: 24 Apr 2024) · Under review for TMLR
Abstract: The performance of image classification on well-known benchmarks such as ImageNet is remarkable, but in safety-critical situations the accuracy often drops significantly under adverse conditions. To counteract these performance drops, we propose a very simple modification to the models: we prepend a single, dimension-preserving convolutional layer with a large linear kernel whose purpose is to extract the information that is essential for image classification. We show that this simple modification can significantly increase robustness against common corruptions, especially corruptions of high severity. We demonstrate the impact of our channel-specific layers on ImageNet-100 and ImageNette classification tasks and show an increase of up to 30\% in top-1 accuracy on corrupted data. Further, we conduct a set of designed experiments to characterize the conditions under which our findings hold. Our main result is that a data- and network-dependent linear subspace carries the most important classification information (the essential), which our proposed pre-processing layer approximately identifies for most corruptions, and at very low cost.
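For illustration, a minimal sketch of such a pre-pended layer is given below. It assumes PyTorch and interprets "channel-specific" as a depthwise (grouped) convolution; the 25x25 kernel size, the absence of a bias, and the ResNet-18 backbone are illustrative assumptions, not details taken from the submission. With these choices the added layer has 3 * 25 * 25 = 1875 weights, i.e. fewer than 2k additional parameters.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class LargeKernelPrefix(nn.Module):
    """Prepends a single, dimension-preserving, channel-specific convolution to a backbone."""

    def __init__(self, backbone: nn.Module, channels: int = 3, kernel_size: int = 25):
        super().__init__()
        # Depthwise conv (groups == channels): each input channel gets its own large kernel.
        # 3 * 25 * 25 = 1875 weights -> fewer than 2k additional parameters (assumed setup).
        self.prefix = nn.Conv2d(
            channels, channels,
            kernel_size=kernel_size,
            padding=kernel_size // 2,  # keeps spatial dimensions unchanged (stride 1, odd kernel)
            groups=channels,
            bias=False,
        )
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.prefix(x))


# Usage example: wrap a backbone trained from scratch, e.g. for ImageNet-100.
model = LargeKernelPrefix(resnet18(num_classes=100))
out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 100])
```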
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
### List of Changes
* We extended the related work section (paragraph "Image Corruptions and Data Augmentations") to discuss prior work on data augmentation and model distillation more extensively [Reviewer 5xbk]
* We added XSEResNext50 to Table 2 [Reviewer ig1t]
* We specified that models are trained from scratch in the paper (e.g. page 6, beginning of the "Experimental Evaluation" section) [Reviewers ig1t, 5xbk, 8XnL]
* We updated Figure 4 to include more models (e.g. Swin v2) on ImageNette [Reviewer 5xbk]
* We updated Table 3 with results for Swin v2 tiny and ViT on ImageNet-100 [Reviewer 5xbk]
* We moved the results on ImageNet-1k and the corresponding discussion from the appendix to the main paper (Table 4) and added results for Swin v2 (base) to the table [Reviewer 5xbk]
* We added a section entitled "Comparison with augmentation and jointly trainable large kernel and augmentation" to the main paper (Section 4.3), providing results for additional data augmentation using AugMix and the respective discussion [Reviewers 5xbk, 8XnL]
* We extended the discussion of robustness methods that remove signal content in the "Discussion" section [Reviewer ig1t]
* Based on the AugMix results, we state in the conclusion that our approach is compatible with additional data augmentation.
* We updated the outline of the appendix provided as bullet points in Appendix A.
* We added the detailed specification of all considered models in Appendix C, Table 7 [Reviewer ig1t]
* We added results on 3DCC (Kar et al., 2022) to Appendix D [Reviewers 8XnL, 5xbk, ig1t]
* We specified the meaning of the five bars per corruption type in Figures 25-30 and Figures 35-50 [Reviewer 5xbk]
* We added results for models under adversarial attacks (AutoAttack and SQUARE) to the appendix (D7) [Reviewer 8XnL]
* We added an ablation using the pre-pended layer on models with frozen weights in Appendix D8 [Reviewers 8XnL, 5xbk]

All our changes are also indicated in blue in the uploaded revision.
Assigned Action Editor: ~Vincent_Dumoulin1
Submission Number: 2285