Improving Robustness in Vision Transformers with Nullspace Noise Augmented Finetuning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Robustness; Computer Vision; Transformer
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a fine-tuning approach that improves the robustness of vision transformer models across various benchmarks, based on connections we identify between a model's robustness and its nullspace.
Abstract: Enhancing the robustness of deep learning models, particularly in the realm of vision transformers (ViTs), is crucial for their real-world deployment. In this work, we explore the robustness of vision transformer models through the lens of nullspace, a fundamental concept in linear algebra, to propose a fine-tuning method that improves model robustness under various input perturbations. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property in linear mappings, implying that perturbations sampled from this nullspace do not influence the model's output when added to the input. We confirm this by demonstrating the existence of a non-trivial nullspace in vision transformers, primarily attributed to the patch embedding layer. Moreover, we extend this idea beyond the linear layers, showcasing the feasibility of learning a non-linear counterpart (approximate nullspace) to the traditional nullspace for vision transformers through optimization techniques. Based on these insights, we propose a fine-tuning approach employing approximate nullspace noise to bolster the robustness of ViT models. Remarkably, within just a single epoch of fine-tuning, our method effectively mitigates the adverse effects of distribution shifts and adversarial perturbations across a wide spectrum of scenarios.
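The abstract's central observation, that a vision transformer's patch embedding layer is a linear map whose nullspace yields input perturbations with no effect on the output, can be illustrated with a minimal NumPy sketch. This is not the paper's method (which learns an approximate, non-linear nullspace via optimization); it only demonstrates the exact linear-nullspace property for a hypothetical patch embedding matrix `W`, with dimensions chosen to mimic a ViT mapping flattened 16x16x3 patches (768 values) to a 384-dim embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "patch embedding": a linear map from flattened 16x16x3 patches
# (768 dims) to a 384-dim embedding. Since out_dim < in_dim, the map
# necessarily has a non-trivial nullspace of dimension >= 768 - 384.
in_dim, out_dim = 768, 384
W = rng.standard_normal((out_dim, in_dim))

# Nullspace basis via SVD: the rows of Vt beyond the rank of W span
# the nullspace of W (right singular vectors with zero singular value).
_, _, Vt = np.linalg.svd(W)
null_basis = Vt[out_dim:]  # shape: (in_dim - out_dim, in_dim)

# Sample a random perturbation from the nullspace and add it to a patch.
x = rng.standard_normal(in_dim)
noise = null_basis.T @ rng.standard_normal(null_basis.shape[0])

# The embedding is numerically unchanged by the nullspace perturbation,
# even though the input itself changed substantially.
assert np.allclose(W @ x, W @ (x + noise))
print("max output change:", np.abs(W @ x - W @ (x + noise)).max())
```

The paper's contribution builds on this: because the non-linear layers after the patch embedding have no exact nullspace, an *approximate* nullspace is learned by optimization, and noise drawn from it is used as an augmentation during fine-tuning.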
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7747