LipShiFT: A Certifiably Robust Shift-based Vision Transformer

Published: 06 Mar 2025, Last Modified: 18 Mar 2025
Venue: ICLR 2025 Workshop VerifAI (Poster)
License: CC BY 4.0
Keywords: Machine Learning, Adversarial Robustness, Computer Vision
TL;DR: We improve certified robustness for vision transformers using a Lipschitz-continuous ShiftViT variant, achieving state-of-the-art results that scale to larger models.
Abstract: Deriving tight Lipschitz bounds for transformer-based architectures presents a significant challenge. The large input sizes and high-dimensional attention modules typically become critical bottlenecks during training and lead to sub-optimal results. Our research highlights the practical constraints of these methods in vision tasks. We find that Lipschitz-based margin training acts as a strong regularizer while restricting the weights in successive layers of the model. Focusing on a Lipschitz-continuous variant of the ShiftViT model, we address significant training challenges for transformer-based architectures under a norm-constrained input setting. We provide an upper-bound estimate for the Lipschitz constants of this model using the $l_2$ norm on common image classification datasets. Ultimately, we demonstrate that our method scales to larger models and advances the state of the art in certified robustness for transformer-based architectures.
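For context, a common way to obtain the kind of $l_2$ Lipschitz upper bound the abstract refers to is to multiply per-layer spectral norms: the Lipschitz constant of a composition is at most the product of the layers' constants, and activations such as ReLU are 1-Lipschitz. The sketch below illustrates this for a plain PyTorch MLP; it is a minimal illustration of the general technique, not the paper's LipShiFT pipeline, and the helper names (`spectral_norm_l2`, `lipschitz_upper_bound`) are hypothetical.

```python
# Minimal sketch (illustrative, not the paper's implementation): upper-bound
# the global l2 Lipschitz constant of a feed-forward network by the product
# of its linear layers' spectral norms. Valid because Lip(f o g) <= Lip(f) * Lip(g)
# and 1-Lipschitz activations (e.g. ReLU) do not increase the bound.
import torch
import torch.nn as nn


def spectral_norm_l2(weight: torch.Tensor, n_iter: int = 50) -> float:
    """Estimate the largest singular value of a weight matrix via power iteration."""
    w = weight.reshape(weight.shape[0], -1)
    u = torch.randn(w.shape[1])
    for _ in range(n_iter):
        v = w @ u
        v = v / (v.norm() + 1e-12)   # left singular vector estimate
        u = w.t() @ v
        u = u / (u.norm() + 1e-12)   # right singular vector estimate
    return float((w @ u).norm())


def lipschitz_upper_bound(model: nn.Module) -> float:
    """Product of per-layer spectral norms: a (generally loose) global l2 bound."""
    bound = 1.0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            bound *= spectral_norm_l2(module.weight)
    return bound


model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
print(f"Upper bound on l2 Lipschitz constant: {lipschitz_upper_bound(model):.3f}")
```

Such a bound is what makes Lipschitz-based margin training actionable: if a classifier is $L$-Lipschitz in $l_2$ and predicts a class with logit margin $m$, the prediction is certifiably unchanged within an $l_2$ ball of radius $m / (\sqrt{2} L)$, so training to enlarge margins while constraining $L$ directly improves certified robustness.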
Submission Number: 25