Abstract: Recently, large pre-trained foundation models have become
widely adopted by machine learning practitioners for a multitude of
tasks. Because such models are publicly available, relying on them as
backbones for downstream tasks may leave those tasks highly vulnerable to adversarial attacks crafted with the very same public model.
In this work, we propose Robustness Tokens, a novel approach specific
to the transformer architecture that fine-tunes a few additional private
tokens at low computational cost instead of tuning model
parameters as in traditional adversarial training. We show that
Robustness Tokens make Vision Transformer models significantly more
robust to white-box adversarial attacks while retaining the original
downstream performance.
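
To make the core idea concrete, the following is a minimal sketch, not the authors' released code: it only illustrates what the abstract states, i.e. learning a handful of extra tokens while the pre-trained backbone stays frozen. The wrapper class, token count, stand-in encoder, and placeholder loss are all illustrative assumptions.

```python
# Minimal sketch (assumed implementation, not the paper's code): learn a few
# private "robustness tokens" appended to the input sequence of a frozen
# transformer backbone; only the tokens receive gradient updates.
import torch
import torch.nn as nn

class RobustnessTokenWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, num_robust_tokens: int = 4):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # backbone parameters stay frozen
            p.requires_grad_(False)
        # The only trainable parameters: a small set of private tokens.
        self.robust_tokens = nn.Parameter(torch.randn(1, num_robust_tokens, embed_dim) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, seq_len, embed_dim), e.g. ViT patch embeddings
        extra = self.robust_tokens.expand(patch_tokens.shape[0], -1, -1)
        return self.backbone(torch.cat([patch_tokens, extra], dim=1))

# Toy usage: a generic transformer encoder stands in for a pre-trained ViT.
dim = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
)
model = RobustnessTokenWrapper(encoder, embed_dim=dim, num_robust_tokens=4)
optimizer = torch.optim.Adam([model.robust_tokens], lr=1e-3)  # tokens only

x = torch.randn(8, 16, dim)       # fake patch tokens
loss = model(x).pow(2).mean()     # placeholder objective, not the paper's loss
loss.backward()
optimizer.step()
```

The point of the sketch is the optimizer line: because only `robust_tokens` is passed to the optimizer and the backbone is frozen, the computational and memory cost of fine-tuning is far lower than full-parameter adversarial training.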