Abstract: The study of security in machine learning has mainly focused on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question answering, and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models, measuring the impact of this attack on multiple downstream tasks and its transferability between models.
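To make the idea of a task-agnostic, feature-space attack concrete, below is a minimal sketch, not the paper's exact formulation: a PGD-style loop that perturbs the input so that the foundation model's embedding of the adversarial image is pushed as far as possible from the embedding of the clean image, without referencing any downstream-task loss. The function name, the cosine-similarity objective, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions.

```python
# Sketch of a task-agnostic feature-disruption attack (assumed formulation):
# PGD that drives the encoder's features of x+delta away from those of x.
import torch
import torch.nn.functional as F

def feature_disruption_attack(encoder, x, eps=8/255, alpha=2/255, steps=10):
    """Forge adversarial examples by disrupting the backbone's features.

    encoder: any frozen feature extractor mapping images -> embeddings
    x:       batch of images in [0, 1], shape (B, C, H, W)
    """
    encoder.eval()
    with torch.no_grad():
        clean_feat = encoder(x)                      # reference representation

    # Random start inside the L-infinity epsilon ball
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = (x + delta).clamp(0, 1) - x
    delta.requires_grad_(True)

    for _ in range(steps):
        adv_feat = encoder(x + delta)
        # Minimize cosine similarity to the clean features
        # (i.e., maximize feature disruption; an L2 distance could be used instead).
        loss = F.cosine_similarity(adv_feat.flatten(1),
                                   clean_feat.flatten(1), dim=1).mean()
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()             # descend the similarity
            delta.clamp_(-eps, eps)                  # project back to the eps ball
            delta.copy_((x + delta).clamp(0, 1) - x) # keep image in [0, 1]

    return (x + delta).detach()
```

In this sketch `encoder` stands in for any pre-trained vision backbone (for instance, a torchvision ResNet with its classification head removed); because the objective depends only on the shared feature representation, the same adversarial example can then be evaluated against classification, segmentation, depth, or retrieval heads built on top of that backbone.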