Scaling Randomized Smoothing to State-of-the-Art Vision-Language Models

Published: 23 Jun 2025, Last Modified: 23 Jun 2025 · Greeks in AI 2025 Poster · CC BY 4.0
Keywords: Randomized Smoothing, Vision Language Models, AI Safety
TL;DR: We extend and scale Randomized Smoothing to Vision-Language Models
Abstract: Certifying the robustness of Deep Neural Networks (DNNs) is crucial, especially with the rise of powerful generative models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), that have the potential to generate dangerous or harmful responses. Recent work has shown that these large-scale models remain susceptible to adversarial attacks despite their safety fine-tuning. Randomized Smoothing (RS), the current state-of-the-art (SoTA) method for robustness certification, cannot be applied to models such as VLMs: first, RS is designed for classification, not generation; second, RS is a probabilistic approach, typically requiring 10^5 samples to certify a single input, making it infeasible for large-scale modern VLMs. This is the challenge we aim to solve in this paper. First, we reformulate RS for generative models, distinguishing between harmless and harmful responses. Moreover, we develop a theory that reduces the number of samples required by 2-3 orders of magnitude with little effect on the certified radius, and we mathematically analyze the radius's dependence on the number of samples. Combined, these advances allow us to scale RS to state-of-the-art VLMs, something that was not feasible before. We demonstrate this experimentally by defending against a recent SoTA attack on aligned VLMs.
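To make the binary reformulation concrete, here is a minimal sketch (not the authors' implementation) of standard Randomized Smoothing certification recast as a harmless/harmful decision over generated responses. The names `generate`, `is_harmless`, and the parameter values are hypothetical; the sample count of 1000 only illustrates the claimed 2-3 orders-of-magnitude reduction from the usual 10^5.

```python
# A hedged sketch of Randomized Smoothing for a VLM, where the generator's
# free-form output is collapsed into a binary harmless/harmful label.

import numpy as np
from scipy.stats import beta, norm


def lower_confidence_bound(successes: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Clopper-Pearson lower bound on a binomial proportion."""
    if successes == 0:
        return 0.0
    return beta.ppf(alpha, successes, n - successes + 1)


def certify_harmless(generate, is_harmless, image, sigma=0.25, n=1000, alpha=0.001):
    """Certify that the smoothed model responds harmlessly around `image`.

    generate: callable mapping a (possibly noisy) image to a text response.
    is_harmless: callable mapping a response to True/False.
    Returns a certified L2 radius, or 0.0 if no certificate holds.
    """
    # Sample responses under isotropic Gaussian input noise.
    harmless = 0
    for _ in range(n):
        noisy = image + sigma * np.random.randn(*image.shape)
        harmless += int(is_harmless(generate(noisy)))

    # Lower-bound the probability of a harmless response, then convert it to
    # a certified radius via the standard RS formula R = sigma * Phi^{-1}(p).
    p_lower = lower_confidence_bound(harmless, n, alpha)
    if p_lower <= 0.5:
        return 0.0  # a harmless majority could not be certified
    return sigma * norm.ppf(p_lower)
```

In this sketch, each call to `generate` is one full VLM forward pass, which is why reducing `n` from 10^5 to the hundreds-to-thousands range is what makes certification of large models tractable.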
Submission Number: 27