Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness

12 May 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0
Keywords: VLMs, robustness, adversarial attacks, reasoning, scaling, efficiency
TL;DR: Scaling inference-time compute to improve robustness becomes more beneficial as base model robustness improves.
Abstract: Recent work shows that increasing inference-time compute through generation of long reasoning traces improves not just capability scores, but also robustness to various text jailbreaks designed to control models or lower their guardrails. However, multimodal reasoning offers comparatively little defense against vision jailbreaks, which typically succeed by creating noise-like perturbations. When attacking a robust model, vision attacks can, and often must, resort to producing human-interpretable perturbations. Rather than operating in a model's blind spot or outside its training distribution, such interpretable attacks construct familiar concepts connected to the attacker's goal. Inspired by the ability of robust models to force attacks into this space, which appears more in-distribution for reasoning tasks, we posit the Robustness from Inference Compute Hypothesis (RICH): defending against attacks with inference compute (such as reasoning) profits as those attacks become more in-distribution. To test this, we adversarially attack models of varying robustness with black-box transfer and white-box attacks. RICH predicts a rich-get-richer dynamic: models with higher initial robustness gain more robustness from increases in inference-time compute. Consistent with RICH, we find that robust models benefit more from increased compute, whereas non-robust models show little to no improvement. Our work suggests that inference-time compute can be an effective defense against adversarial attacks, provided the base model has some degree of robustness. In particular, layering disparate train-time and test-time defenses aids robustness not additively, but synergistically.
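The submission itself includes no code, but the white-box vision attacks it describes are typically instantiated as projected gradient descent (PGD) on the input image. Below is a minimal sketch of that kind of attack, assuming a differentiable VLM forward pass and an attacker loss; `model` and `target_loss_fn` are hypothetical placeholders, not the paper's actual setup.

```python
import torch

def pgd_attack(model, image, target_loss_fn, eps=8/255, alpha=1/255, steps=200):
    """Minimal L-infinity PGD sketch (illustrative, not the paper's method).

    Perturbs `image` to minimize the attacker's objective, e.g. the loss
    of a jailbroken target response under the model.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = target_loss_fn(model(image + delta))
        loss.backward()
        with torch.no_grad():
            # Signed gradient-descent step on the attacker loss,
            # projected back into the eps ball around the clean image.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            # Keep the adversarial image in the valid pixel range.
            delta.copy_((image + delta).clamp(0, 1) - image)
        delta.grad.zero_()
    return (image + delta).detach()
```

The black-box transfer setting differs only in where the gradients come from: perturbations are crafted this way against surrogate models and then applied to the target model without gradient access.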
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Flagged For Ethics Review: true
Submission Number: 24976