Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: backdoor attacks, in-context learning, vision transformers
TL;DR: ViTs can be backdoored to behave maliciously during in-context learning for image tasks (even unseen tasks). Attackers can poison pretrained models to cause up to 13× performance drops or unwanted image transformations when triggered.
Abstract: Due to the high cost of training, practitioners often rely on pretrained large models (LMs) from untrusted sources, exposing them to backdoor risks. In-context learning enables LMs to perform tasks based on prompts, introducing new attack surfaces for dynamic and flexible backdoor attacks. We study backdoor attacks against Vision Transformers (ViTs) under in-context learning for image-to-image tasks. We demonstrate that ViTs trained with masked image modeling can be poisoned to exhibit highly flexible malicious behaviors. Our analysis combines different trigger injection methods (BadNets, WaNet, and Blended), malicious objectives (Denial of Service (DoS), identity mapping, and black-and-white conversion), attacker goals (source-specific vs. source-agnostic), and stealthiness variations, i.e., parameter-space stealthiness. We achieve significant attack effectiveness: up to 13× performance degradation in DoS tasks and high similarity scores on identity-mapping and conversion tasks. Using a parameter-space attack further improves attack performance while granting stealthiness in both the input and parameter spaces. In-context learning grants attackers diverse possibilities for injecting backdoors and launching malicious tasks, even with data distributions absent from training. We evaluate standard mitigation strategies, including prompt engineering, fine-tuning, and fine-pruning. These defenses are largely ineffective: e.g., fine-tuning reduces the performance degradation only from 89.90% to 73.46%, and fine-pruning reduces attack performance by 4% at the cost of a 28.5% degradation in clean performance. Our code is available at https://anonymous.4open.science/r/Inpainting-Backdoor.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 9084