Keywords: Computer-use agents
Abstract: Computer-use agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction. However, they remain vulnerable to \textit{context deception attacks}, an emerging threat where adversaries embed misleading content into the agent's operational environment (such as a malicious pop-up window) to hijack agent behavior. As recent benchmarks highlight the severity of these attacks, initial studies have shown that conventional defenses like direct prompting are largely ineffective, fostering a perception that these attacks are a difficult and unsolved challenge. In this paper, we challenge this conclusion, arguing that the perceived difficulty is an artifact of the defense paradigms studied, not an inherent property of the attacks themselves. We introduce in-context defense, a surprisingly simple paradigm that leverages in-context learning to prove our claim. By augmenting the agent’s context with a minimal set of exemplars, we guide it to perform explicit defensive reasoning before action planning, effectively immunizing it from deception. Experiments show this method is remarkably effective, reducing up to 91.2\% of pop-up window attacks and achieving near-perfect defense on some other attacks, a stark contrast to the failures of prior approaches. Our work delivers two critical insights: (1) the problem of context deception is far more tractable than previously believed, and (2) teaching an agent a reasoning process (defense-first analysis), rather than just giving it a rule, is the key to robust defense.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4688
Loading