Adversarial Object Hallucination Attacks in Video-Language Models via Intermediate Feature Alignment
Keywords: Video Large Language Models, Adversarial Attack, Object Hallucination
Abstract: Video Large Language Models (Vid-LLMs) have rapidly advanced video understanding, yet their robustness against semantic adversarial manipulation, especially object hallucination, remains largely unexplored. We introduce Adversarial Object Hallucination (AOH), a novel attack that compels Vid-LLMs to ``see'' non-existent objects in videos by injecting visually imperceptible perturbations. Unlike prior attacks that operate only on the video input or the model output, AOH directly manipulates intermediate connector features, aligning them with the representations of a target video to induce controllable hallucinations. To systematically assess this threat, we curate a benchmark of 535 clean/target video pairs with high-quality VQA annotations. Extensive experiments show that AOH mounts highly effective attacks against state-of-the-art Vid-LLMs and exhibits alarming \emph{cross-scale transferability}: adversarial examples optimized on smaller models transfer even more strongly to larger counterparts of the same architecture, amplifying attack impact while reducing adversarial cost. Further analyses reveal that the perturbations encode semantic object contours, and Grad-CAM visualizations highlight their covert influence. These findings expose a severe and previously overlooked vulnerability in Vid-LLMs, raising urgent concerns about their secure deployment and providing a foundation for future adversarial research in video-language modeling.
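To make the intermediate-feature-alignment idea concrete, below is a minimal sketch of how such an attack could be optimized. It assumes a hypothetical `connector_features` accessor that exposes the visual-connector representations a Vid-LLM feeds to its language model, and it uses an MSE alignment loss with a PGD-style L-infinity update; the abstract does not specify the actual objective, optimizer, or budget, so these are illustrative assumptions rather than the authors' exact method.

```python
import torch
import torch.nn.functional as F

def feature_alignment_attack(model, clean_video, target_video,
                             eps=8 / 255, alpha=1 / 255, steps=500):
    """Sketch: optimize an imperceptible perturbation on the clean video so
    that its intermediate connector features match those of a target video
    containing the hallucinated object.

    `model.connector_features` is a hypothetical, model-specific accessor
    for the connector output; eps/alpha/steps are assumed hyperparameters.
    """
    with torch.no_grad():
        # Features of the target video that the attack tries to imitate.
        target_feat = model.connector_features(target_video)

    delta = torch.zeros_like(clean_video, requires_grad=True)
    for _ in range(steps):
        adv_feat = model.connector_features(clean_video + delta)
        # Align the perturbed video's intermediate features with the target's.
        loss = F.mse_loss(adv_feat, target_feat)
        loss.backward()
        with torch.no_grad():
            # Signed-gradient descent step, then project back into the
            # L-infinity ball and keep pixel values in a valid range.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((clean_video + delta).clamp(0, 1) - clean_video)
        delta.grad.zero_()

    return (clean_video + delta).detach()
```

Because the perturbation is optimized against intermediate features rather than generated text, the same adversarial video can, in principle, steer any downstream question-answering about the clip, which is consistent with the controllable-hallucination behavior described in the abstract.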
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19729