Keywords: Adversarial Attacks, Adversarial Robustness, Referring Multi-Object Tracking, Transformer
Abstract: Language-vision understanding has driven the development of Referring Multi-Object Tracking (RMOT). However, the security of RMOT models remains underexplored. We examine adversarial vulnerabilities in Transformer-based RMOT and show that carefully crafted perturbations disrupt both the linguistic-visual referring and the object-matching components. We introduce VEIL, an adversarial framework that exposes persistent errors in FIFO-based temporal memory and compromises tracking reliability.
Submission Number: 233