Spatio-temporal Feature-level Augmentation Vision Transformer for video-based person re-identification

Minjung Kim, MyeongAh Cho, Heansung Lee, Sangyoun Lee

Published: 2025, Last Modified: 17 Jul 2025, Pattern Recognit. 2025, CC BY-SA 4.0

Abstract: Highlights
• Novel background token differentiates foreground and background effectively.
• Spatial feature augmentation alters backgrounds and predicts person IDs for these samples.
• Temporal feature augmentation creates irregular samples and detects anomaly frames.
• Our method shows competitive results with fewer parameters and strong generalization.