Spatio-temporal Feature-level Augmentation Vision Transformer for video-based person re-identification
Abstract: Highlights
• Novel background token differentiates foreground and background effectively.
• Spatial feature augmentation alters backgrounds and predicts person IDs for these samples.
• Temporal feature augmentation creates irregular samples and detects anomaly frames.
• Our method shows competitive results with fewer parameters and strong generalization.