Abstract: Highlights
• Introduced Temporal-iD, a privacy-friendly identity signature extracted from video by capturing only how the face outline varies over time.
• To obtain Temporal-iD, a lightweight recurrent Vision Transformer, TiDViT, is designed to process video frames efficiently.
• A Multi-head Temporal–Spatial joint self-Attention (MTSA) module enhances TiDViT by letting aggregated temporal features interact with the spatial features of the current frame.
• Extensive experiments on four public face video datasets show that Temporal-iD features are a viable identity signature.
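The highlights do not specify how MTSA is implemented. The sketch below is one possible reading of "joint self-attention between aggregated temporal features and current spatial features", and it should not be taken as the authors' design. It assumes that a small set of recurrent state tokens (the aggregated temporal features) is concatenated with the current frame's tokens, that standard multi-head self-attention with a residual connection runs over the union, and that the updated state is carried to the next frame. The names `joint_mhsa` and `encode_video`, the parameter `state_len`, the random untrained weights, and the mean pooling used to produce a signature vector are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_mhsa(temporal, spatial, w_qkv, w_out, num_heads):
    """Hypothetical joint temporal-spatial multi-head self-attention.

    temporal: (S, d) aggregated temporal (state) tokens carried across frames
    spatial:  (N, d) tokens of the current frame
    Both sets attend to each other because attention runs over their union.
    """
    tokens = np.concatenate([temporal, spatial], axis=0)  # (S + N, d)
    n, d = tokens.shape
    head_dim = d // num_heads
    q, k, v = np.split(tokens @ w_qkv, 3, axis=-1)        # each (n, d)
    # Reshape each projection to (heads, n, head_dim).
    q, k, v = (m.reshape(n, num_heads, head_dim).transpose(1, 0, 2) for m in (q, k, v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d) @ w_out
    out = tokens + out                                    # residual connection
    s = temporal.shape[0]
    return out[:s], out[s:]                               # updated state, updated frame tokens

def encode_video(frames_tokens, d, num_heads=4, state_len=1, seed=0):
    """Hypothetical recurrent encoder: the same weights are reused for every frame."""
    rng = np.random.default_rng(seed)
    w_qkv = rng.standard_normal((d, 3 * d)) / np.sqrt(d)  # untrained, illustration only
    w_out = rng.standard_normal((d, d)) / np.sqrt(d)
    state = np.zeros((state_len, d))
    for spatial in frames_tokens:
        state, _ = joint_mhsa(state, spatial, w_qkv, w_out, num_heads)
    return state.mean(axis=0)                             # assumed pooling to a signature vector
```

For example, with eight frames of 16 tokens each and `d=32`, `encode_video(frames, d=32)` returns a 32-dimensional vector. Because only the small state is carried between frames, per-frame cost stays constant regardless of video length. That is one plausible way a recurrent design keeps the model lightweight, though the highlights do not confirm it.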