Keywords: privacy protection, action recognition, label-free, unsupervised, zero-shot
TL;DR: We introduce LaF-Privacy, a label-free privacy-preserving framework for training anonymizers that preserve action semantics and support zero-shot recognition with VLMs.
Abstract: Traditional action recognition relies on labeled data and closed-set assumptions, limiting adaptability to novel actions and environments. Vision-Language Models (VLMs) offer a more flexible alternative through text-image alignment, enabling zero-shot action recognition. However, using raw video data poses privacy risks due to sensitive visual content. Privacy-Preserving Action Recognition (PPAR) aims to anonymize videos while preserving action-relevant semantics. Existing learning-based PPAR approaches often require both action and privacy annotations, as well as retraining recognition models on anonymized data, limiting their flexibility and compatibility with powerful pretrained VLMs. We propose LaF-Privacy, a novel label-free privacy-preserving framework for zero-shot action recognition. Our method is trained without any manual annotations, using two complementary objectives: aligning high-level action-relevant features between raw and anonymized videos while suppressing their low-level appearance similarity. We adopt a video transformer encoder for spatio-temporal learning and introduce an Action-Aware Masking Module (AAMM) that discards action-irrelevant regions, further enhancing privacy. LaF-Privacy enables direct use of pretrained VLMs for zero-shot inference on anonymized videos. Experiments on VP-UCF101 and VP-HMDB51 demonstrate that our approach achieves state-of-the-art trade-offs between privacy protection and zero-shot recognition performance.
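The two complementary label-free objectives described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the feature vectors, the cosine-similarity losses, and the weighting factor `lam` are all assumptions standing in for the unspecified high-level (action) and low-level (appearance) feature extractors and loss design.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def laf_privacy_loss(raw_hi, anon_hi, raw_lo, anon_lo, lam=1.0):
    """Hypothetical combined objective for the anonymizer.

    raw_hi / anon_hi: high-level action-relevant features of the raw
        and anonymized video (to be kept aligned).
    raw_lo / anon_lo: low-level appearance features (whose similarity
        is to be suppressed).
    """
    action_preserve = 1.0 - cosine(raw_hi, anon_hi)          # pull together
    privacy_suppress = max(0.0, cosine(raw_lo, anon_lo))     # push apart
    return action_preserve + lam * privacy_suppress

rng = np.random.default_rng(0)
f = rng.normal(size=64)
# Anonymizer leaks appearance: low loss on action term, high on privacy term.
loss_leaky = laf_privacy_loss(f, f, f, f)
# Ideal anonymizer: action features preserved, appearance decorrelated.
loss_ideal = laf_privacy_loss(f, f, f, -f)
```

Under this toy formulation, `loss_ideal` is near zero while `loss_leaky` is near one, reflecting the intended trade-off: the anonymized video stays useful for zero-shot recognition while shedding appearance cues.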
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16777