Keywords: privacy preservation, video understanding
TL;DR: We propose a latent anonymization adapter training framework for video foundation models that preserves utility performance across multiple downstream tasks while reducing performance on a private attribute prediction task.
Abstract: Rapid advances in large video models have unlocked new horizons in video understanding, enhancing applications in domains such as surveillance, healthcare, and entertainment. However, these models often compromise individual privacy by inadvertently revealing sensitive attributes such as skin color and gender. Existing privacy-preservation methods are often limited in scope and tailored to specific downstream tasks. Because current methods apply the anonymization function directly in the input pixel space, they must retrain the utility video model and therefore demand extensive computational resources. To address these challenges, we propose a novel approach that shifts privacy-preserving anonymization from the input pixel space to the latent feature space, significantly reducing computational cost and enabling deployment with large video foundation models. Our method employs a self-supervised privacy budget in the latent space by minimizing the mutual information between static clip features. Notably, this approach allows, for the first time, supervision from downstream tasks such as anomaly detection and temporal action detection through collaborative co-training. Furthermore, we introduce a latent consistency loss that maintains the utility video model's multitask generalization capabilities and prevents single-task overfitting. Our extensive evaluations demonstrate a significant ($\approx$\textbf{29\%}) reduction in privacy leakage while maintaining near-peak (within \textbf{1\%}) utility performance across various downstream tasks: Action Recognition (Kinetics400, UCF101, HMDB51), Temporal Action Detection (THUMOS14), and Anomaly Detection (UCF-Crime). Moreover, we propose new protocols for assessing gender bias in action recognition models, demonstrating that our method effectively mitigates such biases and promotes equitable video understanding.
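To make the training objective sketched in the abstract concrete, below is a minimal, self-contained sketch (not the authors' implementation) of latent-space anonymization with an adapter: a latent consistency loss keeps anonymized features close to the frozen backbone's features, and a self-supervised privacy term penalizes similarity between static features of paired clips from the same video as a simple proxy for their mutual information. All module names, dimensions, loss weights, and the cosine-similarity MI proxy are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnonymizationAdapter(nn.Module):
    """Lightweight residual adapter applied to latent features of a frozen video backbone."""

    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Residual form makes it easy to stay close to the original (utility) features.
        return z + self.net(z)


def latent_consistency_loss(z_anon: torch.Tensor, z_orig: torch.Tensor) -> torch.Tensor:
    """Keep anonymized latents close to the backbone latents to preserve multitask utility."""
    return F.mse_loss(z_anon, z_orig)


def static_mi_proxy(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between static features of two clips from the same video.

    Minimizing this proxy discourages the adapter from retaining clip-invariant
    (potentially private) information shared across the two clips.
    """
    return F.cosine_similarity(z_a, z_b, dim=-1).mean()


# One illustrative training step (hypothetical shapes: 8 videos, two clips each, dim 768).
backbone = nn.Identity()  # stands in for a frozen video foundation model's feature extractor
adapter = AnonymizationAdapter()
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
lambda_priv = 0.1  # illustrative weight on the privacy term

clip_a, clip_b = torch.randn(8, 768), torch.randn(8, 768)  # latent features of paired clips
z_a, z_b = backbone(clip_a), backbone(clip_b)
za_anon, zb_anon = adapter(z_a), adapter(z_b)

loss = (
    latent_consistency_loss(za_anon, z_a)
    + latent_consistency_loss(zb_anon, z_b)
    + lambda_priv * static_mi_proxy(za_anon, zb_anon)
)
opt.zero_grad()
loss.backward()
opt.step()
```

In the paper's full framework this objective would additionally be co-trained with downstream task heads (e.g., anomaly detection, temporal action detection); the sketch above only illustrates the latent consistency and self-supervised privacy terms.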
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3232