Differentially Private Vision-Language Foundation Models via Image Captioning

21 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: differential privacy, private foundation model, vision-language model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We train the first vision-language foundation model with differential privacy guarantees.
Abstract: The common practice of training foundation models on web-crawled data raises privacy and copyright concerns, as sensitive training data can be memorized by the model and unintentionally misused. Differential privacy (DP) is a robust and rigorous framework for mitigating such risks, albeit often at the cost of significant performance loss, and it is commonly perceived as unviable in most use cases. In this work, we demonstrate that combining DP with vision-language pre-training can be a powerful recipe for obtaining differentially private foundation models trained from scratch. Our model uses text supervision to learn superior image representations, and is the first DP-trained model to exhibit multi-modal capabilities. Under a privacy budget of $\varepsilon=8$, our image captioner (DP-Cap) trained on a 233M-image subset of the LAION-2B dataset attains 52.8\% zero-shot accuracy on CIFAR-10. On the challenging ARO benchmark, DP-Cap achieves performance close to its non-private counterpart (Cap), and greatly surpasses the best non-private CLIP model. Our work challenges the prevailing sentiment that high-utility foundation models are unattainable via DP training from scratch.
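The abstract refers to DP training from scratch; the standard mechanism behind such training is DP-SGD, which clips each example's gradient and adds calibrated Gaussian noise before the parameter update. The sketch below is a minimal, generic illustration of that mechanism on a hypothetical toy model and random data; it is not the paper's DP-Cap training pipeline, and the model, data, and hyperparameters (clip_norm, noise_mult) are illustrative assumptions.

```python
# Minimal DP-SGD-style sketch (hypothetical toy model and data):
# per-example gradient clipping followed by Gaussian noise addition.
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a trainable model plus random data/labels.
model = torch.nn.Linear(16, 4)
data = torch.randn(32, 16)
targets = torch.randint(0, 4, (32,))
loss_fn = torch.nn.CrossEntropyLoss()

clip_norm = 1.0    # per-example gradient clipping bound C (assumed value)
noise_mult = 1.1   # noise multiplier sigma, calibrated to the (eps, delta) budget
lr = 0.1

params = list(model.parameters())
for step in range(3):
    # Accumulate clipped per-example gradients.
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(data, targets):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale
    # Add Gaussian noise scaled to the clipping bound, then average and step.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noisy = s + noise_mult * clip_norm * torch.randn_like(s)
            p -= lr * noisy / len(data)
```

The key privacy-relevant steps are the clipping (bounding each example's influence) and the noise addition (masking any single example's contribution); the privacy budget $\varepsilon$ quoted in the abstract is determined by the noise multiplier, sampling rate, and number of steps via standard accounting.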
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2977