Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Random Features, Neural Tangent Kernel, Privacy, Stability, Generalization, Empirical Risk Minimization, Interpolation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We prove that, under training via empirical risk minimization at interpolation, privacy increases with the generalization performance of the model, in the context of random features and NTK regression.
Abstract: Deep learning models often memorize their training set. This makes them vulnerable to recovery attacks, raising privacy concerns for users, and many widespread algorithms, such as empirical risk minimization (ERM), do not directly enforce safety guarantees. In this paper, we study the safety of ERM-trained models that interpolate the training samples (i.e., *at interpolation*) against a family of powerful black-box information retrieval attacks. Our analysis quantifies this safety via two separate terms: *(i)* the model's *stability* with respect to individual training samples, and *(ii)* the *feature alignment* between the attacker's query and the original data. While the first term is well established in learning theory and connected to the generalization error in classical work, the second is, to the best of our knowledge, novel.
Our key technical result characterizes precisely the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression.
This proves that privacy strengthens as the generalization capability of the model improves, unveiling the role of the model architecture and of its activation function.
Numerical experiments show agreement with our theory not only for RF/NTK models, but also for deep neural networks trained on standard datasets (MNIST, CIFAR-10).
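As a minimal illustration of the setting studied here (not the paper's method or experiments; dimensions, labels, and the ReLU feature map are illustrative assumptions), the following sketch fits a random features (RF) regression model by ERM at interpolation, i.e., the min-norm solution that fits the training set exactly when the number of features exceeds the number of samples:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, P = 20, 5, 200          # samples, input dim, number of random features (P >= N)
X = rng.standard_normal((N, d))
y = rng.standard_normal(N)    # placeholder labels, for illustration only

W = rng.standard_normal((P, d)) / np.sqrt(d)   # fixed random first-layer weights
Phi = np.maximum(X @ W.T, 0.0)                 # ReLU random features, shape (N, P)

# Min-norm ERM interpolator: theta = Phi^+ y via the Moore-Penrose pseudo-inverse.
theta = np.linalg.pinv(Phi) @ y

# At interpolation, the training residual is (numerically) zero.
residual = np.linalg.norm(Phi @ theta - y)
```

In this overparameterized regime the model memorizes the training labels exactly, which is the starting point for the stability and feature-alignment analysis described in the abstract.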
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7071