Privacy at Interpolation: Precise Analysis for Random and NTK Features

Published: 07 Nov 2023, Last Modified: 13 Dec 2023 · M3L 2023 Poster
Keywords: Random Features, Neural Tangent Kernel, Privacy, Stability, Generalization, Empirical Risk Minimization, Interpolation
TL;DR: We prove that, for models trained via empirical risk minimization at interpolation, privacy strengthens with the generalization performance of the model, in the settings of random features and NTK regression.
Abstract: Deep learning models are able to memorize the training set. This makes them vulnerable to recovery attacks, raising privacy concerns for users, and many widespread algorithms such as empirical risk minimization (ERM) do not directly enforce safety guarantees. In this paper, we study the safety of ERM models when the training samples are interpolated (i.e., *at interpolation*) against a family of powerful black-box information retrieval attacks. Our analysis quantifies this safety via two separate terms: *(i)* the model *stability* with respect to individual training samples, and *(ii)* the *feature alignment* between the attacker's query and the original data. While the first term is well established in learning theory and is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result precisely characterizes the feature alignment in the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. This proves that privacy strengthens with an increase in generalization capability, unveiling the role of the model and of its activation function. Numerical experiments show agreement with our theory not only for RF/NTK models, but also for deep neural networks trained on standard datasets (MNIST, CIFAR-10).
Submission Number: 33
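
As an illustrative sketch (not the paper's construction), the interpolation regime studied here can be reproduced in a few lines: when the number of random features `p` exceeds the sample size `n`, the minimum-norm ERM solution for ReLU random features fits the training labels exactly. All dimensions and the ReLU choice below are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 20, 5, 200  # p > n: over-parameterized, so interpolation is possible

X = rng.standard_normal((n, d))   # training inputs
y = rng.standard_normal(n)        # training labels

# Random features: fixed random first layer, ReLU activation
W = rng.standard_normal((p, d)) / np.sqrt(d)
Phi = np.maximum(X @ W.T, 0.0)    # feature matrix, shape (n, p)

# Minimum-norm interpolator (ridgeless ERM solution)
theta = np.linalg.pinv(Phi) @ y

# At interpolation, the training error is (numerically) zero
train_residual = np.max(np.abs(Phi @ theta - y))
```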