Neural Collapse meets Differential Privacy: Curious behaviors of NoisySGD with Near-Perfect Representation Learning

24 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Differential privacy; neural collapse; DP-SGD; representation learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We investigate DP-SGD's fine-tuning performance in the neural-collapse regime and assess its resilience to perturbations of the pre-trained features.
Abstract: Recent studies have demonstrated that large-scale representation learning via pre-training on gigantic datasets significantly enhances differentially private learning on downstream tasks. For example, pre-training on Google's proprietary JFT dataset yields an unprecedented 83% Top-1 accuracy on ImageNet under a strong privacy guarantee of $(0.5, 8\times 10^{-7})$-DP, despite the high dimensionality of the feature space. While the exact behavior of NoisySGD (DP-SGD) in these scenarios remains theoretically challenging to analyze, we study an idealized setting through a layer-peeled model of representation learning, in which the learned features exhibit the phenomenon known as neural collapse. In this setting, we observe several notable behaviors of NoisySGD. Specifically, we show that under perfect neural collapse, the misclassification error is independent of the feature dimension. This dimension-independence holds for any learning rate, persists under class imbalance, and does not depend on the choice of loss function. Nevertheless, a dimension dependency emerges once minor perturbations are introduced in either the feature space or the model space. To mitigate this dependency under perturbation, we propose several strategies, such as pre-processing the features or reducing the feature dimension with principal component analysis (PCA).
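As a rough illustration of the setting described in the abstract, the sketch below is ours, not the authors' code: it shows one NoisySGD (DP-SGD) step on a linear head over fixed pre-trained features, together with the PCA pre-processing mitigation mentioned above. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def pca_reduce(feats, num_components):
    # Center the features and project onto the top principal directions
    # (one of the dimension-reduction mitigations suggested above).
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_components].T

def noisy_sgd_step(W, feats, labels, lr, clip_norm, noise_mult, rng):
    # Illustrative sketch, not the paper's implementation. One NoisySGD
    # (DP-SGD) step on a linear head over fixed features: per-example
    # gradients are clipped in L2 norm, summed, and Gaussian noise of
    # scale noise_mult * clip_norm is added before the averaged update.
    n = feats.shape[0]
    grad_sum = np.zeros_like(W)
    for x, y in zip(feats, labels):
        logits = W @ x
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        probs[y] -= 1.0                       # softmax cross-entropy gradient
        g = np.outer(probs, x)                # per-example gradient, shape (k, d)
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip
        grad_sum += g
    noise = noise_mult * clip_norm * rng.standard_normal(W.shape)
    return W - lr * (grad_sum + noise) / n

# Hypothetical usage: random stand-ins for pre-trained features and labels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 512))
labels = rng.integers(0, 10, size=256)
feats_low = pca_reduce(feats, 32)             # reduce feature dimension
W = np.zeros((10, feats_low.shape[1]))
W = noisy_sgd_step(W, feats_low, labels, lr=0.1,
                   clip_norm=1.0, noise_mult=1.0, rng=rng)
```

Under perfect neural collapse, the abstract's claim is that the misclassification error of such a procedure does not depend on the feature dimension d; the PCA step matters only once the features are perturbed away from collapse.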
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8521