Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisySGD with Near-Perfect Representation Learning
Recent studies have demonstrated that large-scale representation learning through pre-training on gigantic datasets significantly enhances differentially private learning on downstream tasks. By pre-training on Google's proprietary JFT dataset, one can achieve an unprecedented 83% top-1 accuracy on ImageNet under a strong privacy guarantee of $(\varepsilon, \delta) = (0.5, 8\times 10^{-7})$-DP, despite the high dimensionality of the feature space. While the exact behaviors of NoisySGD in these scenarios remain theoretically challenging to analyze, we explore an idealized setting using a layer-peeled model for representation learning, in which the learned features exhibit an intriguing phenomenon known as neural collapse. Within this setting, we observe several notable behaviors of NoisySGD. Specifically, we show that under perfect neural collapse, the misclassification error is independent of the feature dimension. This dimension-independence holds for any learning rate, persists under class imbalance, and does not depend on the choice of loss function. Nevertheless, dimension dependence emerges when minor perturbations are introduced in either the feature space or the model space. To address this dependence under perturbation, we suggest several strategies, such as pre-processing features or employing principal component analysis (PCA) to reduce the feature dimension.
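To make the setting concrete, the following is a minimal sketch, not the authors' implementation, of NoisySGD applied to a linear classifier over fixed pre-trained features, together with an optional PCA pre-processing step of the kind suggested above. The function names (`pca_reduce`, `noisy_sgd_linear`), the softmax cross-entropy loss, and all hyperparameters (clipping norm, noise multiplier, learning rate, number of steps) are illustrative assumptions rather than choices taken from the paper.

```python
# A minimal sketch (illustrative assumptions throughout): noisy gradient descent
# on a linear head over fixed features, with optional PCA dimension reduction.
import numpy as np

def pca_reduce(X, k):
    """Project features onto the top-k principal components (optional mitigation)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered data give the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def noisy_sgd_linear(X, y, num_classes, steps=50, lr=0.1,
                     clip_norm=1.0, noise_multiplier=1.0, seed=None):
    """Full-batch noisy gradient descent on a linear classifier with softmax
    cross-entropy loss, per-example gradient clipping, and Gaussian noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[y]                       # one-hot labels
    for _ in range(steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Per-example gradient w.r.t. W: outer product x_i (p_i - y_i)^T.
        G = X[:, :, None] * (P - Y)[:, None, :]      # shape (n, d, C)
        norms = np.linalg.norm(G.reshape(n, -1), axis=1)
        scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        G_clipped = (G * scale[:, None, None]).sum(axis=0)
        noise = noise_multiplier * clip_norm * rng.standard_normal(W.shape)
        W -= lr * (G_clipped + noise) / n            # noisy clipped-gradient step
    return W

# Usage on synthetic near-collapsed features: each class concentrates at its mean.
rng = np.random.default_rng(0)
C, d, n_per = 5, 512, 100
means = rng.standard_normal((C, d))
X = np.vstack([means[c] + 0.01 * rng.standard_normal((n_per, d)) for c in range(C)])
y = np.repeat(np.arange(C), n_per)
X_low = pca_reduce(X, k=C)                           # optional dimension reduction
W = noisy_sgd_linear(X_low, y, num_classes=C, seed=0)
print("train accuracy:", np.mean((X_low @ W).argmax(axis=1) == y))
```

The synthetic features here mimic near-perfect neural collapse by placing every example close to its class mean; the PCA step illustrates one of the suggested mitigations for the dimension dependence that arises once such perturbations are present.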