Abstract: Subspace-based DPSGD has emerged as a robust solution for alleviating excessive noise in high-dimensional, privacy-preserving visual learning. It achieves a favorable privacy-utility balance by projecting privatized gradients onto a low-rank subspace derived from in-distribution public data. However, relying solely on limited public data can narrow the diversity of the anchored subspace and induce over-memorization during model training, leading to suboptimal private optimization. To overcome these limitations, we propose a synthesis-augmented subspace-based DPSGD framework, SAS-DPSGD, which integrates synthetic data into the subspace construction process. We quantitatively analyze private optimization performance when the subspace is derived from mixed public and synthetic data, revealing both the benefits and the saturation effects of incorporating synthetic data in private visual learning. To the best of our knowledge, this is the first work to provide a unified theoretical guarantee for synthesis-augmented subspace-based DPSGD. Moreover, we design an early projection mechanism within our framework that projects the gradient onto the subspace before performing gradient clipping. This mechanism reduces the gradient clipping bias and lowers the synthetic data requirement, resulting in a faster convergence rate. Extensive experiments on two real-world datasets show that SAS-DPSGD outperforms nine baselines by up to 9.78% in accuracy and can reduce the amount of synthetic data required by 66.7%.
External IDs: doi:10.1145/3746027.3755037
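The abstract describes two steps: building a low-rank subspace from mixed public and synthetic data, and projecting gradients onto that subspace before clipping ("early projection"). The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names (`build_subspace`, `early_projection_step`), the use of an SVD over stacked per-sample gradients, and the choice to add Gaussian noise in the projected coordinates are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def build_subspace(public_grads, synthetic_grads, k):
    """Return an orthonormal basis (d, k) for the top-k gradient subspace.

    Assumption: the subspace is taken as the top-k right singular vectors
    of the stacked public and synthetic per-sample gradients.
    """
    stacked = np.vstack([public_grads, synthetic_grads])  # (n_pub + n_syn, d)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k].T


def early_projection_step(per_sample_grads, basis, clip_norm, noise_multiplier, rng):
    """One privatized gradient estimate with projection before clipping."""
    # 1) Early projection: express each private gradient in subspace coordinates.
    coords = per_sample_grads @ basis  # (n, k)

    # 2) Clip each projected gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = coords * scale

    # 3) Sum, add Gaussian noise (assumed here to live in the k-dim subspace),
    #    then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=basis.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]

    # 4) Map the result back to the full parameter space.
    return basis @ noisy_mean
```

In this sketch, clipping acts on the k-dimensional projected gradient rather than the full d-dimensional one, so only the component inside the subspace counts toward the clipping threshold. The privacy accounting that a real DPSGD pipeline needs is omitted.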