Leveraging Randomness in Model and Data Partitioning for Privacy Amplification

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
TL;DR: When each data point interacts with only a randomly chosen subset of model parameters or training iterations, this yields nontrivial privacy amplification; our work is the first to formalize and quantify this effect under Rényi differential privacy (RDP).
Abstract: We study how inherent randomness in the training process—where each sample (or client in federated learning) contributes only to a randomly selected portion of training—can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g., model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce balanced iteration subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that in certain regimes, this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for nontrivial privacy amplification.
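
To make the two data-partitioning schemes concrete, the following is a minimal illustrative sketch, not code from the paper: the function names, array shapes, and the rate/count parameters `q` and `k` are assumptions chosen for illustration. It contrasts Poisson (i.i.d.) sampling, where each sample joins each iteration independently with probability q, with balanced iteration subsampling, where each sample participates in exactly k of the T iterations, chosen uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_sampling(n_samples, n_iters, q):
    """Poisson (i.i.d.) sampling: each sample joins each iteration
    independently with probability q, so per-sample counts fluctuate."""
    # participation[i, t] is True if sample i is used in iteration t.
    return rng.random((n_samples, n_iters)) < q

def balanced_iteration_subsampling(n_samples, n_iters, k):
    """Balanced iteration subsampling: each sample participates in exactly
    k of the n_iters iterations, chosen uniformly without replacement."""
    participation = np.zeros((n_samples, n_iters), dtype=bool)
    for i in range(n_samples):
        chosen = rng.choice(n_iters, size=k, replace=False)
        participation[i, chosen] = True
    return participation

# Example with matched expected participation (q = k / T = 0.1).
p_poisson = poisson_sampling(1000, 100, q=0.1)
p_balanced = balanced_iteration_subsampling(1000, 100, k=10)
print(p_poisson.sum(axis=1).std())   # counts vary around 10
print(p_balanced.sum(axis=1).std())  # exactly 10 for every sample -> 0.0
```

Under matched expected participation (q = k/T), the balanced scheme fixes every sample's participation count; this structured, non-i.i.d. randomness is the kind the abstract argues can yield stronger amplification than Poisson sampling in certain regimes.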
Lay Summary: Machine learning models typically protect privacy by adding random noise during training, but too much noise can harm accuracy. We show that hiding which parts of the model or which training steps each data point sees makes it harder to trace any one example. This hidden randomness boosts privacy (so less extra noise is needed to reach the same privacy level) without changing the basic training algorithm, and we are the first to explain and quantify exactly how much extra privacy it provides. Our approach is especially useful in federated learning, where devices have limited compute power: by training only on submodels, weaker devices can still participate, and we show that this submodel training also improves privacy guarantees.
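
As a rough illustration of the submodel (model-partitioning) idea, here is a hypothetical, dropout-style sketch; `keep_prob`, the mask sampling, and the update rule are assumptions for exposition, not the paper's construction. Each client draws a random mask over the parameters, and its data only ever touches the masked coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_submodel_mask(n_params, keep_prob):
    """Assign each parameter coordinate to the client's subnetwork
    independently with probability keep_prob (dropout-style partition)."""
    return rng.random(n_params) < keep_prob

def client_update(global_params, local_grad, mask, lr=0.1):
    """Update only the coordinates selected by the client's mask; the
    remaining coordinates never depend on that client's data."""
    updated = global_params.copy()
    updated[mask] -= lr * local_grad[mask]
    return updated

# Illustrative round: one client updates a random 25% subnetwork.
global_params = np.zeros(8)
local_grad = rng.normal(size=8)        # stand-in for a client's gradient
mask = sample_submodel_mask(8, keep_prob=0.25)
new_params = client_update(global_params, local_grad, mask)
print(mask.astype(int), new_params.round(3))
```

Because an outside observer does not know which coordinates a given client's mask selected, this hidden choice is exactly the kind of randomness the summary says can be leveraged for extra privacy.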
Primary Area: Social Aspects->Privacy
Keywords: differential privacy, federated learning, privacy amplification, model parallelism, data subsampling
Submission Number: 5228