The Role of Randomness in Stability

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
TL;DR: We study the number of random bits needed to achieve replicability and differential privacy for general statistical tasks and PAC Learning.
Abstract: Stability is a central property in learning and statistics promising that the output of an algorithm $\mathcal{A}$ does not change substantially when applied to similar datasets $S$ and $S'$. It is an elementary fact that any sufficiently stable algorithm (e.g.\ one returning the same result with high probability, satisfying privacy guarantees, etc.) must be randomized. This raises a natural question: can we quantify \textit{how much} randomness is needed for algorithmic stability? We study the randomness complexity of two influential notions of stability in learning: \textit{replicability} (which promises that $\mathcal{A}$ usually outputs the same result when run over samples from the same distribution), and \textit{differential privacy} (which promises that the output distribution of $\mathcal{A}$ remains similar under neighboring datasets). In particular, building on the ideas of (Dixon, Pavan, Vander Woude, and Vinodchandran ICML 2024) and (Canonne, Su, and Vadhan ITCS 2024), we prove a "weak-to-strong" boosting theorem for stability in these settings: the randomness complexity of a task $\mathcal{M}$ is tightly controlled by the best replication probability of any \textit{deterministic} algorithm solving $\mathcal{M}$, a parameter known as $\mathcal{M}$'s "global stability" (Chase, Moran, Yehudayoff FOCS 2023). Finally, we use this connection to characterize the randomness complexity of PAC Learning: a class has bounded randomness complexity iff it has finite Littlestone dimension, and moreover this complexity scales at worst logarithmically in the excess error of the learner. As a corollary, we resolve a question of (Chase, Chornomaz, Moran, and Yehudayoff STOC 2024) about the error-dependent list-replicability of agnostic learning.
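For concreteness, the two stability notions above are standardly formalized as follows (a minimal sketch using the usual parameterizations; the paper's exact definitions and parameter regimes may differ). An algorithm $\mathcal{A}$ is $\rho$-replicable if, over two independent samples $S, S' \sim \mathcal{D}^n$ and shared internal randomness $r$,
$$\Pr_{S, S' \sim \mathcal{D}^n,\; r}\big[\mathcal{A}(S; r) = \mathcal{A}(S'; r)\big] \ge 1 - \rho,$$
and $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private if for all neighboring datasets $S, S'$ (differing in a single element) and all measurable events $T$,
$$\Pr\big[\mathcal{A}(S) \in T\big] \le e^{\varepsilon}\,\Pr\big[\mathcal{A}(S') \in T\big] + \delta.$$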
Lay Summary: *Stability* is a central tenet in algorithm design stating that feeding "similar inputs" into an algorithm A should beget "similar outputs". It is well known that any strongly stable (e.g. private) algorithm must be randomized, but despite years of work on the cost of stability in machine learning, we have almost no understanding of *how much randomness* is needed even in basic settings like binary classification. We characterize the amount of randomness needed for a task in terms of the best weak stability achieved by any deterministic algorithm solving the problem. Using connections between weak stability and combinatorial structure in learning, we use this to give the first randomness-efficient stable algorithms for basic learning tasks, along with corresponding lower bounds. In the era of big data, stability is a critical property both to protect user data and to ensure the reliability of our algorithms. Our work sheds new light on an understudied resource needed to achieve stability in theory and practice.
Primary Area: Theory->Learning Theory
Keywords: Replicability, Differential Privacy, PAC Learning, Randomness, Stability
Submission Number: 14750