Replicable Machine Learning: Theory and Algorithms for Stochastic Convex and Non-Convex Optimization

Published: 22 Jan 2026, Last Modified: 22 Jan 2026 | The 29th International Conference on Artificial Intelligence and Statistics | CC BY 4.0
Abstract: Replicable algorithms produce identical outputs with high probability when run on independent samples from the same distribution. We study replicable stochastic optimization, providing algorithms with near-optimal sample complexity across convex and non-convex settings. For general Lipschitz losses, the exponential mechanism combined with correlated sampling achieves optimal excess risk and replicability, but requires exponential runtime. For strongly convex losses, empirical risk minimization (ERM) with randomized rounding achieves slightly worse excess risk in polynomial time. For general convex losses, regularized ERM yields excess risk rates of order $n^{-1/4}$. We extend our techniques to neural networks in the NTK regime. Our work reveals a fundamental computational-statistical tradeoff: optimal replicability requires exponential time, while efficient algorithms incur modest statistical penalties.
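
To illustrate the kind of mechanism the abstract refers to, the following is a minimal sketch (not the paper's exact procedure) of ERM with randomized rounding under shared randomness. The example problem (1-D mean estimation under squared loss, a strongly convex instance), the grid width `rho`, and the function name `replicable_erm_round` are illustrative assumptions, not taken from the paper.

```python
# Sketch: replicability via randomized rounding of an ERM solution.
# Assumptions (not from the paper): 1-D mean estimation with squared loss,
# an illustrative grid width `rho`, and a grid offset drawn from a shared seed.
import numpy as np

def replicable_erm_round(sample, rho=0.05, shared_seed=0):
    """Round the ERM solution to a randomly shifted grid of width rho.

    Two runs on independent samples reuse the same seed (hence the same
    grid shift), so their rounded outputs coincide whenever both ERM
    solutions land in the same grid cell -- which happens with high
    probability when the ERM solution concentrates at scale << rho.
    """
    erm = float(np.mean(sample))  # ERM for squared loss is the sample mean
    shift = np.random.default_rng(shared_seed).uniform(0.0, rho)
    return shift + rho * np.floor((erm - shift) / rho)

# Two independent samples from the same distribution.
s1 = np.random.default_rng(1).normal(loc=0.3, scale=1.0, size=20_000)
s2 = np.random.default_rng(2).normal(loc=0.3, scale=1.0, size=20_000)

out1, out2 = replicable_erm_round(s1), replicable_erm_round(s2)
print(out1, out2, out1 == out2)  # identical with high probability
```

The rounding coarsens the output (a small additional excess-risk cost), which is consistent with the abstract's statement that the polynomial-time route trades a modest statistical penalty for efficiency.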