Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Stochastic Bias and Remedies

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper studies the convergence properties of two clipped stochastic algorithms, Clipped SGD and DiceSGD, in the performative prediction setting, where the data distribution may shift in response to the deployed prediction model.
Abstract: This paper studies the convergence of clipped stochastic gradient descent (SGD) algorithms with decision-dependent data distributions. Our setting is motivated by privacy-preserving optimization algorithms that interact with performative data, where the deployed prediction models can influence future outcomes. This challenging setting involves the non-smooth clipping operator and non-gradient dynamics due to distribution shifts. We make two contributions in pursuit of a performative stable solution with these algorithms. First, we characterize the stochastic bias of the projected clipped SGD (PCSGD) algorithm, which is caused by the clipping operator and prevents PCSGD from reaching the stable solution. When the loss function is strongly convex, we derive lower and upper bounds on this stochastic bias and demonstrate a bias amplification phenomenon driven by the sensitivity of the data distribution. When the loss function is non-convex, we bound the magnitude of the stationarity bias. Second, we propose remedies that mitigate the bias, either by utilizing an optimal step size design for PCSGD or by applying the recent DiceSGD algorithm [Zhang et al., 2024]. Our analysis further shows that the latter algorithm is free from stochastic bias in the performative setting. Numerical experiments verify our findings.
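To make the studied update concrete, below is a minimal sketch (not the authors' code) of projected clipped SGD under a decision-dependent data distribution. The helpers `sample`, `grad`, and `project` are hypothetical placeholders standing in for the performative data source, the stochastic gradient oracle, and the projection onto the constraint set.

```python
import numpy as np

def clip(g, c):
    """Scale g so that its Euclidean norm is at most c (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def pcsgd(theta0, sample, grad, project, step_sizes, clip_level, n_iters):
    """Illustrative PCSGD loop: at each step the data is drawn from a
    distribution that depends on the currently deployed model theta."""
    theta = theta0
    for t in range(n_iters):
        z = sample(theta)                      # data reacts to the deployed model (performativity)
        g = clip(grad(theta, z), clip_level)   # clipping step that induces the stochastic bias
        theta = project(theta - step_sizes[t] * g)
    return theta
```

The clipping step in this loop is the source of the bias the paper bounds; the first proposed remedy amounts to choosing `step_sizes` carefully so that this bias is controlled.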
Lay Summary: Modern machine learning models often learn from data that is influenced by their own predictions — like a recommendation system that shapes what users click on next. In such cases, training becomes more complex, especially when using techniques that protect user privacy by clipping overly large gradients — that is, limiting the magnitude of each update to avoid revealing sensitive information. Our study reveals that while clipping helps with privacy, it also introduces a problem: clipping bias. This bias can prevent the model from fully learning, especially when the data changes in response to the model’s outputs. We analyze how this bias behaves under different learning conditions — showing that it can grow significantly depending on how sensitive the data is. To address this, we propose two effective fixes: (1) carefully tuning the model’s learning rate, and (2) using a newer training method called DiceSGD, which, in our setting, provably eliminates clipping bias. Our work highlights the trade-off between privacy and learning accuracy, and offers practical guidance to balance both in real-world systems.
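The second remedy replaces plain clipping with an error-feedback scheme in the spirit of DiceSGD [Zhang et al., 2024]. The sketch below is an assumption-laden schematic of that idea, not the algorithm as specified in the paper: the clipped-off part of the gradient is accumulated in an error state and re-injected (after its own clipping), and the privacy noise used in the original method is omitted. The placeholders `sample` and `grad` are the same hypothetical helpers as in the PCSGD sketch above.

```python
import numpy as np

def clip(v, c):
    norm = np.linalg.norm(v)
    return v if norm <= c else (c / norm) * v

def error_feedback_clipped_sgd(theta0, sample, grad, step_sizes, c1, c2, n_iters):
    """Schematic error-feedback clipping (DiceSGD-style, form assumed):
    the residual removed by clipping is stored in e and fed back later,
    which is the mechanism that removes the clipping bias."""
    theta = theta0
    e = np.zeros_like(theta0)                   # accumulated clipping error
    for t in range(n_iters):
        z = sample(theta)
        g = grad(theta, z)
        update = clip(g, c1) + clip(e, c2)      # clipped gradient plus clipped error feedback
        e = e + g - update                      # carry the clipped-off residual forward
        theta = theta - step_sizes[t] * update
    return theta
```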
Primary Area: Optimization->Stochastic
Keywords: Clipped Stochastic Algorithm, Performative Prediction, (Non)convex Optimization
Submission Number: 6207