Generalization Analysis of SGD in Linear Regression under Covariate Shift: A View from Preconditioning
Keywords: Covariate Shift, Linear Regression, Minimax Optimality, SGD with Momentum
Abstract: Recent years have witnessed the widespread success of stochastic gradient descent (SGD)-type algorithms across diverse problem domains, including covariate shift tasks. However, the mechanisms that enable SGD to generalize effectively under covariate shift, and the specific classes of covariate shift problems on which SGD is provably efficient, remain insufficiently understood. This paper studies SGD for linear regression in a canonical covariate shift setting. Our analysis is twofold. First, we derive an upper bound on the target excess risk of SGD that incorporates two critical practical techniques: momentum acceleration and step-decay scheduling. Second, we analyze SGD's performance by framing it as a preconditioned estimator, which allows us to identify conditions under which SGD achieves statistical optimality. We show that SGD attains optimal performance in several commonly studied settings, and that provable separations exist between several commonly used methods.
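To fix ideas, here is a minimal sketch of the setting the abstract describes: one-pass SGD with heavy-ball momentum and a step-decay step-size schedule on linear regression under covariate shift, with excess risk evaluated under the target covariate distribution. All specifics below (the diagonal covariances, `lr0`, `beta`, the four-stage halving schedule, the OLS baseline) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 2000

# Covariate shift: source and target share the regression vector w_star,
# but their (diagonal) covariate covariances have mismatched spectra.
source_var = np.linspace(1.0, 0.05, d)   # diag of source covariance
target_var = np.linspace(0.05, 1.0, d)   # diag of target covariance

w_star = rng.normal(size=d) / np.sqrt(d)
noise_std = 0.1

X_src = rng.normal(size=(n, d)) * np.sqrt(source_var)
y_src = X_src @ w_star + noise_std * rng.normal(size=n)

def target_excess_risk(w):
    # Excess risk under the *target* covariance:
    # (w - w_star)^T Sigma_target (w - w_star)
    return float(np.sum(target_var * (w - w_star) ** 2))

# One-pass SGD with heavy-ball momentum and a step-decay schedule.
# lr0, beta, and the four-stage halving schedule are placeholder choices.
w = np.zeros(d)
v = np.zeros(d)                            # momentum buffer
lr0, beta = 0.01, 0.9
for t in range(n):
    lr = lr0 * 0.5 ** (t // (n // 4))      # halve the step size every n/4 steps
    x, y = X_src[t], y_src[t]
    grad = (x @ w - y) * x                 # stochastic gradient of 0.5*(x^T w - y)^2
    v = beta * v + grad
    w -= lr * v

# Ordinary least squares on the source data, as a point of comparison.
w_ols = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

print(f"SGD target excess risk: {target_excess_risk(w):.4f}")
print(f"OLS target excess risk: {target_excess_risk(w_ols):.4f}")
```

Comparing the SGD iterate against the OLS baseline under the target distribution mirrors, in miniature, the kind of separation between estimators that the abstract alludes to; the paper's actual analysis proceeds through the preconditioning view rather than simulation.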
Supplementary Material: pdf
Primary Area: optimization
Submission Number: 22940