Abstract: Many statistical estimators for high-dimensional linear regression are M-estimators, formed
through minimizing a data-dependent squared loss plus a regularizer. This work considers
a new class of estimators implicitly defined through a discretized gradient dynamical system under
overparameterization. We show that, under suitable restricted isometry conditions, overparameterization leads to implicit regularization: if we directly apply gradient descent to the residual
sum of squares with sufficiently small initial values, then, under a suitable early stopping rule,
the iterates converge to a nearly sparse rate-optimal solution that improves over explicitly regularized approaches. In particular, the resulting estimator does not suffer from extra bias due to
explicit penalties, and can achieve the parametric root-n rate when the signal-to-noise ratio is
sufficiently high. We also perform simulations comparing our methods with explicitly regularized
high-dimensional linear regression. Our results illustrate the advantages of implicit
regularization via gradient descent after overparameterization for sparse vector estimation.
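A minimal numerical sketch of the procedure described above, for readers who want to experiment: it runs gradient descent on the residual sum of squares under a Hadamard-product overparameterization beta = u * u - v * v with a small initialization and a fixed iteration budget standing in for an early stopping rule. The parameterization, step size, initialization scale, and iteration budget are illustrative assumptions, not the paper's exact specification.

import numpy as np

def gd_overparam(X, y, lr=0.05, alpha=1e-3, n_iter=2000):
    # Gradient descent on the residual sum of squares (1/(2n)) * ||y - X beta||^2
    # under the assumed overparameterization beta = u*u - v*v, started from the
    # small value alpha; n_iter stands in for a proper early stopping rule.
    n, p = X.shape
    u = np.full(p, alpha)
    v = np.full(p, alpha)
    for _ in range(n_iter):
        beta = u * u - v * v
        grad = X.T @ (X @ beta - y) / n   # gradient with respect to beta
        u -= lr * 2.0 * u * grad          # chain rule: d(beta)/d(u) = 2u
        v += lr * 2.0 * v * grad          # chain rule: d(beta)/d(v) = -2v
    return u * u - v * v

# Toy usage on synthetic data with a sparse signal and n < p.
rng = np.random.default_rng(0)
n, p, s = 100, 400, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = gd_overparam(X, y)
print(np.max(np.abs(beta_hat - beta_true)))  # estimation error on this toy example

In such a sketch, coordinates with small gradients stay close to the small initialization while signal coordinates grow, which is the informal sense in which small initialization plus early stopping acts as an implicit sparsity-inducing regularizer.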