Keywords: Distributionally-Robust Optimization, Nonconvex Optimization, Primal-Dual Algorithm
Abstract: We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a "best-fit" function.
More precisely, given training samples from a reference distribution $p_0$,
the goal is to approximate the vector $\mathbf{w}^*$
which minimizes the squared loss with respect to the worst-case distribution
that is close in $\chi^2$-divergence to $p_{0}$.
We design a computationally efficient algorithm that recovers a vector $ \hat{\mathbf{w}}$
satisfying
$\mathbb{E}\_{p^*} (\sigma(\hat{\mathbf{w}} \cdot \mathbf{x}) - y)^2 \leq C \hspace{0.2em} \mathbb{E}\_{p^*} (\sigma(\mathbf{w}^* \cdot \mathbf{x}) - y)^2 + \epsilon$, where $C>1$ is a dimension-independent constant and $(\mathbf{w}^*, p^*)$ is the witness attaining the min-max risk
$\min_{\mathbf{w}:\|\mathbf{w}\| \leq W} \max\_{p} \mathbb{E}\_{(\mathbf{x}, y) \sim p} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu \chi^2(p, p_0)$.
Our algorithm follows the primal-dual framework and is
designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss.
From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.
Primary Area: Learning theory
Flagged For Ethics Review: true
Submission Number: 11289
Loading