Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator

Lior Danon; Dan Garber

Published: 31 Oct 2022, Last Modified: 11 Jan 2023

NeurIPS 2022 Accept

Readers: Everyone

Keywords: Frank-Wolfe, conditional gradient, projection-free, Tyler's M-estimator, robust covariance estimation, heavy tailed distributions, elliptical distributions, convex optimization, nonconvex optimization

TL;DR: First Frank-Wolfe algorithms for computing Tyler's M-estimator which converge to the optimal solution with (nearly) linear runtime per iteration and with linear rates

Abstract: Tyler's M-estimator is a well-known procedure for robust and heavy-tailed covariance estimation. Tyler himself suggested an iterative fixed-point algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses standard Frank-Wolfe steps, the second also considers away-steps (AFW), and the third is a geodesic version of AFW (GAFW). AFW provably requires, up to a log factor, only linear time per iteration, while GAFW runs in linear time (up to a log factor) in a large $n$ (number of data-points) regime. All three variants are shown to provably converge to the optimal solution with a sublinear rate, under standard assumptions, even though the underlying optimization problem is neither convex nor smooth. Under an additional, fairly mild assumption, which holds with probability 1 when the (normalized) data-points are i.i.d. samples from a continuous distribution supported on the entire unit sphere, AFW and GAFW are proved to converge with linear rates. Importantly, all three variants are parameter-free and use adaptive step-sizes.
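
For context, the fixed-point iteration that the abstract attributes to Tyler is commonly written as $\Sigma_{k+1} \propto \frac{p}{n}\sum_{i=1}^{n} \frac{x_i x_i^\top}{x_i^\top \Sigma_k^{-1} x_i}$ for data-points $x_1,\dots,x_n \in \mathbb{R}^p$. The sketch below is a minimal NumPy illustration of that classical baseline only, not of the Frank-Wolfe variants proposed in the paper. The function name `tyler_fixed_point`, the trace-$p$ normalization, the stopping rule, and the default parameters are assumptions made for illustration.

```python
import numpy as np


def tyler_fixed_point(X, n_iter=100, tol=1e-8):
    """Classical fixed-point iteration for Tyler's M-estimator (illustrative sketch).

    X is an (n, p) array whose rows are nonzero data-points.
    Returns a (p, p) shape matrix normalized to have trace p (an assumed convention).
    """
    n, p = X.shape
    sigma = np.eye(p)  # assumed starting point: the identity matrix
    for _ in range(n_iter):
        # Quadratic forms q_i = x_i^T Sigma^{-1} x_i for every data-point.
        solved = np.linalg.solve(sigma, X.T)          # shape (p, n)
        q = np.einsum("ij,ji->i", X, solved)          # shape (n,)
        # Weighted scatter matrix: (p / n) * sum_i x_i x_i^T / q_i.
        new_sigma = (p / n) * (X.T / q) @ X
        # Fix the scale, since the estimator is only defined up to a constant.
        new_sigma *= p / np.trace(new_sigma)
        converged = np.linalg.norm(new_sigma - sigma, ord="fro") <= tol
        sigma = new_sigma
        if converged:
            break
    return sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.standard_normal((200, 3))  # synthetic data, for illustration only
    print(tyler_fixed_point(samples))
```

Each pass of this loop evaluates $n$ quadratic forms against a $p \times p$ matrix, which costs on the order of $np^2 + p^3$ operations. That is super-linear in the data size $np$, which is the per-iteration cost the abstract refers to.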

Supplementary Material: pdf
