Scalable Classifiers with ADMM and Transpose Reduction

AAAI Workshops 2017 (modified: 02 Mar 2020)
Abstract: As datasets for machine learning grow larger, parallelization strategies become increasingly important. Recent approaches to distributed model fitting rely heavily either on consensus ADMM, where each node solves small sub-problems using only local data, or on stochastic gradient methods that don't scale well to large numbers of cores in a cluster setting. For this reason, GPU clusters have become common prerequisites to large-scale machine learning. This paper describes an unconventional training method that uses alternating direction methods and Bregman iteration to train a variety of machine learning models on CPUs while avoiding the drawbacks of consensus methods and without gradient descent steps. Using transpose reduction strategies, the proposed method reduces the optimization problems to a sequence of minimization sub-steps that can each be solved globally in closed form. The method provides strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
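
To make the transpose-reduction idea concrete, here is a minimal single-process NumPy sketch, not the paper's implementation: it assumes a hinge-loss (SVM-style) linear classifier with ridge regularization, and the names admm_svm and hinge_prox, as well as the parameters lam and rho, are illustrative choices. The key point it illustrates is that the w-update only needs the small p-by-p Gram matrix A^T A and p-dimensional vectors of the form A^T(.), quantities a cluster could assemble by summing per-node contributions from local slices of the data.

import numpy as np

def hinge_prox(v, rho):
    # Closed-form, elementwise prox of h(z) = max(0, 1 - z) with weight 1/rho.
    return np.where(v > 1.0, v,
           np.where(v < 1.0 - 1.0 / rho, v + 1.0 / rho, 1.0))

def admm_svm(D, y, lam=1.0, rho=1.0, iters=200):
    # Solve min_w  (lam/2)||w||^2 + sum_i max(0, 1 - y_i d_i^T w)
    # by splitting z = A w (with labels folded into A) and alternating
    # closed-form sub-steps; no gradient descent is used.
    n, p = D.shape
    A = D * y[:, None]                     # fold labels into the rows
    # Transpose reduction: only A^T A (p x p) and A^T-vectors are needed
    # globally; in a distributed run these are sums of local pieces
    # A_i^T A_i and A_i^T(.) contributed by each node.
    M = lam * np.eye(p) + rho * (A.T @ A)
    L = np.linalg.cholesky(M)              # factor once, reuse every iteration
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        # w-update: ridge-type least squares, solved globally in closed form.
        rhs = rho * (A.T @ (z - u))
        w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: elementwise hinge prox, closed form and embarrassingly parallel.
        Aw = A @ w
        z = hinge_prox(Aw + u, rho)
        # Scaled multiplier (Bregman) update.
        u = u + Aw - z
    return w

# Tiny usage example on synthetic data with labels in {-1, +1}.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.normal(size=(500, 20))
    y = np.sign(D @ rng.normal(size=20))
    w = admm_svm(D, y)
    print("train accuracy:", np.mean(np.sign(D @ w) == y))

In an actual cluster run, the products A.T @ A and A.T @ (z - u) would be replaced by a reduction over per-node partial sums, while the z- and u-updates act elementwise on locally held rows and require no communication.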