Data-dependent bounds on network gradient descent

Avleen Singh Bijral, Anand D. Sarwate, Nathan Srebro

Published: 2016 | Last Modified: 26 Nov 2023 | Allerton 2016 | Readers: Everyone

Abstract: We study a consensus-based distributed stochastic gradient method for optimization in a setting common in machine learning applications. Nodes in the network hold disjoint data and seek to optimize a common objective that decomposes into a sum of convex functions of individual data points. We show that the rate of convergence for this method involves the spectral properties of two matrices: the standard spectral gap of a weight matrix derived from the network topology, and a new term depending on the spectral norm of the sample covariance matrix of the data. This result shows the benefit of datasets whose sample covariance matrix has a small spectral norm. Extensions of the method can identify the impact of limited communication, increasing the number of nodes, and scaling with data set size.
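The abstract does not spell out the update rule or the constants in the bound, so the following is only a minimal sketch of one generic consensus-based stochastic gradient scheme, written to make the two spectral quantities concrete. Everything specific here is an assumption for illustration and is not taken from the paper: the ring topology with uniform 1/3 weights, the least-squares loss, the unit-norm data rows, the 1/sqrt(t) step size, and the helper names `ring_weights`, `spectral_gap`, and `consensus_sgd`.

```python
import numpy as np

def ring_weights(m):
    """Symmetric, doubly stochastic weights on a ring (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] = 1.0 / 3.0  # self and both neighbors get equal weight
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of a symmetric W."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - mags[1]

def consensus_sgd(X, y, shards, W, steps, step0, rng):
    """Each node averages with its neighbors, then takes a local SGD step."""
    m, d = len(shards), X.shape[1]
    theta = np.zeros((m, d))  # row i is node i's current iterate
    for t in range(1, steps + 1):
        grads = np.zeros((m, d))
        for i, idx in enumerate(shards):
            k = rng.choice(idx)  # sample one point from node i's own data
            grads[i] = (X[k] @ theta[i] - y[k]) * X[k]  # squared-loss gradient
        theta = W @ theta - (step0 / np.sqrt(t)) * grads
    return theta

def loss(w, X, y):
    """Average squared loss over the full data set."""
    return 0.5 * np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
N, d, m = 200, 5, 5
X = rng.normal(size=(N, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # assumption: unit-norm rows
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=N)
shards = np.array_split(np.arange(N), m)  # disjoint data per node

W = ring_weights(m)
gap = spectral_gap(W)                  # network term
rho = np.linalg.norm(X.T @ X / N, 2)   # spectral norm of the sample covariance

theta = consensus_sgd(X, y, shards, W, steps=2000, step0=1.0, rng=rng)
w_avg = theta.mean(axis=0)
initial_loss = loss(np.zeros(d), X, y)
final_loss = loss(w_avg, X, y)
print("spectral gap:", gap)
print("covariance spectral norm:", rho)
print("loss at zero:", initial_loss, "loss at network average:", final_loss)
```

Swapping `ring_weights` for a denser graph changes `gap`, while rescaling or decorrelating the rows of `X` changes `rho`; the sketch only exposes those two quantities and makes no claim about the exact form of the paper's bound.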
