Neighborhood Gradient Mean: An Efficient Decentralized Learning Method for Non-IID Data

Sai Aparna Aketi; Sangamesh Kodge; Kaushik Roy

Published: 31 Oct 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Decentralized learning algorithms enable the training of deep learning models over large distributed datasets, without the need for a central server. The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed (IID). In practical scenarios, the distributed datasets can have significantly different data distributions across the agents. This paper focuses on improving decentralized learning on non-IID data with minimal compute and memory overheads. We propose Neighborhood Gradient Mean (NGM), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. In particular, the proposed method averages the local gradients with model-variant or data-variant cross-gradients based on the communication budget. Model-variant cross-gradients are derivatives of the received neighbors’ model parameters with respect to the local dataset. Data-variant cross-gradients are derivatives of the local model with respect to its neighbors’ datasets. The data-variant cross-gradients are aggregated through an additional communication round. We theoretically analyze the convergence characteristics of NGM and demonstrate its efficiency on non-IID data sampled from various vision and language datasets. Our experiments demonstrate that the proposed method either remains competitive with or outperforms (by 0-6%) the existing state-of-the-art (SoTA) decentralized learning algorithm on non-IID data with significantly lower compute and memory requirements. Further, we show that the model-variant cross-gradient information available locally at each agent can improve the performance on non-IID data by 3-20% without additional communication costs.
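
As a rough illustration of the averaging step described in the abstract, the following minimal sketch computes a self-gradient and model-variant cross-gradients on one agent and takes their mean. It is based only on the abstract's description and is not the authors' implementation (see the linked repository for that). The least-squares loss, the function names `lsq_grad` and `ngm_style_gradient`, the toy data, and the step size are all assumptions made for illustration; gossip averaging, data-variant cross-gradients, and any weighting used by the actual method are omitted.

```python
import numpy as np

def lsq_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w (assumed toy loss)."""
    return X.T @ (X @ w - y) / len(y)

def ngm_style_gradient(w_self, neighbor_ws, X_local, y_local):
    """Average the self-gradient with model-variant cross-gradients.

    Each cross-gradient is the loss gradient evaluated at a neighbor's
    received parameters, computed on this agent's local data, so no
    communication beyond the received models is needed.
    """
    grads = [lsq_grad(w_self, X_local, y_local)]
    grads += [lsq_grad(w_j, X_local, y_local) for w_j in neighbor_ws]
    return np.mean(grads, axis=0)

# Hypothetical toy setup: one agent with two neighbors.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w_self = np.zeros(3)
neighbor_ws = [rng.normal(size=3), rng.normal(size=3)]

g = ngm_style_gradient(w_self, neighbor_ws, X, y)
w_next = w_self - 0.1 * g  # plain gradient step with an assumed step size
```

With no neighbors, the function reduces to the ordinary local gradient, which makes it easy to compare against a baseline update in a small experiment.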

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/aparna-aketi/neighborhood_gradient_clustering

Assigned Action Editor: ~Sebastian_U_Stich1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1513
