Keywords: decentralized stochastic gradient descent, bilevel optimization, hyper-gradient, personalization, directed network, federated learning, distributed learning, fully-decentralized
Abstract: While personalization in distributed learning has been extensively studied, existing approaches employ dedicated algorithms to optimize their specific type of parameters (e.g., client clusters or model interpolation weights), which makes it difficult to optimize different types of parameters simultaneously for better performance.
Moreover, their algorithms require centralized or static undirected communication networks, which can be vulnerable to single points of failure or deadlocks.
This study proposes optimizing various types of parameters with a single algorithm that runs in more practical communication environments.
First, we propose a gradient-based bilevel optimization that reduces most personalization approaches to the optimization of client-wise hyperparameters.
Second, we propose a decentralized algorithm to estimate gradients with respect to the hyperparameters, which can run even on stochastic and directed communication networks.
Our empirical results demonstrate that the gradient-based bilevel optimization enables combining existing personalization approaches, which leads to state-of-the-art performance, and confirm that it works in multiple simulated communication environments, including a stochastic and directed network.
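The abstract describes two technical components without spelling them out; the sketches below are illustrative only. First, a generic client-wise bilevel view of personalization: the notation $\lambda_i$, $w_i$, $f_i^{\mathrm{tr}}$, $f_i^{\mathrm{val}}$ is assumed here rather than taken from the submission. Each client's hyperparameters are tuned in the outer problem against a validation loss, while the inner problem fits the personalized models.

```latex
% Illustrative bilevel formulation of client-wise personalization.
% \lambda_i: client i's hyperparameters (e.g., cluster assignment or interpolation weight),
% w_i: client i's model, f_i^{tr} / f_i^{val}: client i's training / validation losses.
% Assumed sketch only; not necessarily the submission's exact objective.
\begin{align*}
  \min_{\lambda_1,\dots,\lambda_n}\;
    & \frac{1}{n}\sum_{i=1}^{n} f_i^{\mathrm{val}}\!\bigl(w_i^{\ast}(\lambda)\bigr)
    && \text{(outer: client-wise hyperparameters)} \\
  \text{s.t.}\;
    & \bigl(w_1^{\ast}(\lambda),\dots,w_n^{\ast}(\lambda)\bigr)
      \in \arg\min_{w_1,\dots,w_n}
      \frac{1}{n}\sum_{i=1}^{n} f_i^{\mathrm{tr}}(w_i;\lambda_i)
    && \text{(inner: personalized models)}
\end{align*}
```

Second, decentralized hyper-gradient estimation has to aggregate quantities over a stochastic, directed network. A standard primitive for that setting is push-sum (ratio) consensus; the sketch below shows only that primitive, not the submission's algorithm, and the function names are hypothetical.

```python
import numpy as np

# Hedged sketch: push-sum (ratio) consensus, a standard primitive for averaging
# vectors -- e.g., local hyper-gradient estimates -- over a directed, possibly
# time-varying communication graph. This is NOT the submission's algorithm; it
# only illustrates why undirected links are not required for aggregation.

def push_sum_average(local_vectors, sample_mixing, num_rounds=50):
    """Average `local_vectors` (one np.ndarray per node) over a directed graph.

    `sample_mixing(t)` returns a column-stochastic matrix P_t whose entry
    (j, i) is the weight node i pushes to node j at round t.
    """
    x = np.stack(local_vectors).astype(float)  # numerators, shape (n, d)
    w = np.ones(len(local_vectors))            # push-sum weights (denominators)
    for t in range(num_rounds):
        P = sample_mixing(t)
        x = P @ x                              # push numerators along out-edges
        w = P @ w                              # push weights identically
    return x / w[:, None]                      # ratio -> network-wide average

# Example: 4 nodes on a directed ring, each keeping half its mass and pushing
# the other half to its single out-neighbour (column-stochastic by design).
def ring_mixing(t, n=4):
    P = 0.5 * np.eye(n)
    for i in range(n):
        P[(i + 1) % n, i] = 0.5
    return P

if __name__ == "__main__":
    vectors = [np.array([float(i)]) for i in range(4)]
    print(push_sum_average(vectors, ring_mixing))  # each row approaches [1.5]
```

With a column-stochastic mixing matrix, the numerator-to-weight ratio at every node converges to the network-wide average even though every link is one-directional, which is what makes directed, time-varying topologies workable in principle.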
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)
TL;DR: We propose gradient-based bilevel optimization as a general approach to personalization and a decentralized hyper-gradient estimation algorithm that runs on stochastic and directed communication networks.
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/personalized-decentralized-bilevel/code)