- TL;DR: We investigate the internal reasons of our observations, the diminishing effects of the well-known hyperparameter optimization methods on federated learning from decentralized non-IID data.
- Abstract: Federated learning, where a global model is trained by iterative parameter averaging of locally-computed updates, is a promising approach for distributed training of deep networks; it provides high communication-efficiency and privacy-preservability, which allows to fit well into decentralized data environments, e.g., mobile-cloud ecosystems. However, despite the advantages, the federated learning-based methods still have a challenge in dealing with non-IID training data of local devices (i.e., learners). In this regard, we study the effects of a variety of hyperparametric conditions under the non-IID environments, to answer important concerns in practical implementations: (i) We first investigate parameter divergence of local updates to explain performance degradation from non-IID data. The origin of the parameter divergence is also found both empirically and theoretically. (ii) We then revisit the effects of optimizers, network depth/width, and regularization techniques; our observations show that the well-known advantages of the hyperparameter optimization strategies could rather yield diminishing returns with non-IID data. (iii) We finally provide the reasons of the failure cases in a categorized way, mainly based on metrics of the parameter divergence.
- Code: https://github.com/fl-noniid/fl-noniid
- Keywords: Federated learning, Iterative parameter averaging, Deep networks, Decentralized non-IID data, Hyperparameter optimization methods