Federated Learning of Sparse Gaussian Processes

TMLR Paper1905 Authors

05 Dec 2023 (modified: 10 Apr 2024). Rejected by TMLR.
Abstract: Gaussian processes (GPs) are widely used, flexible nonparametric probabilistic models, and sparse variational approximations for GPs (sparse GPs) have emerged as the go-to approach for addressing their poor computational efficiency. In many applications in which we would like to use sparse GPs, datasets are distributed across multiple clients and data privacy is often a concern. This motivates the use of federated learning algorithms, which enable clients to train a model collaboratively without centralising data. Partitioned variational inference (PVI) is an established framework for communication-efficient federated learning of variational approximations. However, we show that PVI cannot support sparse GPs due to the need to share and learn variational parameters (the inducing point locations) across clients. Hence, we re-frame inducing points in sparse GPs as auxiliary variables in a hierarchical variational model (HVM). We use this reformulation to extend PVI to variational distributions with shared variational parameters across client-specific factors, enabling communication-efficient federated learning of inducing points. In addition, we develop a novel parameterisation of the variational distribution which, when combined with the HVM formulation of inducing points, improves the communication efficiency and quality of learning. Our experiments show that our method significantly outperforms baseline approaches for federated learning of sparse GPs on a number of real-world regression tasks.
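To make the objects in the abstract concrete, below is a minimal, self-contained sketch of a standard (non-federated) sparse variational GP: the inducing locations Z, together with the mean m and covariance S of q(u), are the variational parameters the abstract refers to. This is a generic NumPy illustration, not the paper's federated method; the kernel, hyperparameters, jitter value, and toy data are assumptions chosen only for the example.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix (illustrative choice of kernel)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def svgp_elbo(X, y, Z, m, S, noise_var=0.1, jitter=1e-6):
    """Uncollapsed sparse-GP ELBO with q(u) = N(m, S) at inducing inputs Z."""
    M = Z.shape[0]
    Kmm = rbf(Z, Z) + jitter * np.eye(M)
    Kmn = rbf(Z, X)
    Knn_diag = np.full(X.shape[0], 1.0)      # diagonal of K_nn (kernel variance = 1)
    Kmm_inv = np.linalg.inv(Kmm)

    A = Kmm_inv @ Kmn                        # K_mm^{-1} K_mn
    mu = A.T @ m                             # mean of q(f) at the training inputs
    var = Knn_diag - np.sum(Kmn * A, axis=0) + np.sum(A * (S @ A), axis=0)

    # Expected Gaussian log-likelihood under q(f_n) = N(mu_n, var_n)
    exp_ll = np.sum(
        -0.5 * np.log(2 * np.pi * noise_var)
        - 0.5 * ((y - mu) ** 2 + var) / noise_var
    )

    # KL( N(m, S) || N(0, K_mm) ), the prior being the GP evaluated at Z
    kl = 0.5 * (
        np.trace(Kmm_inv @ S)
        + m @ Kmm_inv @ m
        - M
        + np.linalg.slogdet(Kmm)[1]
        - np.linalg.slogdet(S)[1]
    )
    return exp_ll - kl

# Toy usage: 5 inducing points summarising 200 observations.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = np.linspace(-3, 3, 5)[:, None]           # inducing locations (variational parameters)
m, S = np.zeros(5), np.eye(5)                # variational mean and covariance of q(u)
print("ELBO:", svgp_elbo(X, y, Z, m, S))
```

Roughly speaking, in the federated setting the data-dependent term of such a bound is split across clients, while Z (and the parameters of q(u)) would have to be learned jointly; this is the sharing problem the abstract describes and that the HVM reformulation is intended to address.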
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Inclusion of a related work section on federated learning.
2. Updated related work section on federated learning of GPs, including a discussion of Moreno-Munoz et al. (2021).
3. Inclusion of a related work section on probabilistic treatments of inducing locations.
4. Discussion of the similarities and differences between PVI and traditional federated learning algorithms, such as FedAvg, at the end of Section 3.
5. Refined discussion of the limitations of PVI when applied to variational approximations in which some variational parameters are shared across clients (beneath Equation 3).
6. Emphasis on the difference between the DPO parameterisation we develop and other, similar parameterisations in the literature (end of Section 5.1).
7. Explanation of why the experiments are restricted to GP models (top of Section 6).
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 1905