Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

Saurav Prakash; Jin Sima; Chao Pan; Eli Chien; Olgica Milenkovic

Published: 16 Jan 2024, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Hierarchical and tree-like data sets arise in many relevant applications, including language processing, graph data mining, phylogeny and genomics. It is known that tree-like data cannot be embedded into Euclidean spaces of finite dimension with small distortion, and that this problem can be mitigated through the use of hyperbolic spaces. When such data also has to be processed in a distributed and privatized setting, it becomes necessary to work with new federated learning methods tailored to hyperbolic spaces. As an initial step towards the development of the field of federated learning in hyperbolic spaces, we propose the first known approach to federated classification in hyperbolic spaces. Our contributions are as follows. First, we develop distributed versions of convex SVM classifiers for Poincaré discs. In this setting, the information conveyed from clients to the global classifier consists of the convex hulls of clusters present in individual client data. Second, to avoid label switching issues, we introduce a number-theoretic approach for label recovery based on the so-called integer $B_h$ sequences. Third, we compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage; at the same time, in order to limit the communication cost for the hulls, we propose a new quantization method for the Poincaré disc coupled with Reed-Solomon-like encoding. Fourth, at the server level, we introduce a new approach for aggregating convex hulls of the clients based on balanced graph partitioning. We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to $\sim11\%$ better than that of its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces.
Our implementation of the proposed method is available at https://github.com/sauravpr/hyperbolic_federated_classification.
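
The abstract mentions integer $B_h$ sequences without defining them. As a rough illustration only: a set of integers is a $B_h$ sequence if all sums of $h$ of its elements (repetition allowed) are distinct, so a sum determines the multiset of elements that produced it. The Python sketch below checks this property by brute force and inverts a sum. The function names `is_bh_sequence` and `decode_sum` and the example values are hypothetical, and the sketch does not reproduce how the paper applies $B_h$ sequences to label recovery.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(seq, h):
    """Return True if all sums of h elements of seq (repetition allowed) are distinct."""
    seen = set()
    for combo in combinations_with_replacement(sorted(seq), h):
        total = sum(combo)
        if total in seen:
            return False  # two different multisets share the same sum
        seen.add(total)
    return True


def decode_sum(seq, h, total):
    """Return the multiset of h elements of seq summing to total, or None.

    For a B_h sequence, this multiset is unique whenever it exists.
    """
    for combo in combinations_with_replacement(sorted(seq), h):
        if sum(combo) == total:
            return combo
    return None


# Hypothetical example values: {1, 2, 5, 11} is a B_2 (Sidon) sequence.
labels = [1, 2, 5, 11]
print(is_bh_sequence(labels, 2))        # True
print(is_bh_sequence([1, 2, 3, 4], 2))  # False, since 1 + 4 == 2 + 3
print(decode_sum(labels, 2, 16))        # (5, 11)
```

The brute-force search is only meant to make the uniqueness property concrete; it enumerates every multiset of size $h$ and would not scale to large sequences.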

Submission Length: Long submission (more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=umggDfMHha

Changes Since Last Submission: For the final version of our paper, we have changed the text previously highlighted in blue back to the normal color.

Code: https://github.com/sauravpr/hyperbolic_federated_classification

Assigned Action Editor: ~Aditya_Menon1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1471
