Keywords: matrix decomposition, federated learning, machine learning, data and feature selection
Abstract: As federated learning (FL) spreads to privacy-sensitive domains such as healthcare, finance, and mobile intelligence, efficient and robust training becomes increasingly urgent. Communication bottlenecks, heterogeneous client distributions, and fairness requirements make it essential to select the “right” data and features for model training. Yet existing FL research often addresses feature selection and data selection separately, ignoring their interplay in real-world high-dimensional, noisy datasets, which leads to suboptimal performance. In this paper, we propose a unified framework for data and feature selection by formulating it as a generalized CUR decomposition problem. We introduce FedGCUR, a practical framework that integrates a federated column-pivoted QR (FedCPQR) decomposition routine with per-silo row selection. Specifically, FedCPQR securely computes a global pivot order without exposing raw data, and FedGCUR uses this order to jointly select shared features and silo-specific samples. We prove that FedCPQR produces exactly the same decomposition results as centralized CPQR, and we establish an upper bound on the reconstruction error of FedGCUR. Extensive empirical results show that the proposed framework achieves higher accuracy than baseline data and feature selection methods, demonstrating its effectiveness and efficiency.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 8906
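As a rough illustration of the idea in the abstract, the following minimal sketch selects shared features with column-pivoted QR and then selects rows separately for each silo, forming a CUR-style reconstruction. It is not the authors' FedGCUR or FedCPQR: the secure federated pivot computation is replaced here by a plain centralized CPQR on the stacked data, and the names `cpqr_select`, `gcur_sketch`, `n_features`, and `n_rows` are hypothetical, introduced only for this example.

```python
import numpy as np
from scipy.linalg import qr


def cpqr_select(A, k):
    """Return the indices of k columns of A chosen by column-pivoted QR."""
    _, _, piv = qr(A, mode="economic", pivoting=True)
    return np.sort(piv[:k])


def gcur_sketch(silos, n_features, n_rows):
    """Pick shared columns across silos, then rows within each silo.

    Assumption: a centralized CPQR on the stacked matrix stands in for the
    federated pivot-order computation described in the abstract.
    """
    stacked = np.vstack(silos)              # all silos share the feature space
    cols = cpqr_select(stacked, n_features)

    selections = []
    for A in silos:
        C = A[:, cols]                      # shared selected features
        rows = cpqr_select(C.T, n_rows)     # row selection = CPQR on C^T
        R = A[rows, :]                      # silo-specific selected samples
        U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
        rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
        selections.append((rows, rel_err))
    return cols, selections


# Two synthetic silos whose rows lie in a shared rank-3 feature subspace.
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 8))
silos = [rng.standard_normal((m, 3)) @ V for m in (20, 30)]

cols, selections = gcur_sketch(silos, n_features=3, n_rows=3)
for rows, rel_err in selections:
    print(rows, rel_err)
```

Because the synthetic silos are exactly rank 3, three selected columns and three selected rows per silo should reconstruct each silo up to floating-point error; on real, noisy data the error depends on how quickly the singular values decay.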