FedPseudo: Privacy-Preserving Pseudo Value-Based Deep Learning Models for Federated Survival Analysis

Md Mahmudur Rahman, Sanjay Purushotham

Published: 16 May 2023, Last Modified: 11 Sept 2024, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: Survival analysis, also known as time-to-event analysis, has a wide-ranging impact on patient care. Federated Survival Analysis (FSA) is an emerging Federated Learning (FL) paradigm for performing survival analysis on decentralized data held at multiple medical institutions. FSA can help individual medical institutions (denoted as clients) obtain better survival predictions on their own data while preserving privacy. However, FSA is challenging because survival data are non-linear and non-IID across clients. Moreover, censoring leads to unintentional bias in survival predictions, and this bias may worsen in FSA because censoring distributions are non-uniform across clients. Recent works have adapted the standard and deep learning-based Cox Proportional Hazards (CoxPH) survival models for FSA; however, none of these works have studied the above FSA challenges systematically. In this paper, we investigate and tackle these challenges by proposing FedPseudo, a pseudo value-based deep learning framework for FSA. FedPseudo uses deep learning models to learn robust representations from non-linear survival data, leverages pseudo values to handle non-uniform censoring, and uses FL algorithms such as FedAvg to learn model parameters. We introduce a novel yet simple approach to estimate pseudo values for non-IID settings in FSA. We theoretically show that the estimated pseudo values, denoted as federated pseudo values, are consistent, and we empirically demonstrate that they can be computed faster than with traditional pseudo value derivation approaches. To ensure and enhance the privacy of both the estimated pseudo values and the shared model parameters, we systematically investigate applying differential privacy (DP) to both the federated pseudo values and the FL algorithms. Furthermore, we introduce a novel V-Usable Information metric for survival analysis to quantify how informative a client's data is for training a survival model, and we use this metric to show the advantage of participating in FSA. Extensive experiments on synthetic and real-world datasets demonstrate that our FedPseudo framework achieves better performance than other FSA approaches and performs similarly to the best centrally trained deep survival model. Moreover, our FedPseudo framework obtains the best results under various censoring settings.
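For readers unfamiliar with pseudo values, the following is a minimal sketch of the traditional (centralized) jackknife derivation that the abstract contrasts against, not the federated estimator proposed in the paper. It assumes the survival function is estimated with a Kaplan-Meier estimator and that the pseudo value for subject i at time t is n·S(t) − (n−1)·S₋ᵢ(t), where S₋ᵢ is the estimate with subject i left out. The function names `km_survival` and `jackknife_pseudo_values` are hypothetical and chosen only for illustration.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from right-censored data.

    times  : observed times (event or censoring)
    events : 1/True if the event was observed, 0/False if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors at each distinct event time <= t.
    for u in np.unique(times[events & (times <= t)]):
        at_risk = np.sum(times >= u)
        n_events = np.sum(events & (times == u))
        surv *= 1.0 - n_events / at_risk
    return surv


def jackknife_pseudo_values(times, events, t):
    """Leave-one-out pseudo values for S(t): n*S(t) - (n-1)*S_{-i}(t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop subject i
        loo = km_survival(times[keep], events[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
    return pseudo
```

In this sketch the leave-one-out loop recomputes the estimator n times, which is the cost that motivates faster derivations. Without censoring, each pseudo value reduces to the indicator that the subject survived past t; with censoring, the pseudo values give every subject, censored or not, a numeric target that a regression model can be trained on.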