Keywords: federated learning, distributed, privacy, co-training
TL;DR: We improve privacy in federated learning by replacing model averaging with co-training on a public unlabeled dataset.
Abstract: Federated learning enables collaborative training across distributed sites that share model parameters instead of sensitive local data. However, non-trivial inferences about that sensitive local data can still be made from the shared parameters. We propose a novel co-training technique called AIMHI that uses a public unlabeled dataset to exchange information between sites: instead of model parameters, sites share only their predictions on that dataset. This setting is particularly well suited to healthcare, where hospitals and clinics hold small labeled datasets with highly sensitive patient data, while large national health databases contain large amounts of public patient data. We show that the proposed method reaches a model quality comparable to federated learning while maintaining privacy to a high degree.
Is Student: Yes
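
Illustrative sketch: the prediction-sharing exchange described in the abstract could look roughly like the Python below. This is not the AIMHI implementation. The abstract does not say how shared predictions are combined, so the majority vote, the nearest-centroid local model, the number of rounds, and all function names (fit_centroids, predict, majority_vote, co_train) are assumptions made purely for illustration. The sketch also assumes every site holds at least one labeled example of each class.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # Hypothetical local model: nearest-centroid classifier (one mean per class).
    # Assumes every class appears at least once in y.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    # Assign each row of X to the class with the closest centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def majority_vote(votes, n_classes):
    # votes has shape (n_sites, n_public); ties go to the lowest class index.
    # Majority voting is an assumed consensus rule, not taken from the abstract.
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

def co_train(site_data, X_public, n_classes, rounds=3):
    # Each site starts from a model trained only on its private labeled data.
    models = [fit_centroids(X, y, n_classes) for X, y in site_data]
    pseudo = None
    for _ in range(rounds):
        # Only predictions on the public unlabeled data leave a site.
        votes = np.stack([predict(m, X_public) for m in models])
        pseudo = majority_vote(votes, n_classes)
        # Each site retrains on its private data plus the pseudo-labeled public data.
        models = [
            fit_centroids(np.vstack([X, X_public]),
                          np.concatenate([y, pseudo]), n_classes)
            for X, y in site_data
        ]
    return models, pseudo

# Toy demo with synthetic two-class data (not from the paper).
rng = np.random.default_rng(0)

def make_blobs(n):
    y = np.arange(n) % 2                      # both classes always present
    X = rng.normal(size=(n, 2)) + 4.0 * y[:, None]
    return X, y

sites = [make_blobs(20) for _ in range(3)]    # three sites, small private sets
X_pub, y_pub_hidden = make_blobs(200)         # labels kept only for evaluation
models, pseudo_labels = co_train(sites, X_pub, n_classes=2)
consensus_accuracy = float((pseudo_labels == y_pub_hidden).mean())
```

In this sketch the sites never exchange parameters or private examples; the only shared artifact is the array of predicted labels on the public dataset, which is the property the abstract highlights.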