Federated, Fast, and Private Visualization of Decentralized Data
Keywords: federated visualization, differential privacy, quality control, MRI
TL;DR: A federated, fast, and private visualization technique for the quality control of decentralized data
Abstract: Data visualization is an important step in many machine learning applications, as it allows for detecting outliers and discovering latent structure within data samples. In high-dimensional settings, visualization can be performed by embedding the samples into a low-dimensional space. Several existing methods compute this embedding efficiently, but many of them assume that all the data are locally available. Using such methods in a distributed setting would require pooling all of the datasets at a single site. However, in many domains, communication overhead and privacy concerns preclude aggregating data from different sources. To overcome this issue, we previously proposed decentralized Stochastic Neighbor Embedding (dSNE), which embeds high-dimensional data into a low-dimensional space in a decentralized manner. Yet dSNE still has two limitations: because it communicates iteratively, communication overhead may remain high, and privacy is not formally guaranteed. In this paper, we introduce Faster AdaCliP dSNE (F-dSNE), which reduces communication among sites while satisfying $(\epsilon, \delta)$-differential privacy. Our experiments on four multi-site neuroimaging datasets demonstrate that F-dSNE still obtains promising results while addressing both limitations.
Submission Number: 59