Abstract: In real clinics, medical data are scattered across multiple hospitals. Due to security and privacy concerns, it is almost impossible to gather all the data together and train a unified model. Therefore, multi-node machine learning systems are currently the mainstream form of model training in healthcare. Nevertheless, distributed training relies on the exchange of gradients, which has been shown to be vulnerable to privacy leakage: malicious attackers can reconstruct users' sensitive data from the publicly shared gradients, a serious threat for highly private data such as Electronic Healthcare Records (EHRs). The performance of previous gradient attack methods drops rapidly as the training batch size increases, which makes them less threatening in practice. In this paper, however, we find that in the medical domain, leveraging prior knowledge such as a medical knowledge graph can significantly amplify the leakage risk. In particular, we present GraphLeak, which incorporates the medical knowledge graph into gradient leakage attacks. GraphLeak improves the reconstruction quality of gradient attacks even with large batches of data. We conduct experiments on electronic healthcare record datasets, including eICU and MIMIC-III, and achieve state-of-the-art attack performance compared with previous works. Code is available at https://github.com/anonymous4ai/GraphLeak.
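For readers unfamiliar with the underlying leakage mechanism, the following is a minimal gradient-matching sketch in the style of Deep Leakage from Gradients (Zhu et al., 2019), which the abstract builds on. It is not the GraphLeak method itself; the model, data shapes, and optimizer settings are illustrative assumptions.

```python
# Minimal DLG-style gradient-matching sketch (Zhu et al., 2019).
# Illustrates how shared gradients can leak training data; all shapes and
# hyperparameters here are hypothetical, and this is NOT GraphLeak itself.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 4))
criterion = nn.CrossEntropyLoss()

# "Victim" batch: in a real attack, only the shared gradients are observed.
x_true = torch.randn(1, 32)
y_true = torch.tensor([2])
true_grads = torch.autograd.grad(
    criterion(model(x_true), y_true), model.parameters()
)

# Attacker optimizes dummy inputs and soft labels so that their induced
# gradients match the observed (shared) gradients.
x_dummy = torch.randn(1, 32, requires_grad=True)
y_dummy = torch.randn(1, 4, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

for _ in range(100):
    def closure():
        opt.zero_grad()
        loss = criterion(model(x_dummy), y_dummy.softmax(dim=-1))
        dummy_grads = torch.autograd.grad(
            loss, model.parameters(), create_graph=True
        )
        # Squared L2 distance between dummy and observed gradients.
        grad_diff = sum(
            ((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads)
        )
        grad_diff.backward()
        return grad_diff
    opt.step(closure)

print("reconstruction error:", (x_dummy - x_true).norm().item())
```

As the abstract notes, this style of attack degrades quickly for larger batch sizes; the paper's contribution is to use a medical knowledge graph as a structural prior to restore attack effectiveness in that regime.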