Abstract: Federated learning is a framework for training machine learning models from clients with multiple local data sets without access to the data in its aggregate. Instead, a shared model is jointly learned through an interactive process between a centralized server that combines locally learned model gradients or weights from the client. However, the lack of data transparency naturally raises concerns about model security. Recently, several state-of-the-art backdoor attacks have been proposed, which achieve high attack success rates while simultaneously being difficult to detect, leading to compromised federated learning models. In this paper, motivated by differences in the logits of models trained with and without the presence of backdoor attacks, we propose a defense method that can prevent backdoor attacks from influencing the model while maintaining the accuracy of the original classification task. TAG leverages a small validation data set to estimate the most considerable change a benign client's local training can make to the shared model, which can be used to filter clients from updating the shared model. Experimental results on multiple data sets show that TAG defends against backdoor attacks even when 40 percent of user submissions to update the shared model are malicious.
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/JoeLavond/TrustedAggregation
Assigned Action Editor: ~bo_han2
Submission Number: 2918
Loading