Tesseract: Gradient Flip Score to Secure Federated Learning against Model Poisoning Attacks

Atul Sharma; Wei Chen; Joshua Christian Zhao; Qiang Qiu; Somali Chaterji; Saurabh Bagchi

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone

Keywords: federated learning, aggregation, security, untargeted model poisoning attack

Abstract: Federated learning (multi-party, distributed learning in a decentralized environment) is vulnerable to model poisoning attacks, even more so than centralized learning approaches. This is because malicious clients can collude and send in carefully tailored model updates to make the global model inaccurate. This vulnerability motivated the development of Byzantine-resilient federated learning algorithms, such as Krum, Trimmed mean, and FoolsGold. However, a recently developed untargeted model poisoning attack showed that all prior defenses can be bypassed. The attack uses the intuition that simply by changing the sign of the gradient updates that the optimizer is computing, for a set of malicious clients, a model can be pushed away from the optima to increase the test error rate. In this work, we develop TESSERACT, a defense against this directed deviation attack, which is a state-of-the-art model poisoning attack. TESSERACT is based on a simple intuition that in a federated learning setting, certain patterns of gradient flips are indicative of an attack. This intuition is remarkably stable across different learning algorithms, models, and datasets. TESSERACT assigns reputation scores to the participating clients based on their behavior during the training phase and then takes a weighted contribution of the clients. We show that TESSERACT provides robustness against even an adaptive white-box version of the attack.
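The abstract does not give formulas, so the following is only a minimal sketch of the idea it describes: score each client by how many of its update coordinates flip sign, turn those scores into reputations, and aggregate with reputation weights. Everything specific here is an assumption made for illustration, not the authors' definition: the function names, the flip score as a fraction of sign disagreements with the previous global update, the rule of penalizing the `n_penalize` lowest and highest scorers, and the reward and penalty values.

```python
import numpy as np

def flip_score(client_update, prev_global_update):
    """Fraction of coordinates whose sign disagrees with the previous
    global update (assumed definition; a zero counts as a disagreement
    with any nonzero value)."""
    flipped = np.sign(client_update) != np.sign(prev_global_update)
    return float(np.mean(flipped))

def update_reputation(reputation, scores, n_penalize, reward=1.0, penalty=1.0):
    """Reward every client, then penalize the n_penalize clients with the
    lowest and the highest flip scores (assumed outlier rule)."""
    order = np.argsort(scores)
    n = len(order)
    outliers = np.concatenate([order[:n_penalize], order[n - n_penalize:]])
    new_rep = reputation + reward
    new_rep[outliers] = reputation[outliers] - penalty
    return new_rep

def aggregate(updates, reputation):
    """Reputation-weighted average; clients with non-positive reputation
    contribute nothing (assumed weighting)."""
    weights = np.clip(reputation, 0.0, None)
    total = weights.sum()
    if total == 0:
        return np.zeros(updates.shape[1])
    return (weights / total) @ updates

# Toy round: three benign-looking clients and one that flips every sign.
prev_global = np.ones(4)
updates = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0, -1.0],  # sign-flipping client
])
scores = np.array([flip_score(u, prev_global) for u in updates])
reputation = update_reputation(np.zeros(len(updates)), scores, n_penalize=1)
global_update = aggregate(updates, reputation)
```

In this toy round the client that flips every sign gets the highest flip score, loses reputation, and receives zero weight in the aggregate. Penalizing the lowest scorer as well is a design choice of this sketch, intended to make it harder for an adaptive attacker to hide by flipping unusually few coordinates.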

One-sentence Summary: How to defend federated learning against the local model poisoning attack, the most effective attack known to date, using the pattern of progression of gradients as each client learns.

Supplementary Material: zip
