Approaching the Harm of Gradient Attacks While Only Flipping Labels

TMLR Paper7176 Authors

26 Jan 2026 (modified: 06 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Machine learning systems deployed in distributed or federated environments are highly susceptible to adversarial manipulation, particularly availability attacks, which render the trained model unavailable. Prior work in distributed ML has demonstrated such effects through gradient injection or data poisoning. In this study, we aim to better understand the potential of adversaries with a weaker action space by asking: can availability attacks be mounted solely by flipping a subset of training labels, without altering any features and under a strict flipping budget? We analyze the extent of damage that constrained label flipping attacks can inflict on federated learning under mean aggregation, the dominant baseline in research and production. Focusing on a distributed classification problem, (1) we propose a novel formalization of label flipping attacks on logistic regression models and derive a greedy algorithm that is provably optimal at each training step. (2) To demonstrate that label flipping alone can approach the harm of availability attacks, we show that a budget of only $0.1\%$ of labels at each training step can reduce model accuracy by $6\%$, and that some models perform worse than random guessing when up to $25\%$ of labels are flipped. (3) We shed light on the interplay between what the attacker gains from more \emph{write access} versus more \emph{flipping budget}. (4) We define targeted and untargeted label flipping attacks and compare their power.
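The sketch below illustrates, in Python, the kind of budget-constrained greedy label-flipping step the abstract describes, applied to a logistic regression model trained with mean-aggregated gradients. It is only an illustration under assumed design choices and is not the paper's algorithm: the function names (`greedy_label_flips`, `val_loss`, `mean_gradient`), the one-step look-ahead objective on a held-out set, and the learning rate are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def val_loss(w, X, y):
    # Mean logistic loss on a held-out set (labels in {0, 1}).
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_gradient(w, X, y):
    # Mean-aggregated gradient of the logistic loss over the batch.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def greedy_label_flips(w, X, y, X_val, y_val, budget, lr=0.1):
    """Greedily flip at most `budget` labels so that a single mean-gradient
    step on the poisoned batch increases the held-out loss as much as possible.
    Illustrative scoring rule only; not taken from the paper."""
    y_poison = y.astype(float).copy()
    flipped = set()
    for _ in range(budget):
        base = val_loss(w - lr * mean_gradient(w, X, y_poison), X_val, y_val)
        best_gain, best_i = 0.0, None
        for i in range(len(y_poison)):
            if i in flipped:
                continue
            y_try = y_poison.copy()
            y_try[i] = 1.0 - y_try[i]
            gain = val_loss(w - lr * mean_gradient(w, X, y_try), X_val, y_val) - base
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:  # no remaining flip hurts the model further
            break
        y_poison[best_i] = 1.0 - y_poison[best_i]
        flipped.add(best_i)
    return y_poison, sorted(flipped)
```

Under these assumptions, each iteration performs a one-step look-ahead: it evaluates how much every candidate flip would degrade a held-out loss after one averaged-gradient update, then commits the single most damaging flip, repeating until the flipping budget is exhausted or no flip increases the loss.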
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pin-Yu_Chen1
Submission Number: 7176