Keywords: adversarial machine learning, label poisoning attacks, support vector machines, adversarial training, robust classification, bilevel optimization, projected gradient descent, data poisoning, Stackelberg game
Abstract: As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks.
These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications.
In this paper, we propose $\textbf{Floral}$, a novel adversarial training defense based on support vector machines (SVMs), to counter these threats.
Using a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an $\textit{attacker}$, who strategically poisons critical training labels, and the $\textit{model}$, which seeks to recover from such attacks.
Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
We provide a theoretical analysis of our algorithm’s convergence properties and empirically evaluate $\textbf{Floral}$'s effectiveness across diverse classification tasks.
Compared to robust baselines and foundation models such as RoBERTa, $\textbf{Floral}$ consistently achieves higher robust accuracy under increasing attacker budgets.
These results underscore the potential of $\textbf{Floral}$ to enhance the resilience of machine learning models against label poisoning threats
and to maintain robust classification in adversarial settings.
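The following is a minimal, hypothetical sketch (not the authors' implementation) of the alternating dynamic the abstract describes: a budget-constrained label-flipping attacker playing against a kernel SVM that retrains on the poisoned labels. The toy dataset, the flip budget `k`, the number of rounds, and the greedy margin-based flipping rule, used here as a simple stand-in for the paper's projected gradient step, are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy binary classification data; labels mapped to {-1, +1}.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
y = 2 * y - 1

k = 10          # attacker budget: number of labels flipped per round (assumed)
n_rounds = 5    # number of attacker/model rounds (assumed)

# Model trained on the clean labels before the game starts.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

for _ in range(n_rounds):
    # Attacker (leader): flip the k points with the smallest true-label margin,
    # a greedy surrogate for a projected gradient step onto the k-flip budget.
    margins = y * clf.decision_function(X)
    worst = np.argsort(margins)[:k]
    y_poisoned = y.copy()
    y_poisoned[worst] *= -1

    # Model (follower): retrain the kernel SVM on the poisoned labels,
    # trying to recover a boundary that still separates the clean data.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y_poisoned)

# Robustness proxy: accuracy of the final model against the clean labels.
robust_acc = (clf.predict(X) == y).mean()
print(f"accuracy on clean labels after adversarial training: {robust_acc:.3f}")
```

In the paper's actual formulation the attacker's and model's objectives differ (a non-zero-sum Stackelberg game) and the label update is a projected gradient descent step; the sketch above only conveys the alternating leader/follower structure.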
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4475