FRL-SAGE: Stackelberg Game-Theoretic Defense Against Adaptive Adversaries in Federated Reinforcement Learning

Anish Ambreth; K Naveen Kumar; Mohsen Guizani

17 Sept 2025 (modified: 15 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Federated reinforcement learning, adversarial attacks, defenses, two-player game, Stackelberg game

TL;DR: We model adversarial federated reinforcement learning as a Stackelberg game and propose FRL-SAGE, a framework that optimizes defender strategies to maximize expected utility under dynamic attacks.

Abstract: Federated Reinforcement Learning (FRL) enables multiple agents to collaboratively train policies without sharing raw trajectories, but remains highly vulnerable to adversarial clients. Unlike supervised FL, FRL’s sequential and policy-driven nature allows attackers to adapt strategies across rounds, while defenders must covertly reallocate protections in response. This evolving interaction naturally resembles a two-player strategic game, yet existing defenses assume static adversaries and fail to capture such dynamics. We propose FRL-SAGE (Stackelberg Adversarial Game Equilibrium in Federated Reinforcement Learning), the first framework to formalize attacker–defender dynamics in FRL as a Stackelberg security game. The defender, acting as leader, commits to client-level protections under a budget, while the attacker, as follower, best responds by selecting clients to compromise. We define asymmetric utilities: attacker utility is damage inflicted minus attack cost, while defender utility is the negative sum of residual damage and defense costs. The attacker’s optimization reduces to a 0/1 knapsack problem, solvable via dynamic programming or greedy heuristics, while the defender’s bilevel planning is NP-hard but tractable through exact enumeration or scalable relaxation-based routines. To evaluate the framework concretely, we instantiate an adversary that uses gradient-noise injection and analyze four representative regimes, ranging from static single-client compromise to dynamic multi-client reshuffling with heterogeneous client importance. We theoretically establish equilibrium existence, prove computational hardness, and provide approximation guarantees for scalable solvers. Experiments on CartPole, a standard FRL testbed, illustrate that FRL-SAGE reduces attack-induced performance loss while operating within realistic defense budgets, supporting its role as a principled game-theoretic foundation for proactive defense in adversarial FRL.
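The leader-follower structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes integer attack costs, an attacker budget that caps total attack cost, protection that fully nullifies a client's damage, and ties broken by enumeration order. The function names (`attacker_best_response`, `defender_plan`) and all numbers are hypothetical. The attacker solves a 0/1 knapsack by dynamic programming, and the defender enumerates every affordable protection set and keeps the one with the highest utility.

```python
from itertools import combinations

def attacker_best_response(damage, attack_cost, protected, attack_budget):
    """Follower: 0/1 knapsack DP. Returns (attacker utility, attacked set)."""
    n = len(damage)
    # Net gain of attacking client i (assumption: protection blocks all damage).
    gain = [(0 if i in protected else damage[i]) - attack_cost[i] for i in range(n)]
    # best[b] = (utility, chosen clients) with total attack cost at most b.
    best = [(0, frozenset())] * (attack_budget + 1)
    for i in range(n):
        if gain[i] <= 0:
            continue  # never worth attacking
        w = attack_cost[i]
        for b in range(attack_budget, w - 1, -1):
            cand = best[b - w][0] + gain[i]
            if cand > best[b][0]:
                best[b] = (cand, best[b - w][1] | {i})
    return best[attack_budget]

def defender_plan(damage, attack_cost, defense_cost, defense_budget, attack_budget):
    """Leader: exact enumeration over protection sets within the budget."""
    n = len(damage)
    best_plan = None
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            cost = sum(defense_cost[i] for i in subset)
            if cost > defense_budget:
                continue
            protected = frozenset(subset)
            _, attacked = attacker_best_response(
                damage, attack_cost, protected, attack_budget)
            residual = sum(damage[i] for i in attacked)
            utility = -(residual + cost)  # negative of residual damage plus defense cost
            if best_plan is None or utility > best_plan[0]:
                best_plan = (utility, protected, attacked)
    return best_plan

# Hypothetical three-client instance.
plan = defender_plan(damage=[12, 3, 2], attack_cost=[3, 2, 1],
                     defense_cost=[4, 2, 1], defense_budget=4, attack_budget=3)
print(plan)
```

In this toy instance, protecting the high-damage client 0 pushes the attacker onto clients 1 and 2, and the defender's utility improves from -12 (no protection) to -9. The enumeration is exponential in the number of clients, which is consistent with the abstract's point that scalable relaxation-based routines are needed beyond small instances.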

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 9425
