Client-Level Defense Placement for Adversarially Robust Federated Reinforcement Learning

TMLR Paper7964 Authors

17 Mar 2026 (modified: 26 Apr 2026). Under review for TMLR. License: CC BY 4.0
Abstract: Federated Reinforcement Learning (FRL) extends federated learning to sequential decision-making, enabling multiple clients to collaboratively train a global policy without sharing raw trajectories. While this setting is promising for privacy-sensitive domains such as autonomous systems and IoT control, it introduces critical attack surfaces: adversaries can corrupt policy gradients, and adaptive attackers that reshuffle targets and prioritize high-impact clients render static defenses brittle. Defenses in FRL operate at two complementary layers, server-side aggregation and client-level placement, but the latter remains under-formalized despite directly shaping attacker incentives. We propose FRL-CDPS (Client-Level Defense Placement for Adversarially Robust Federated Reinforcement Learning: A Stackelberg Approach), which models budget-constrained client-level defense placement as a Stackelberg game: the defender commits to a protection strategy, while a rational Bayesian attacker best-responds under imperfect reconnaissance, maintaining posterior beliefs over each client's defense status. The framework captures partial observability and probabilistic defense effectiveness, reflecting real-world conditions in which defenses are imperfect and adversaries operate under uncertainty. Although the defender's bilevel problem is NP-hard, we provide tractable solvers: exact feasible-set search for small systems and candidate-based Monte Carlo search for larger ones, with a $\frac{1}{2}$-approximation guarantee for the attacker oracle.
Experiments on CartPole-v1, HalfCheetah-v2, and Walker2d-v5 across seven ablation dimensions show that FRL-CDPS consistently outperforms heuristic client-selection baselines (random, UCB, Thompson sampling) and composes effectively with server-side defenses (FLTG, FedGreed), demonstrating that Stackelberg planning provides a principled and practical advantage for client-level defense in FRL.
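To make the leader-follower structure concrete, the following is a minimal toy sketch of exact feasible-set search for budget-constrained defense placement. It is not the paper's solver: the damage model, the function names (`attacker_best_response`, `exact_defense_search`), and the parameters (`values`, `costs`, `effectiveness`, `k`) are all illustrative assumptions, and the attacker here is assumed to know the defended set exactly, so no Bayesian posterior or imperfect reconnaissance is modeled.

```python
from itertools import combinations


def attacker_best_response(values, defended, effectiveness, k):
    """Toy attacker: pick the k clients with the highest expected damage.

    Assumption: attacking an undefended client yields its full value;
    a defended client blocks the attack with probability `effectiveness`.
    """
    damage = [
        v * (1.0 - effectiveness) if i in defended else v
        for i, v in enumerate(values)
    ]
    targets = sorted(range(len(values)), key=lambda i: damage[i], reverse=True)[:k]
    return targets, sum(damage[i] for i in targets)


def exact_defense_search(values, costs, budget, effectiveness, k):
    """Toy defender: enumerate every budget-feasible defended set and keep
    the one that minimizes the attacker's best-response damage.

    Exhaustive enumeration is exponential in the number of clients, so this
    is only sensible for small systems.
    """
    n = len(values)
    best_set, best_loss = frozenset(), float("inf")
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(costs[i] for i in subset) > budget:
                continue  # infeasible under the defense budget
            _, loss = attacker_best_response(values, set(subset), effectiveness, k)
            if loss < best_loss:
                best_set, best_loss = frozenset(subset), loss
    return best_set, best_loss


if __name__ == "__main__":
    # Hypothetical instance: three clients, unit costs, budget for one defense.
    placement, loss = exact_defense_search(
        values=[10, 6, 3], costs=[1, 1, 1], budget=1, effectiveness=0.5, k=1
    )
    print(sorted(placement), loss)
```

In this hypothetical instance, protecting the highest-value client shifts the attacker's best response to the second client, which illustrates how client-level placement shapes attacker incentives.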
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Mohammad_Hajiesmaili1
Submission Number: 7964