Angel or Demon: Investigating the Plasticity-Enhanced Strategies' Impact on Backdoor Threats in Deep Reinforcement Learning

Oubo Ma; Ruixiao Lin; Yang Dai; Jiahao Chen; Chunyi Zhou; Linkang Du; Shouling Ji

13 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: deep reinforcement learning, backdoor attacks, plasticity

Abstract: Deep Reinforcement Learning (DRL) faces significant threats from backdoor attacks, as indicated by numerous studies. However, these studies are conducted under idealized scenarios and overlook the intervention strategies that are becoming indispensable built-in components of DRL agents for mitigating plasticity loss. This discrepancy may lead to misperceptions regarding the severity and nature of DRL backdoor attacks. To bridge this gap, we investigate three research questions: (1) How do interventions impact backdoor attacks in DRL? (2) What are the intrinsic mechanisms underlying these impacts? (3) What implications do these intrinsic mechanisms hold for future research? To answer these questions, we empirically study 14,664 cases covering representative interventions and attack scenarios. The results show that, particularly in the post-training scenario, *SAM* exacerbates the backdoor threat, whereas other interventions exert varying degrees of mitigation. These impacts arise from three intrinsic mechanisms: disrupting activation pathways (e.g., *Shrink & Perturb*, *Weight Clipping*, and *ReDo*), compressing representation space (e.g., *Spectral Normalization*, *Weight Decay*, and *Layer Normalization*), and capturing sharp losses (e.g., *SAM*). Notably, we reveal that interventions with different mechanisms, applied in combination, alter the internal properties of backdoors and enable robust backdoor injection. Based on this insight, we propose the conceptual framework *Scavenger-Converter-Connector* (*SCC*). Meanwhile, we observe that abnormal loss landscape sharpness emerges as a prominent external manifestation of DRL backdoors, which constitutes a potentially critical insight for backdoor detection.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 4742
