Persistent Backdoor Attacks in Class-Incremental Learning via Structural Invariant Anchoring

Junhuang Huang; Linshan Hou; Jianting Ning; Yanjun Zhang; Zhongyun Hua; Leo Yu Zhang

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, License: CC BY 4.0

TL;DR: We show that AI systems that keep learning from new data can remain vulnerable to hidden attacks planted early in training, and we use this finding to help guide stronger defenses for long-term learning systems.

Abstract: Continual learning (CL) involves continual parameter updates, which pose a significant challenge to backdoor persistence. In this paper, we reveal that the most advanced existing attack relies on an implicit assumption that task-critical neurons remain stable across tasks; however, this assumption does not hold in class-incremental learning (CIL). This exposes a critical research gap: backdoor persistence in CIL remains an open question. Inspired by functional stability, we find that CIL models preserve task knowledge in shallow, structurally invariant subspaces. Motivated by these findings, we propose PBTO, the first persistent and targeted backdoor attack in CIL. PBTO trains a surrogate model on proxy tasks to obtain a parameter trajectory. It then optimizes a universal trigger that forces misclassification into the target label across all model states and anchors the trigger embeddings in shallow layers. Experimental results show that PBTO maintains a high final attack success rate (ASR) across all benchmarks, whereas representative baselines degrade substantially after sequential learning. Code is available at https://github.com/hjhkkkc/PBTO.
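To make the two-stage recipe in the abstract more concrete, here is a minimal, hypothetical sketch in Python. It is not PBTO's implementation (the authors' code is at the repository linked above), and every component is an assumption chosen for illustration: each "model state" on the surrogate trajectory is a toy two-layer network f(x) = W2·tanh(W1·x); the trajectory is simulated by letting the shallow layer W1 drift slightly and the deep head W2 drift strongly, echoing the abstract's claim that knowledge persists in shallow subspaces; the trigger is one additive vector bounded in L∞ norm by `eps`; and "anchoring" is stood in for by a penalty (weight `lam`) on how far the triggered inputs' shallow embeddings spread across states. The helper names (`make_trajectory`, `objective`, `optimize_trigger`, `attack_success_rate`) are invented for this sketch, and gradients come from central finite differences to keep it dependency-free.

```python
import math
import random

# Toy, hypothetical setup (NOT the authors' implementation): a model state is
# (W1, W2) with f(x) = W2 @ tanh(W1 @ x); W1 plays the role of the shallow layer.


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def shallow_embed(state, x):
    """Shallow-layer embedding tanh(W1 @ x)."""
    W1, _ = state
    return [math.tanh(z) for z in matvec(W1, x)]


def cross_entropy(z, label):
    """Numerically stable softmax cross-entropy for one logit vector."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[label]


def make_trajectory(d, h, k, steps, seed=0):
    """Simulated surrogate trajectory (assumption): the shallow layer drifts a
    little between states, while the deep head drifts a lot."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(h)]
    W2 = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(k)]
    states = []
    for _ in range(steps):
        states.append((W1, W2))
        W1 = [[w + rng.gauss(0, 0.02) for w in row] for row in W1]
        W2 = [[w + rng.gauss(0, 0.5) for w in row] for row in W2]
    return states


def objective(delta, states, inputs, target, lam):
    """Mean target-label cross-entropy over every (state, input) pair, plus a
    hypothetical anchoring penalty: the spread of the triggered inputs'
    shallow embeddings around their mean across states."""
    attack = anchor = 0.0
    for x in inputs:
        xt = [a + b for a, b in zip(x, delta)]
        embeds = [shallow_embed(s, xt) for s in states]
        centre = [sum(col) / len(embeds) for col in zip(*embeds)]
        for (_, W2), e in zip(states, embeds):
            attack += cross_entropy(matvec(W2, e), target)
            anchor += sum((a - c) ** 2 for a, c in zip(e, centre))
    return (attack + lam * anchor) / (len(inputs) * len(states))


def optimize_trigger(states, inputs, target, eps=0.5, lr=0.05,
                     iters=200, lam=0.1, fd=1e-4):
    """Projected gradient descent on one universal additive trigger, using
    central finite-difference gradients and an L-infinity box of radius eps.
    Returns (best trigger, objective at zero trigger, best objective)."""
    args = (states, inputs, target, lam)
    delta = [0.0] * len(inputs[0])
    initial = best = objective(delta, *args)
    best_delta = delta[:]
    for _ in range(iters):
        grad = []
        for i in range(len(delta)):
            up, dn = delta[:], delta[:]
            up[i] += fd
            dn[i] -= fd
            grad.append((objective(up, *args) - objective(dn, *args)) / (2 * fd))
        delta = [min(eps, max(-eps, v - lr * g)) for v, g in zip(delta, grad)]
        loss = objective(delta, *args)
        if loss < best:
            best, best_delta = loss, delta[:]
    return best_delta, initial, best


def attack_success_rate(delta, states, inputs, target):
    """Fraction of (state, input) pairs whose triggered prediction is target."""
    hits = 0
    for state in states:
        for x in inputs:
            e = shallow_embed(state, [a + b for a, b in zip(x, delta)])
            z = matvec(state[1], e)
            hits += max(range(len(z)), key=z.__getitem__) == target
    return hits / (len(states) * len(inputs))
```

For example, `optimize_trigger(make_trajectory(4, 3, 3, 5), inputs, target=0)` returns the best trigger found together with the objective before and after optimization, and `attack_success_rate` then reports the fraction of (state, input) pairs classified as the target. The design choice the sketch illustrates is averaging the target-label loss over every checkpoint rather than only the final one, so the trigger is not tuned to a single head that later updates will overwrite; a realistic version would use automatic differentiation, genuine CIL checkpoints, and a poisoning step, none of which is shown here.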

Lay Summary: Artificial intelligence systems are increasingly updated over time as new data becomes available. This makes them useful, but it also creates new security risks: an attacker may hide a small number of manipulated examples in early training data so that the system later makes a specific wrong prediction when it sees a hidden cue. We study why such hidden attacks usually fade away when the system keeps learning new tasks, and why some parts of the system remain more stable than others. Based on these findings, we design a stronger test attack that stays effective even after many later updates. Our results show that current defenses may not be enough for AI systems that are repeatedly retrained. By exposing this risk, our work aims to help researchers build safer long-term learning systems and better defenses before such attacks are used in real applications.

Primary Area: Deep Learning->Theory

Keywords: Backdoor Attacks, Continual Learning, Data Poisoning

Submission Number: 34482