Keywords: Spiking neural networks, Lifelong learning, Transformer-based SNNs, Neuromorphic chips, Spiking threshold
TL;DR: A spiking transformer with dynamic thresholds for scalable class incremental learning.
Abstract: Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where the data is not available all at once and the model must be updated to adapt to a new data distribution, which may or may not follow the initial distribution. Previously acquired knowledge is lost during such subsequent updates from new data, a phenomenon commonly known as catastrophic forgetting. In contrast, the brain can learn without such catastrophic forgetting, irrespective of the number of tasks it encounters. Existing spiking neural networks (SNNs) for class-incremental learning (CIL) suffer a sharp performance drop as tasks accumulate. While Parameter-Efficient Fine-Tuning (PEFT) strategies have significantly mitigated this in non-spiking vision transformers by adapting a minimal set of parameters, equivalent mechanisms in the spiking domain remain underexplored. Here we introduce CATFormer (Context Adaptive Threshold Transformer), a scalable framework that overcomes this limitation. We observe that the key to preventing forgetting in SNNs lies not only in synaptic plasticity, but also in modulating neuronal excitability. At the core of CATFormer is the Dynamic Threshold Leaky Integrate-and-Fire (DTLIF) neuron model, which leverages context-adaptive thresholds as the primary mechanism for knowledge retention. This is paired with a Gated Dynamic Head Selection (G-DHS) mechanism for inference. Extensive evaluation on both static (CIFAR-10/100, ImageNet-100, Tiny-ImageNet-200) and neuromorphic (CIFAR10-DVS, SHD) datasets reveals that CATFormer outperforms existing rehearsal-free CIL algorithms across various task splits, establishing it as an ideal architecture for energy-efficient class-incremental learning.
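The abstract does not specify the exact DTLIF update equations, but the core idea it describes, a leaky integrate-and-fire neuron whose firing threshold adapts to context rather than staying fixed, can be sketched as follows. All constants and the specific adaptation rule (exponential decay toward a baseline, raised after each spike) are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def dtlif_step(v, theta, x, tau=2.0, theta_base=1.0,
               theta_decay=0.9, theta_gain=0.5):
    """One time step of an illustrative LIF neuron with a dynamic threshold.

    v     : membrane potential (array)
    theta : per-neuron adaptive firing threshold (array)
    x     : input current at this step (array)
    All hyperparameters are hypothetical placeholders.
    """
    v = v / tau + x                           # leaky integration of input
    spike = (v >= theta).astype(np.float32)   # fire where potential crosses threshold
    v = v * (1.0 - spike)                     # hard reset of spiking neurons
    # Adaptive threshold: decay toward a baseline, then increase after a spike,
    # making recently active neurons harder to excite again.
    theta = theta_decay * theta + (1.0 - theta_decay) * theta_base \
            + theta_gain * spike
    return v, theta, spike

# A strong input drives a spike, after which the threshold rises above baseline.
v = np.zeros(1, dtype=np.float32)
theta = np.full(1, 1.0, dtype=np.float32)
v, theta, spike = dtlif_step(v, theta, x=np.full(1, 2.0, dtype=np.float32))
```

In a CIL setting, modulating `theta` per context (rather than only updating synaptic weights) is the kind of excitability-based retention mechanism the abstract attributes to DTLIF.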
Primary Area: applications to neuroscience & cognitive science
Submission Number: 7420