One Spike Decision Reinforcement Learning Framework for Dynamic Environments

14 Apr 2026 (modified: 03 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Deep reinforcement learning (DRL) agents face challenges in natural environments similar to those encountered by biological organisms: they must take actions that are both accurate and timely in response to dynamic, non-stationary conditions. Achieving such behavior, however, incurs significant computational overhead, limiting the scalability of DRL in real-world applications. Spiking Neural Networks (SNNs), among the most biologically plausible computational models of neurons, offer a promising energy-efficient alternative for reinforcement learning due to their low computational cost. Existing SNN-based methods, however, often rely on multiple simulation time steps to approximate analog activations, which compromises their low-latency and low-power advantages. To address this, we propose a novel DRL framework based on one-spike firing decisions (OSFD), which redefines the use of SNNs in DRL. In OSFD, each decision step triggers only a single spike to produce an action, while the residual membrane potential is incrementally accumulated across steps. In addition, we introduce Bayesian variational inference to dynamically regulate the contribution of residual potentials based on state information gain, thereby optimizing policy learning. Experimental results demonstrate that our method not only surpasses conventional artificial neural network (ANN)-based frameworks in performance but also significantly reduces computational cost.
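To make the one-spike-per-decision idea concrete, the following is a minimal sketch of how such an actor head could operate; it is not the paper's implementation. It assumes a linear readout from state features, a winner-take-all single-spike rule with a soft (subtractive) reset so the residual membrane potential carries over between decision steps, and a hypothetical scalar `residual_gate` standing in for the Bayesian-inferred weighting of the residual potential.

```python
import numpy as np

class OneSpikeDecisionLayer:
    """Illustrative sketch: one spike is fired per decision step,
    and the leftover membrane potential persists across steps."""

    def __init__(self, n_features, n_actions, threshold=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Linear readout from state features to action neurons (assumed form).
        self.w = rng.normal(scale=0.1, size=(n_actions, n_features))
        self.threshold = threshold
        self.v = np.zeros(n_actions)  # residual membrane potential per action neuron

    def act(self, state_features, residual_gate=1.0):
        # Charge: gated residual from previous steps plus new input current.
        self.v = residual_gate * self.v + self.w @ state_features
        # Fire exactly one spike: the action neuron with the largest potential.
        action = int(np.argmax(self.v))
        # Soft reset: subtract the threshold so the remainder accumulates.
        self.v[action] -= self.threshold
        return action

# Example usage with arbitrary state features.
layer = OneSpikeDecisionLayer(n_features=4, n_actions=2)
action = layer.act(np.array([0.3, -0.1, 0.8, 0.05]), residual_gate=0.9)
```

In this sketch, `residual_gate` is a fixed scalar; in the described method the contribution of residual potentials is regulated dynamically via Bayesian variational inference based on state information gain.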
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Markus_Heinonen1
Submission Number: 8413