Toward Energy-Efficient Spike-Based Deep Reinforcement Learning With Temporal Coding

Published: 01 Jan 2025, Last Modified: 15 May 2025 · IEEE Comput. Intell. Mag. 2025 · CC BY-SA 4.0
Abstract: Deep reinforcement learning (DRL) enables agents to interact efficiently with complex environments by continuously optimizing their policies and learning autonomously. However, traditional DRL methods often require large-scale neural networks and extensive computational resources, which limits their applicability in power-sensitive and resource-constrained edge environments, such as mobile robots and drones. To overcome these limitations, we leverage the energy-efficient properties of brain-inspired spiking neural networks (SNNs) to develop a novel spike-based DRL framework, referred to as Spike-DRL. Unlike traditional SNN-based reinforcement learning methods, Spike-DRL incorporates the energy-efficient time-to-first-spike (TTFS) encoding scheme, in which information is encoded in the precise timing of a single spike. This TTFS-based encoding allows Spike-DRL to operate in a sparse, event-driven manner, significantly reducing energy consumption. In addition, to improve the deployability of Spike-DRL in resource-constrained environments, a lightweight strategy for quantizing synaptic weights into low-bit representations is introduced, significantly reducing memory usage and computational complexity. Extensive experiments evaluating the proposed Spike-DRL show that it achieves competitive performance with higher energy efficiency and lower memory requirements. This work presents a biologically inspired model that is well suited for real-time decision-making and autonomous learning in power-sensitive and resource-limited edge environments.
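To make the two key mechanisms concrete, the sketch below illustrates the general ideas behind TTFS encoding (larger input values fire a single spike earlier) and uniform low-bit weight quantization. This is a minimal, generic illustration; the function names, the linear time mapping, and the symmetric quantization scheme are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def ttfs_encode(x, t_max=100.0):
    """Time-to-first-spike encoding: each value in [0, 1] is mapped to one
    spike time, with larger values firing earlier (illustrative linear
    scheme, not necessarily the paper's exact mapping)."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    return t_max * (1.0 - x)  # x = 1 -> spike at t = 0; x = 0 -> t = t_max

def quantize_weights(w, bits=4):
    """Uniform symmetric quantization of synaptic weights to signed low-bit
    integers (a generic scheme; the paper's lightweight strategy may differ).
    Returns the integer codes and the scale needed to dequantize (q * scale)."""
    w = np.asarray(w, dtype=np.float64)
    q_max = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed weights
    w_abs_max = np.max(np.abs(w))
    scale = w_abs_max / q_max if w_abs_max > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale
```

Under this scheme an input of 1.0 spikes at t = 0 and an input of 0.0 at t = t_max, so each neuron emits at most one spike per encoding window, which is what makes the computation sparse and event-driven; the 4-bit codes cut weight storage to roughly one eighth of 32-bit floats at the cost of the quantization error set by `scale`.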