Abstract: Brain-inspired spiking neural networks (SNNs) are considered energy-efficient alternatives to conventional deep neural networks (DNNs). By adopting event-driven information processing, SNNs can significantly reduce the computational demands of DNNs while achieving comparable performance. However, current SNNs primarily prioritize high accuracy by constructing complex neuron models that generate sparse spikes. Unfortunately, this approach results in low energy efficiency and high latency, posing a significant challenge for deploying SNNs at the edge. Furthermore, the dominant computation in SNNs, spike-wise Accumulate-Compare operations, is well-suited for Computing-in-Memory (CIM) architectures. However, exploiting high parallelism and spike sparsity in CIM-based SNN accelerators is challenging due to the irregularity and time dependency of spikes. To address these limitations, this paper proposes COMPASS, an SRAM-based CIM architecture for efficient SNNs. We first introduce an efficient method to exploit irregular sparsity for both input spikes (explicit) and output spikes (implicit). This is achieved through a speculation mechanism that exploits dynamic spike patterns, enabling lean hardware for sparsity utilization. Additionally, the CIM architecture is carefully modified to facilitate dynamic spike pattern generation and exploitation with minimal overhead. Moreover, we design an adaptive dataflow with a temporal spike representation tailored for input/output spikes, reducing the memory footprint and enabling parallel execution. Comprehensive evaluation results demonstrate that COMPASS achieves a 26.7x end-to-end speedup over hardware implementations of recent SNN accelerators while consuming up to 386.7x less energy per inference.
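For readers unfamiliar with the Accumulate-Compare workload mentioned above, the following is a minimal sketch of one timestep of an integrate-and-fire layer in NumPy. The function name, threshold value, and hard-reset convention are illustrative assumptions and are not taken from the paper; the sketch only shows why binary spikes turn multiply-accumulate into accumulate-compare and where input-spike sparsity can be exploited.

```python
import numpy as np

def if_layer_step(weights, in_spikes, v_mem, threshold=1.0):
    """One timestep of an integrate-and-fire layer (illustrative sketch).

    weights   : (num_in, num_out) synaptic weight matrix
    in_spikes : (num_in,) binary input spike vector for this timestep
    v_mem     : (num_out,) membrane potentials carried across timesteps
    Returns the updated membrane potentials and the binary output spikes.
    """
    # Accumulate: because input spikes are binary, no multiplication is
    # needed; only the weight rows of neurons that spiked are summed,
    # which is where explicit input-spike sparsity pays off.
    active = np.flatnonzero(in_spikes)
    v_mem = v_mem + weights[active].sum(axis=0)

    # Compare: neurons whose potential crosses the threshold emit a spike.
    out_spikes = (v_mem >= threshold).astype(np.uint8)

    # Hard reset of the neurons that fired (one common convention).
    v_mem = np.where(out_spikes == 1, 0.0, v_mem)
    return v_mem, out_spikes
```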