Spike Accumulation Forwarding for Effective Training of Spiking Neural Networks

Published: 21 Jun 2024, Last Modified: 21 Jun 2024. Accepted by TMLR.
Abstract: In this article, we propose a new paradigm for training spiking neural networks (SNNs): spike accumulation forwarding (SAF). SNNs are known to be energy-efficient but difficult to train. Many methods have been proposed to address this problem; among them, online training through time (OTTT) allows inference at each time step while suppressing the memory cost. However, to compute efficiently on GPUs, OTTT requires operations with spike trains and with weighted summations of spike trains during forwarding. In addition, OTTT has been shown to be related to the Spike Representation, an alternative training method, though theoretical agreement with Spike Representation has yet to be proven. Our proposed method addresses both problems: SAF can halve the number of operations during the forward process, and it can be theoretically proven that SAF is consistent with both the Spike Representation and OTTT. Furthermore, we confirmed these properties through experiments and showed that it is possible to reduce memory and training time while maintaining accuracy.
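As a rough illustration of why forwarding accumulated spikes instead of individual spikes is plausible, the sketch below checks a simple linearity identity: accumulating weighted spike inputs step by step gives the same result as accumulating the spikes first and applying the weights once. This is not the authors' implementation and does not reproduce SAF or OTTT. The decay factor `lam`, the layer sizes, the random spike trains, and all function names are assumptions made only for this example.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def weighted_then_accumulate(W, spikes, lam):
    """Apply W to every spike vector, then accumulate with decay lam."""
    acc = [0.0] * len(W)
    for s in spikes:
        ws = matvec(W, s)
        acc = [lam * a + v for a, v in zip(acc, ws)]
    return acc


def accumulate_then_weight(W, spikes, lam):
    """Accumulate spike vectors with decay lam, then apply W once."""
    a_hat = [0.0] * len(spikes[0])
    for s in spikes:
        a_hat = [lam * a + si for a, si in zip(a_hat, s)]
    return matvec(W, a_hat)


# Hypothetical toy setup: 6 time steps, 4 inputs, 3 outputs.
rng = random.Random(0)
T, n_in, n_out, lam = 6, 4, 3, 0.5
W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
spikes = [[float(rng.random() < 0.5) for _ in range(n_in)] for _ in range(T)]

lhs = weighted_then_accumulate(W, spikes, lam)
rhs = accumulate_then_weight(W, spikes, lam)

# The two orders of operation agree up to floating-point error.
assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
```

The identity holds only for the linear weighted-sum part of a layer; the spiking nonlinearity is deliberately left out of this sketch.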
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
1. The main content has been increased by one page to 13 pages.
2. We have modified the position of figures and tables to improve readability.
3. To improve readability, we have modified Section 2 (Related Work) and Section 4.1.
4. We have written the definition of $l$ after Equation 1 in Section 3.1.
5. In the definition of the loss above Eq. 2: "$\sum_{t =1}^{T} \boldsymbol{s}[t] / T$" -> "${\boldsymbol{a}[T]}$".
6. We have written the full name of SG in the first paragraph under "Online Training Through Time" on page 4.
7. We have added an explanation of OTTT backpropagation after Equation 3 in Section 3.2.
8. Equation 6: "it hold" -> "it holds".
9. We have explained the meaning of "essentially identical" in Theorem 2 and Corollary 3.
10. On page 7, after Theorem 2: "${\rm OTTT_A}$" -> "SR".
11. At the beginning of Section 4.3, we have added that there are feedback and feedforward connections in the brain.
12. In Future Works, we have added a comparison of accuracy under different surrogate gradients and experiments in which labels change over time.
13. We have added Appendix A: List of main Formulas.
14. We have added the formula for the sigmoid function in Appendix C: Implementation detail.
15. We have added Appendix D: Comparison of gradient.

We made the following changes after the second round of reviewer comments.
16. We have added more discussion of the equivalence between SAF-E and OTTT_O.
17. We have added a new section, "Limitation and Discussion".
18. In Related Work, we now mention learning rules based on eligibility traces.
19. In Sections 3.2 and B.2, we have added more explanation of the gradient calculation.
20. In Section 3.2, we have added the meaning of "forward connection" and "feedback connection". We have also changed "essentially identical" to "identical up to a scale factor".

We made the following changes after the minor revision.
21. In the abstract: "...theoretical agreement with Spike Representation has not to be proven." -> "...theoretical agreement with Spike Representation has yet to be proven."
22. In Section 2: "for small time steps" -> "for few time steps".
23. In Section 2: "However, because these methods assume $T \rightarrow \infty$ , $T$ must be large..." -> "However, these methods assume the time step $T \rightarrow \infty$, then $T$ must be sufficiently large..."
24. In Section B.2, Equation (16) was moved one line higher.
25. In Sections B.2 and B.3: "with the SG" -> "with the SG (refer to (30))". Equation (30) is the surrogate gradient (SG) that we used.
26. In Section B.3, we added an explanation of the calculations of $\partial \widehat{\boldsymbol{a}}^{i+1} / \partial \widehat{\boldsymbol{a}}^{i}$ and $\partial \widehat{\boldsymbol{U}}^{i+1} / \partial \widehat{\boldsymbol{a}}^{i}=\boldsymbol{W}^i$.
Supplementary Material: zip
Assigned Action Editor: ~Blake_Aaron_Richards1
Submission Number: 2343