Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected

Published: 05 Mar 2025, Last Modified: 10 Apr 2025
Venue: SLLM — License: CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: dynamic sparse training, network science, epitopological learning, efficient training
Abstract: This study aims to broaden our understanding of how brain-inspired network science principles can be applied to training artificial neural networks (ANNs) with sparse connectivity. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in dynamic sparse training (DST). CHT leverages a gradient-free, topology-driven link regrowth mechanism, which has been shown to achieve an ultra-sparse (1\% connectivity or lower) advantage across various tasks compared to fully connected networks. Yet, CHT suffers from two main drawbacks: the high time complexity of its link predictor and a tendency to get stuck in epitopological local minima. Here, we propose a GPU-friendly, matrix-multiplication-based approximation of the CH link predictor, which reduces the computational complexity to $\mathcal{O}(N^3)$ and enables a fast implementation of CHT in large-scale models. Moreover, we introduce the **C**annistraci-**H**ebb **T**raining **s**oft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology. To further improve performance, we integrate CHTs with a **s**igmoid gradual density decay strategy, referred to as CHTss. Empirical results show that 1) using 5\% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks; 2) using 30\% of the connections, CHTss achieves superior performance compared to other dynamic sparse training methods in language modeling (LLaMA-130M) across different sparsity levels, and it surpasses the fully connected counterpart in zero-shot evaluations.
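The abstract describes three mechanisms: a GPU-friendly matrix-multiplication approximation of the CH link predictor, a soft sampling rule for link removal and regrowth (CHTs), and a sigmoid gradual density decay (CHTss). The sketch below illustrates these ideas in PyTorch; the function names, the plain degree normalization of the length-3 path scores, and the specific sigmoid parameterization are simplifying assumptions for illustration, not the authors' implementation.

```python
# Minimal illustrative sketch (not the paper's code) of: (1) scoring candidate
# links via matrix multiplication over length-3 paths, (2) soft (probabilistic)
# regrowth sampling, and (3) a sigmoid density-decay schedule.
import math
import torch


def l3_path_scores(adj: torch.Tensor) -> torch.Tensor:
    """Score non-existing links by degree-normalized length-3 path counts.

    `adj` is a dense binary adjacency matrix of a sparse layer. Two matrix
    products give O(N^3) complexity and run entirely on the GPU; the actual
    CH link predictor uses a more refined local-community weighting than the
    crude normalization applied here.
    """
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    a_norm = adj / deg.sqrt()                  # simple degree normalization
    scores = a_norm @ adj.t() @ a_norm         # length-3 paths u-z1-z2-v
    return scores * (1.0 - adj)                # keep only non-existing links


def soft_sample_links(scores: torch.Tensor, k: int, temperature: float = 1.0):
    """Soft regrowth rule: sample k links with probability given by a softmax
    over the scores, rather than always taking the deterministic top-k,
    balancing exploitation (high scores) with exploration (low scores)."""
    probs = torch.softmax(scores.flatten() / temperature, dim=0)
    idx = torch.multinomial(probs, num_samples=k, replacement=False)
    return idx // scores.shape[1], idx % scores.shape[1]   # row, col indices


def sigmoid_density(step: int, total_steps: int,
                    d_start: float = 1.0, d_end: float = 0.05) -> float:
    """Sigmoid gradual density decay: anneal layer density from d_start toward
    d_end along an S-shaped curve. One plausible parameterization; the exact
    schedule used by CHTss may differ."""
    t = step / max(total_steps, 1)
    s = 1.0 / (1.0 + math.exp(-12.0 * (t - 0.5)))
    return d_start + (d_end - d_start) * s
```

In this hypothetical setup, each topology-update step would prune a fraction of existing links, call `l3_path_scores` on the layer's current mask, regrow links with `soft_sample_links`, and set the overall target density from `sigmoid_density`.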
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 76