Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected

Published: 08 Jan 2025 · Last Modified: 24 Jan 2026 · OpenReview Archive Direct Upload · License: CC BY-ND 4.0
Abstract: This study aims to extend current knowledge on applying brain-inspired network science principles to the training of artificial neural networks (ANNs) with sparse connectivity. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in dynamic sparse training (DST). CHT leverages a gradient-free, topology-driven link regrowth mechanism, which has been shown to achieve an ultra-sparse (1% connectivity or lower) advantage over fully connected networks across various tasks. Yet, CHT suffers from two main drawbacks: the high time complexity of its link predictor and a tendency to get stuck in epitopological local minima. Here, we propose a matrix-multiplication, GPU-friendly approximation of the CH link predictor, which reduces the computational complexity to O(N^3), enabling a fast implementation of CHT in large-scale models. Moreover, we introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections during both link removal and regrowth, balancing the exploration and exploitation of network topology. To further improve performance, we integrate CHTs with a sigmoid gradual density decay strategy, referred to as CHTss. Empirical results show that 1) using 5% of the connections, CHTss outperforms fully connected networks on two Transformer-based machine translation tasks; 2) using 30% of the connections, CHTss achieves superior performance compared to other dynamic sparse training methods in language modeling (LLaMA-130M) across different sparsity levels, and it surpasses the fully connected counterpart in zero-shot evaluations.
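To make the two scheduling ideas mentioned above concrete, below is a minimal sketch of (a) a sigmoid gradual density decay and (b) a "soft" probabilistic sampling of connections by link-predictor score. The function names, the sigmoid midpoint and steepness, and the temperature-controlled softmax sampling are illustrative assumptions; the abstract does not specify the exact parameterization used in CHTs/CHTss, so this is not the authors' implementation.

```python
import torch


def sigmoid_density_schedule(step, total_steps, d_init=1.0, d_final=0.05, k=10.0):
    """Density at a given step, decayed from d_init to d_final along a sigmoid.

    The midpoint (0.5) and steepness k are illustrative choices, not values
    taken from the paper.
    """
    t = step / max(total_steps, 1)  # training progress in [0, 1]
    s = 1.0 / (1.0 + torch.exp(torch.tensor(-k * (t - 0.5))))  # sigmoid ramp
    return d_init - (d_init - d_final) * s.item()


def soft_sample_links(scores, n_links, temperature=1.0):
    """Sample n_links connections with probability softmax(scores / temperature).

    A hard (greedy) rule would take the top-n scores; a soft rule trades some
    exploitation for exploration by sampling instead.
    """
    probs = torch.softmax(scores.flatten() / temperature, dim=0)
    idx = torch.multinomial(probs, n_links, replacement=False)
    return idx  # flat indices into the score matrix


if __name__ == "__main__":
    # Hypothetical usage: track density over training, then draw regrown links.
    for step in (0, 2500, 5000, 7500, 10000):
        print(step, round(sigmoid_density_schedule(step, 10000, 1.0, 0.05), 3))
    scores = torch.randn(128, 128)  # stand-in link-predictor scores for a layer
    grown = soft_sample_links(scores, n_links=64, temperature=0.5)
    print(grown.shape)
```

The same softmax-sampling helper could, under the same assumptions, be applied to negated magnitudes for link removal, so that both pruning and regrowth are stochastic rather than strictly greedy.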