Cannistraci-Hebb Training with N:M Semi-Structured Sparsity for Pre-Training and Re-Training

Published: 22 Jan 2026, Last Modified: 06 Mar 2026 · CPAL 2026 (Proceedings Track) Poster · CC BY 4.0
Keywords: Dynamic Sparse Training; Semi-Structured Sparsity; LLM; ViT
Abstract: Sparse training offers a pivotal pathway for scaling deep learning efficiency, replacing dense networks with sparse counterparts that maintain competitive performance using significantly fewer parameters. While brain-inspired sparse training methods like Cannistraci-Hebb Training (CHT) have shown great promise, they typically rely on unstructured sparsity and therefore fail to exploit the acceleration capabilities of modern GPU architectures. Conversely, NVIDIA’s N:M semi-structured sparsity has emerged as a standard for hardware-efficient acceleration. However, existing N:M training methods typically rely on straight-through estimators (STE) and must maintain dense weights throughout training, so they do not constitute true sparse training. In this work, we bridge the gap between dynamic sparse training and hardware efficiency. We make three primary contributions: (1) We introduce CHTs24, the first framework to integrate Cannistraci-Hebb Training with 2:4 semi-structured sparsity. This approach outperforms strong baselines (e.g., SR-STE) in training linear layers within Large Language Models (LLMs). (2) We propose the epi-topology Dynamic Sparse re-Training (eDSrT) pipeline, a novel methodology for transitioning dense models to semi-structured sparsity. (3) We demonstrate the efficacy of this pipeline by adapting CHTs24 to prune and retrain a Vision Transformer (ViT) into 2:4 sparsity in just 100 epochs with negligible performance loss. Collectively, our research presents a synergistic, hardware-friendly approach to advancing sparse training for large-scale neural networks.
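To make the 2:4 constraint concrete, below is a minimal PyTorch sketch of building a 2:4 semi-structured mask for a linear layer's weight: in every group of 4 consecutive weights along the input dimension, exactly 2 are kept. The magnitude-based keep criterion here is an illustrative assumption, not the paper's method; CHTs24 instead scores connections with Cannistraci-Hebb network-topology rules, and the function name `two_four_mask` is hypothetical.

```python
import torch

def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Return a boolean 2:4 mask: in each group of 4 consecutive entries
    along the last dim, keep the 2 largest-magnitude weights (illustrative
    criterion only; CHTs24 uses Cannistraci-Hebb topological scores)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 sparsity needs the inner dim divisible by 4"
    groups = weight.abs().reshape(out_features, in_features // 4, 4)
    topk = groups.topk(k=2, dim=-1).indices          # top-2 per group of 4
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)                    # mark the kept positions
    return mask.reshape(out_features, in_features)

# Usage: every group of 4 weights ends up with exactly 2 nonzeros.
w = torch.randn(8, 16)
mask = two_four_mask(w)
sparse_w = w * mask
assert mask.reshape(-1, 4).sum(-1).eq(2).all()
```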
Submission Number: 33