WideTopo: Improving foresight neural network pruning through training dynamics preservation and wide topologies exploration

Published: 2026, Last Modified: 26 Jan 2026. Neural Networks 2026. License: CC BY-SA 4.0.
Abstract: Foresight neural network pruning methods have garnered significant attention due to their potential to save computational resources. Recent advances in this field fall predominantly into saliency score-based and graph theory-based methods. The former assess the sensitivity of parameter connections to pruning with respect to specific metrics, while the latter aim to identify sub-networks characterized by sparse yet highly connected graph structures. However, recent research suggests that relying exclusively on saliency scores may yield deep but narrow sub-networks, while graph theory-based methods may be unsuitable for neural networks that require pre-trained parameters for initialization, particularly in transfer learning scenarios. We hypothesize that preserving the training dynamics of sub-networks during pruning, together with exploring network structures with wide topologies, can facilitate the identification of structurally stable sub-networks with improved post-training performance. Motivated by this, we propose WideTopo, which integrates Neural Tangent Kernel (NTK) theory with Implicit Target Alignment (ITA) to capture the training dynamics of sub-networks. Furthermore, it employs a density-aware saliency score decay strategy and a repeated mask restoration strategy to retain more effective nodes, thereby sustaining the width of each layer within the sub-networks. We conducted extensive validation using CNN-based and ViT-based models on representative image classification and semantic segmentation datasets under both random and pre-trained initialization settings. The effectiveness and applicability of our method are validated on diverse network architectures at various model density rates, showing competitive post-training performance compared with existing baselines. Our code is publicly available at https://github.com/Memoristor/WideTopo.
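To make the saliency-based foresight pruning setting concrete, the sketch below scores each connection at initialization and keeps the top fraction globally, with a toy per-layer score decay standing in for the density-aware decay idea. This is a minimal illustration, not WideTopo's actual algorithm: the SNIP-style score |w · g|, the random "gradients", and the `decay` schedule are all assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer weight set standing in for a network at initialization.
weights = {
    "layer1": rng.normal(size=(64, 32)),
    "layer2": rng.normal(size=(32, 10)),
}
# Hypothetical gradients of a training-dynamics proxy (random here for
# illustration; WideTopo derives its signal from NTK/ITA, not shown).
grads = {name: rng.normal(size=w.shape) for name, w in weights.items()}

def foresight_prune(weights, grads, density=0.3, decay=0.9):
    """Keep the top `density` fraction of connections by saliency.

    Saliency is |w * dL/dw| (SNIP-style). As a toy stand-in for a
    density-aware decay, scores in layers holding a larger share of the
    parameters are scaled down, discouraging pruning any layer to
    near-zero width.
    """
    scores = {n: np.abs(weights[n] * grads[n]) for n in weights}
    total = sum(w.size for w in weights.values())
    for n, w in weights.items():
        share = w.size / total  # fraction of all parameters in this layer
        scores[n] = scores[n] * (decay ** (share * len(weights)))
    flat = np.concatenate([s.ravel() for s in scores.values()])
    k = int(density * flat.size)
    threshold = np.partition(flat, -k)[-k]  # k-th largest score
    return {n: (s >= threshold).astype(np.float32) for n, s in scores.items()}

masks = foresight_prune(weights, grads, density=0.3)
kept = int(sum(m.sum() for m in masks.values()))
total = sum(m.size for m in masks.values())
print(f"kept {kept}/{total} connections")
```

Methods of this family differ mainly in the score (connection sensitivity, gradient flow, NTK-derived quantities) and in how the budget is distributed across layers; the global top-k step above is what can collapse a layer's width when scores concentrate in a few deep layers.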