Keywords: optimization, soft-thresholding, sparse training, structured sparsity
TL;DR: An adaptive regularization scheme that finds minimizers with a prescribed sparsity structure, enabling the pruning of foundation models.
Abstract: The recent trend of scaling neural networks to unprecedented sizes demands efficient structured sparsity for practical deployment, yet precise control of sparsity levels and patterns for hardware acceleration remains challenging. This paper introduces the Adaptive Soft-Thresholding Algorithm (ASTRA), which achieves a target sparsity by adapting group-wise regularization strength based on computationally inexpensive sparsity characterizations. We establish ASTRA's theoretical foundations, proving the existence of stable regularizations that realize the desired sparsity. We demonstrate sublinear and linear convergence rates for both the model parameters and the regularization weight in deterministic settings and, crucially, an almost sure $O(1/t)$ convergence rate in the practical stochastic-gradient setting. ASTRA provides a theoretically grounded method for direct, precise control over structured sparsity, enabling the pruning and fine-tuning of foundation models into Bonsai Networks: accelerator-friendly miniatures trained to match the original (teacher) model's outputs while preserving downstream performance.
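The abstract describes ASTRA only at a high level; the following minimal sketch is an illustrative assumption, not the paper's actual algorithm. It shows how a group-wise soft-thresholding (proximal) step, combined with a simple feedback update on the regularization weight driven by a cheap sparsity measurement, could steer a model toward a target sparsity. The names (`group_soft_threshold`, `astra_step`, `target_sparsity`, `adapt_rate`), the proximal-gradient outer loop, and the proportional adaptation rule are all hypothetical.

```python
import numpy as np

def group_soft_threshold(w, lam, groups):
    """Proximal operator of the group-lasso penalty: shrink each
    group's l2 norm by lam, zeroing groups whose norm is below lam.
    `groups` is a list of index arrays partitioning the parameters."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]
    return out

def astra_step(w, grad, lam, groups, lr, target_sparsity, adapt_rate):
    """One hypothetical ASTRA-style iteration: a proximal-gradient step,
    then an adaptive update of the regularization weight lam based on
    an inexpensive sparsity characterization (fraction of zero groups)."""
    w = group_soft_threshold(w - lr * grad, lr * lam, groups)
    sparsity = np.mean([np.allclose(w[g], 0.0) for g in groups])
    # Increase lam when below the target sparsity, decrease when above
    # (a proportional-control heuristic standing in for the paper's rule).
    lam = max(0.0, lam + adapt_rate * (target_sparsity - sparsity))
    return w, lam
```

Under this reading, the regularization weight acts as a control variable: rather than hand-tuning it, the loop adjusts it until the measured group sparsity matches the requested level.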
Supplementary Material: zip
Primary Area: optimization
Submission Number: 22992