Training with Dynamic Sparse Heads as the Key to Effective Ensembling

ICLR 2026 Conference Submission 13844 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: deep learning, ensembles, sparsity, dynamic sparse training, computer vision, language modeling
TL;DR: We propose NeuroTrails, a sparse multi-head architecture with dynamically evolving topology, which outperforms full ensembles by inducing diverse prediction paths, while using a fraction of the resources.
Abstract: Model ensembles have long been a cornerstone for improving generalization and robustness in deep learning. However, their effectiveness often comes at the cost of substantial computational overhead. To address this issue, state-of-the-art methods aim to replicate ensemble-class performance without requiring multiple independently trained networks. Unfortunately, these algorithms often still demand considerable compute at inference. In response to these limitations, we introduce _NeuroTrails_, a sparse multi-head architecture with dynamically evolving topology. This previously unexplored, model-agnostic training paradigm improves ensemble performance while reducing the required parameter count. We analyze the underlying reason for its effectiveness and observe that the various neural trails induced by dynamic sparsity attain a _Goldilocks zone_ of prediction diversity. NeuroTrails is effective with convolutional and transformer-based architectures on vision, language, and reinforcement learning tasks. Experiments on ResNet-50/ImageNet, LLaMA-350M/C4, and DQN/Atari demonstrate increased performance and stronger robustness in zero-shot generalization, while requiring significantly fewer resources.
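The abstract describes NeuroTrails as a sparse multi-head architecture whose topology evolves during training. The sketch below illustrates one way such a design could look, assuming a shared backbone feeding several sparsity-masked prediction heads whose connections are periodically pruned by magnitude and regrown at random. The class names (`SparseHead`, `MultiHeadSparseNet`), the prune-and-regrow rule, and the logit-averaging ensemble are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: shared backbone + sparse heads with dynamic topology
# (magnitude prune + random regrow), in the spirit of dynamic sparse training.
# Not the authors' implementation.
import torch
import torch.nn as nn


class SparseHead(nn.Module):
    """A linear prediction head whose weights are masked to a fixed sparsity."""

    def __init__(self, in_features: int, num_classes: int, sparsity: float = 0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, num_classes)
        # Random initial mask keeping a (1 - sparsity) fraction of the weights.
        mask = (torch.rand_like(self.linear.weight) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

    @torch.no_grad()
    def prune_and_regrow(self, drop_fraction: float = 0.3):
        """Drop the smallest-magnitude active weights; regrow as many inactive ones at random."""
        magnitudes = (self.linear.weight * self.mask).abs()
        active = self.mask.bool()
        n_drop = int(drop_fraction * active.sum().item())
        if n_drop == 0:
            return
        # Prune: deactivate the smallest-magnitude active connections.
        threshold = magnitudes[active].kthvalue(n_drop).values
        self.mask[(magnitudes <= threshold) & active] = 0.0
        # Regrow: reactivate the same number of currently inactive connections at random.
        inactive_idx = (~self.mask.bool()).nonzero(as_tuple=False)
        chosen = inactive_idx[torch.randperm(len(inactive_idx))[:n_drop]]
        self.mask[chosen[:, 0], chosen[:, 1]] = 1.0
        self.linear.weight.data[chosen[:, 0], chosen[:, 1]] = 0.0


class MultiHeadSparseNet(nn.Module):
    """Shared backbone with several sparse heads whose logits are averaged."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, num_heads: int = 3):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList(
            SparseHead(feat_dim, num_classes) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        # Ensemble by averaging the heads' logits.
        return torch.stack([head(feats) for head in self.heads]).mean(dim=0)
```

In a training loop one would call `prune_and_regrow` on each head every few hundred steps, so the heads' sparse connectivity patterns drift apart over training; this is the kind of prediction diversity the abstract attributes to the dynamically evolving trails.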
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 13844