Keywords: Pruning; conditional computation; learning theory
TL;DR: We analytically demonstrate why pruning and conditional computation perform so well, even at high sparsity rates.
Abstract: We analyze the processes of pruning and conditional computation for the case of a single neuron in the asymptotic learning regime of large input dimension and training set size. For this purpose, we introduce conditional neurons, which implement an early-exit strategy at the neuron level. Specifically, a conditional neuron considers the local field induced by a subset of its inputs. If this sub-local field is strong enough, then the rest of the inputs are ignored, saving computation. Conditional neurons provide an archetype of the well-known early-exit or conditional-computation architectures. As such, we formally analyze their generalization performance to understand why conditional computation is so effective at preserving performance despite a significantly reduced average amount of computation. In the process, we introduce a concentration theorem for one-shot neuron-wise pruning, which has recently been popularized in the context of large language models.
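A minimal sketch of the early-exit mechanism described in the abstract, in NumPy. The specifics here (using the first k coordinates as the input subset, a sign activation, and a fixed threshold) are illustrative assumptions, not the paper's exact construction: the neuron first evaluates the sub-local field on a subset of its inputs and only reads the remaining inputs when that partial field is not decisive.

```python
import numpy as np

def conditional_neuron(x, w, k, threshold):
    """Early-exit neuron: evaluate the local field on the first k inputs
    and skip the remaining inputs when the partial field is strong enough.
    Returns the predicted label and the number of inputs actually read."""
    partial_field = np.dot(w[:k], x[:k])          # sub-local field from the input subset
    if abs(partial_field) >= threshold:           # confident enough: exit early
        return np.sign(partial_field), k
    full_field = partial_field + np.dot(w[k:], x[k:])  # otherwise use the full local field
    return np.sign(full_field), len(x)

# Hypothetical usage: average fraction of inputs read on random Gaussian data
rng = np.random.default_rng(0)
d, k, threshold = 1000, 200, 0.5
w = rng.standard_normal(d) / np.sqrt(d)
reads = [conditional_neuron(rng.standard_normal(d), w, k, threshold)[1]
         for _ in range(1000)]
print("average fraction of inputs read:", np.mean(reads) / d)
```

With a suitable threshold, many examples exit after the subset evaluation, which is the computation saving the abstract refers to.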
Student Paper: No
Submission Number: 83