Keywords: Deep Learning, Neural Networks, Strong Lottery Ticket Hypothesis, Neural Compression
TL;DR: The Strong Lottery Ticket Hypothesis is generalized into a flexible framework of partially random quantization-aware training that yields small, accurate models, evaluated on image classification.
Abstract: The Strong Lottery Ticket Hypothesis (SLTH) demonstrated that a high-performing model can be obtained merely by pruning a randomly initialized dense neural network, i.e., by optimizing a pruning mask known as a supermask. Supermask accuracy has recently been enhanced by incorporating sign flipping or weight scaling. Furthermore, it has been demonstrated that supermask training can be extended to sparse random networks. This work proposes the Trichromatic Strong Lottery Ticket Hypothesis (T-SLTH), a generalization of the SLTH that (1) connects supermasks to quantization-aware training, (2) consolidates all existing supermasks into a single design framework based on three additive primary supermasks, and (3) contains novel supermask types that support arbitrary connectivity. In addition to sparsity and quantization, the partial randomness of supermask-based models provides specialized digital hardware accelerators with a unique opportunity for neural compression. The models offered by the T-SLTH set the state of the art (SoTA) for supermask-based models in the accuracy-size tradeoff: a ResNet-$50$ scoring $78.43$% on CIFAR-$100$ can be compressed $38\times$ to $2.51$ MB, or even $144\times$ down to $0.66$ MB while retaining $74.52$% accuracy; on ImageNet, a ResNet-$50$ compressed $25\times$ to $4.1$ MB scores $75.28$%.
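To make the supermask concept from the abstract concrete, below is a minimal sketch of supermask training in the edge-popup style commonly used in SLTH work: the random weights stay frozen and only a per-weight score is trained, with the forward pass keeping the top-scored fraction of weights via a straight-through estimator. This is an illustrative PyTorch sketch, not the paper's actual T-SLTH implementation; the class names `GetSubnet` and `SupermaskLinear` and the `sparsity` parameter are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GetSubnet(torch.autograd.Function):
    """Binarize scores into a top-k mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int((1 - sparsity) * scores.numel())  # number of weights to keep
        mask = torch.zeros_like(scores)
        _, idx = scores.abs().flatten().sort(descending=True)
        mask.flatten()[idx[:k]] = 1.0  # keep the k highest-scored weights
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the binary mask
        # is passed unchanged to the continuous scores.
        return grad_output, None


class SupermaskLinear(nn.Linear):
    """Linear layer with frozen random weights and a trainable supermask."""

    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__(in_features, out_features, bias=False)
        self.sparsity = sparsity
        self.weight.requires_grad = False  # random init is never updated
        self.scores = nn.Parameter(torch.randn_like(self.weight))

    def forward(self, x):
        mask = GetSubnet.apply(self.scores, self.sparsity)
        return F.linear(x, self.weight * mask)
```

Only `scores` receives gradient updates; since the weights are fixed at a (reproducible) random initialization, a deployed model need only store the binary mask and the random seed, which is the source of the compression ratios cited above.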
Submission Number: 19