Keywords: scaling laws, large language models, model compression, quantization, sparsity
TL;DR: We investigate new scaling laws that predict the performance of LLMs trained over quantized or sparse representations.
Abstract: Scaling laws have shaped recent advances in machine learning by enabling predictable scaling of model performance with model size, computation, and data volume. Concurrently, the rising computational cost of AI has motivated model compression techniques, notably quantization and sparsification, which mitigate the steep demands of large-scale training and inference. This paper investigates the interplay between scaling laws and compression strategies, exploring whether a unified scaling framework can accurately predict model performance when training occurs over compressed representations, such as sparse, scalar-quantized, sparse-quantized, or even vector-quantized formats. Our key contributions include proposing and validating a general scaling law formulation that applies both to individual compression types and to their compositions. We demonstrate, both theoretically and empirically, that a simple metric based on Gaussian mean squared error fitting robustly predicts parameter efficiency across compressed models. Additionally, we extend our formulation to directly compare the accuracy potential of different compressed formats and to derive better algorithms for training over sparse-quantized formats. Finally, we identify conditions under which these unified scaling laws fail.
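To make the Gaussian mean-squared-error metric mentioned in the abstract concrete, below is a minimal, hypothetical sketch (not the paper's implementation): it estimates, by Monte Carlo, how well a compressed format can fit i.i.d. standard-normal data. The function names `gaussian_mse`, `scalar_quantize`, and `magnitude_sparsify`, as well as the uniform quantizer and magnitude-pruning choices, are illustrative assumptions standing in for the formats studied in the paper.

```python
# Hypothetical sketch: Monte Carlo estimate of the Gaussian MSE of a
# compressed representation, i.e. E[(x - compress(x))^2] for x ~ N(0, 1).
import numpy as np

def gaussian_mse(compress, n_samples=1_000_000, seed=0):
    """Estimate the mean squared error of `compress` on standard-normal data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    return float(np.mean((x - compress(x)) ** 2))

def scalar_quantize(x, bits=4, clip=3.0):
    """Symmetric uniform quantizer over [-clip, clip] with 2**bits levels."""
    levels = 2 ** bits
    step = 2 * clip / (levels - 1)
    return np.clip(np.round(x / step) * step, -clip, clip)

def magnitude_sparsify(x, sparsity=0.5):
    """Zero out the smallest-magnitude `sparsity` fraction of entries."""
    threshold = np.quantile(np.abs(x), sparsity)
    return np.where(np.abs(x) >= threshold, x, 0.0)

# Example: compare formats by their Gaussian MSE (lower = higher fidelity).
print("4-bit scalar quantization:", gaussian_mse(lambda x: scalar_quantize(x, bits=4)))
print("50% magnitude sparsity:   ", gaussian_mse(lambda x: magnitude_sparsify(x, 0.5)))
```

Under the abstract's claim, a scalar fidelity score of this kind would be used to predict the parameter efficiency of models trained in each compressed format.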
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 15570