Quantifying the Capabilities of LLMs across Scale and Precision

ACL ARR 2024 June Submission 1523 Authors

14 Jun 2024 (modified: 06 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Scale is often credited as one of the factors driving the improved performance of Large Language Models (LLMs), resulting in models with billions and trillions of parameters. One limitation of such large models is their high computational requirements, which restrict their usage, deployment, and debugging in resource-constrained scenarios. Two common ways to sidestep these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) or to lower memory requirements through quantization. While both approaches effectively address resource limitations, their impact on model performance needs thorough examination to make an informed decision. For instance, given a memory budget that fits either a large model at low precision or a small model at high precision, which choice yields better performance? In this study, we aim to answer such questions by investigating the effect of model scale and quantization on performance using two major families of open-source instruct models. Our extensive zero-shot experiments reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. Moreover, large models show exceptional resilience to precision reduction and serve as a better solution than smaller models at high precision under similar memory requirements.
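To make the memory-budget trade-off in the abstract concrete, below is a minimal back-of-envelope sketch (not taken from the paper) that estimates weight memory as parameters × bits per parameter. The 7B/70B parameter counts and the 4-bit/16-bit precisions are illustrative assumptions; activation and KV-cache memory are ignored.

```python
# Back-of-envelope weight-memory estimate: parameters x bytes per parameter.
# Parameter counts and precisions below are illustrative assumptions,
# not figures reported in the paper.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store model weights, in GB."""
    return num_params * bits_per_param / 8 / 1e9

configs = {
    "7B  @ 16-bit": (7e9, 16),
    "70B @ 16-bit": (70e9, 16),
    "70B @ 4-bit":  (70e9, 4),
}

for name, (params, bits) in configs.items():
    print(f"{name}: ~{weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions, a 16-bit 7B model needs roughly 14 GB, a 16-bit 70B model roughly 140 GB, and a 4-bit 70B model roughly 35 GB, which is the kind of comparison the study makes when asking whether a large quantized model or a small full-precision model is the better use of a fixed memory budget.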
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP in resource-constrained settings, accessible computing, quantization, scaling, democratizing AI
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low compute settings - efficiency
Languages Studied: Arabic, Chinese, English, French, Indonesian, Japanese, Korean, Spanish, Vietnamese, Buginese, Sundanese, and Javanese
Submission Number: 1523