LLM-based Affective Text Generation Quality Based on Different Quantization Values

ACL ARR 2024 June Submission 4941 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · License: CC BY 4.0
Abstract: Large language models in Natural Language Processing (NLP) exhibit a remarkable capacity for tasks such as language generation, translation, and contextual comprehension. To achieve these results, the models rely on a large number of parameters, which demand significant computational resources for training and use. Reducing the number of precision bits makes the models smaller, so that fewer computational resources are needed to run them, at the cost of lower overall accuracy. This paper addresses the trade-off between different quantization values, GPU RAM utilization, and text quality in affective text generation (e.g., ``I really enjoy running in the snow-covered forest''). For evaluation, we use an emotion classifier and ten seed prompts to generate affective text. We test three precision-bit setups (8, 16, and 32) across two open-weight language models. Our findings demonstrate that reducing the precision bits leads to memory savings, achieving a reduction of 76 \%. However, this optimization comes with a trade-off: a decrease of up to 10 pp in F$_1$ score for larger models, an increase of 10 pp for smaller models, and roughly double the inference time. In terms of text quality, larger models at lower quantization levels generally outperform smaller, higher-precision models while requiring similar memory.
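The abstract does not name the models or the toolkit used; purely as an illustrative sketch, the Python snippet below shows one common way to load an open-weight causal language model at 32-, 16-, or 8-bit precision with Hugging Face transformers and bitsandbytes and to record peak GPU memory during generation. The model name, seed prompt, and generation settings are placeholders, not the authors' actual setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder model; the paper's two open-weight models are not named in the abstract.
MODEL_NAME = "gpt2"

def load_model(precision: str):
    """Load the model at 32-, 16-, or 8-bit precision (illustrative assumption)."""
    if precision == "8bit":
        return AutoModelForCausalLM.from_pretrained(
            MODEL_NAME,
            quantization_config=BitsAndBytesConfig(load_in_8bit=True),
            device_map="auto",  # requires the accelerate package
        )
    dtype = torch.float16 if precision == "16bit" else torch.float32
    return AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=dtype, device_map="auto"
    )

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Hypothetical seed prompt in the spirit of the paper's affective-generation task.
prompt = "Write a short sentence expressing joy about running in a snow-covered forest."

for precision in ("32bit", "16bit", "8bit"):
    torch.cuda.reset_peak_memory_stats()
    model = load_model(precision)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=40)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"{precision}: peak GPU memory ~{peak_gb:.1f} GB")
    print(text, "\n")
    del model
    torch.cuda.empty_cache()

In a setup along these lines, the 8-bit path is where a memory saving of the magnitude reported in the abstract (76 %) would appear, traded against slower inference; the generated sentences would then be scored by an emotion classifier.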
Paper Type: Short
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Affective text generation, quantization
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 4941