Abstract: Large language models (LLMs) are now widely used by enterprises across a broad range of use cases, owing to their general applicability and demonstrated success across many domains and tasks. However, commercially available LLM inference APIs carry a monetary cost, which generally depends on the number of input and output tokens and on the provider's pricing parameters. In this work, we propose QReT, a framework for reducing the input token count of prompts in a controllable, quality-aware manner. QReT first paraphrases the prompt to reduce its token count while maintaining quality measures. It then applies a set of heuristics, again in a controlled manner, to further reduce the token count without affecting the LLM's understanding of the prompt (and hence the output quality). We empirically validate QReT across several datasets and tasks and demonstrate its effectiveness.
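The following is a minimal, hypothetical sketch of the kind of two-stage pipeline the abstract describes, with a stubbed-out paraphrasing step followed by rule-based trimming. The function names, filler-phrase list, aggressiveness knob, and whitespace-based token estimate are illustrative assumptions, not QReT's actual components.

import re

# Filler phrases that can usually be dropped without changing the task (assumed list).
FILLER_PHRASES = ["please note that", "kindly", "it should be noted that"]

def approx_token_count(text: str) -> int:
    # Rough estimate via whitespace splitting; real APIs bill on subword tokens.
    return len(text.split())

def paraphrase_shorter(prompt: str) -> str:
    # Placeholder for a quality-aware paraphraser; in practice this would call a
    # rewriting model and keep only candidates above a semantic-similarity threshold.
    return prompt

def apply_heuristics(prompt: str, aggressiveness: float = 0.5) -> str:
    # Rule-based trimming controlled by an aggressiveness knob in [0, 1].
    out = re.sub(r"\s+", " ", prompt).strip()   # collapse redundant whitespace
    if aggressiveness >= 0.5:
        for phrase in FILLER_PHRASES:           # drop common filler phrases
            out = re.sub(phrase, "", out, flags=re.IGNORECASE)
        out = re.sub(r"\s+", " ", out).strip()
    return out

def compress_prompt(prompt: str, aggressiveness: float = 0.5) -> str:
    # Stage 1: paraphrase to a shorter form; Stage 2: heuristic trimming.
    return apply_heuristics(paraphrase_shorter(prompt), aggressiveness)

if __name__ == "__main__":
    prompt = "Please note that you should kindly summarize the following report in three bullet points."
    compressed = compress_prompt(prompt, aggressiveness=0.7)
    print(approx_token_count(prompt), "->", approx_token_count(compressed))
    print(compressed)

Under a linear pricing model, the saving from such compression is roughly (input tokens removed) x (per-token input price), so even simple heuristics translate directly into reduced API cost.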
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Token Optimization, Paraphrasing
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 5888