Abstract: Large language models (LLMs) are now widely used by enterprises across a broad range of use cases, owing to their general applicability and demonstrated success across many domains and tasks. However, commercially available LLM inference APIs carry a monetary cost, which generally depends on the number of input and output tokens and on the provider's pricing parameters. In this work, we propose QReT, a framework for reducing the input token count of prompts in a controllable, quality-aware manner. QReT first paraphrases the prompt to reduce its token count while maintaining quality measures. It then applies a set of heuristics, again in a controlled manner, to further reduce the token count without affecting the LLM's understanding of the prompt (and hence the output quality). We empirically validate QReT across several datasets and tasks and demonstrate its effectiveness.
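The following is a minimal, hypothetical sketch of the kind of two-stage pipeline the abstract describes, with a stubbed-out paraphrasing step followed by rule-based trimming. The function names, filler-phrase list, aggressiveness knob, and whitespace-based token estimate are illustrative assumptions, not QReT's actual components.

import re

# Filler phrases that can usually be dropped without changing the task (assumed list).
FILLER_PHRASES = ["please note that", "kindly", "it should be noted that"]

def approx_token_count(text: str) -> int:
    # Rough estimate via whitespace splitting; real APIs bill on subword tokens.
    return len(text.split())

def paraphrase_shorter(prompt: str) -> str:
    # Placeholder for a quality-aware paraphraser; in practice this would call a
    # rewriting model and keep only candidates above a semantic-similarity threshold.
    return prompt

def apply_heuristics(prompt: str, aggressiveness: float = 0.5) -> str:
    # Rule-based trimming controlled by an aggressiveness knob in [0, 1].
    out = re.sub(r"\s+", " ", prompt).strip()   # collapse redundant whitespace
    if aggressiveness >= 0.5:
        for phrase in FILLER_PHRASES:           # drop common filler phrases
            out = re.sub(phrase, "", out, flags=re.IGNORECASE)
        out = re.sub(r"\s+", " ", out).strip()
    return out

def compress_prompt(prompt: str, aggressiveness: float = 0.5) -> str:
    # Stage 1: paraphrase to a shorter form; Stage 2: heuristic trimming.
    return apply_heuristics(paraphrase_shorter(prompt), aggressiveness)

if __name__ == "__main__":
    prompt = "Please note that you should kindly summarize the following report in three bullet points."
    compressed = compress_prompt(prompt, aggressiveness=0.7)
    print(approx_token_count(prompt), "->", approx_token_count(compressed))
    print(compressed)

Under a linear pricing model, the saving from such compression is roughly (input tokens removed) x (per-token input price), so even simple heuristics translate directly into reduced API cost.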
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Token Optimization, Paraphrasing
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 5888