SelfCP: Compressing over-limit prompt via the frozen large language model itself

Published: 01 Jan 2024 · Last Modified: 14 Nov 2024 · Inf. Process. Manag. 2024 · CC BY-SA 4.0
Abstract Highlights:
- We are the first to use the frozen LLM itself to compress over-limit prompts.
- We achieve a balance among training cost, inference efficiency, and response quality.
- Our method is more general and cost-efficient than existing compression methods.