Highlights

- We are the first to use the frozen LLM itself to compress over-limit prompts.
- We achieve a balance among training cost, inference efficiency, and response quality.
- Our method is more general and cost-efficient than existing compression methods.