Unifying Demonstration Selection and Compression for In-Context Learning

ACL ARR 2024 June Submission 3079 Authors

15 Jun 2024 (modified: 09 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: In-context learning (ICL) enables LLMs to exhibit remarkable emergent capabilities in various scenarios. Unfortunately, introducing demonstrations easily inflates the prompt length, placing a significant burden on hardware. In addition, random demonstrations usually achieve limited improvements in ICL, necessitating demonstration selection among accessible candidates. Previous studies introduce extra modules to perform demonstration compression or selection independently. In this paper, we propose an ICL framework, UniICL, which \textbf{Uni}fies demonstration selection, compression, and final response generation via a single frozen LLM. UniICL leverages the understanding ability of a well-trained LLM to independently compress each demonstration into compressed features, and a learnable projection layer then converts these features into LLM-acceptable compressed virtual tokens. Apart from substituting the original demonstrations to reduce input length, the virtual tokens are also used to select potential demonstrations by measuring the similarity between candidate demonstrations and the inference input. Finally, the current query, together with the selected compressed virtual tokens, is fed into the same frozen LLM for response generation. UniICL is a parameter-efficient framework that contains only 17M trainable parameters, originating from the projection layer and a learnable embedding. We build UniICL upon two backbones and conduct experiments on in- and out-of-domain datasets covering both generative and understanding tasks, encompassing ICL scenarios with plentiful and limited demonstration candidates. Results show that UniICL effectively unifies $12 \times$ compression, demonstration selection, and response generation, efficiently scaling the baseline from 4-shot to 64-shot ICL within 24 GB of CUDA memory\footnote{The code and model will be released in the final version.}.
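The abstract describes a pipeline in which a frozen LLM first encodes each demonstration into features, a small learnable projection maps those features to virtual tokens, the virtual tokens are reused to rank candidate demonstrations against the inference input, and the same frozen LLM then generates the response conditioned on the selected virtual tokens. The sketch below illustrates that flow only; the module names (Projector, compress_demo, select_demos), the chunk-mean pooling, the similarity measure, and all shapes are hypothetical stand-ins, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Learnable layer mapping frozen-LLM features to LLM-acceptable virtual tokens."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)

def compress_demo(demo_hidden: torch.Tensor, projector: Projector, num_virtual: int) -> torch.Tensor:
    """Compress one demonstration's hidden states (seq_len, hidden) into `num_virtual`
    virtual tokens; mean-pooling over chunks stands in for the paper's compression step."""
    chunks = demo_hidden.tensor_split(num_virtual, dim=0)
    pooled = torch.stack([c.mean(dim=0) for c in chunks], dim=0)  # (num_virtual, hidden)
    return projector(pooled)

def select_demos(query_vec: torch.Tensor, demo_vecs: torch.Tensor, k: int) -> torch.Tensor:
    """Rank candidate demonstrations by cosine similarity to the query and keep the top-k."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), demo_vecs, dim=-1)
    return sims.topk(k).indices

# Toy usage: 8 candidate demonstrations, hidden size 16, keep the 2 most similar.
hidden = 16
projector = Projector(hidden)
demos = [torch.randn(24, hidden) for _ in range(8)]        # stand-ins for frozen-LLM hidden states
virtual = [compress_demo(d, projector, num_virtual=2) for d in demos]
demo_vecs = torch.stack([v.mean(dim=0) for v in virtual])  # one summary vector per demonstration
query_vec = torch.randn(hidden)
top = select_demos(query_vec, demo_vecs, k=2)
# The selected demonstrations' virtual tokens would then be prepended to the query
# and fed back into the same frozen LLM for response generation.
```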
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Languages Studied: English
Submission Number: 3079