Keywords: On-device LLMs, knowledge distillation, query efficiency, quantization, edge AI
Abstract: Large language models (LLMs) are increasingly deployed on edge devices under strict computation, memory, and quantization constraints.
In such settings, extracting or distilling knowledge from heavily quantized on-device LLMs poses a fundamentally different challenge from conventional cloud-based distillation, due to limited query budgets and amplified quantization noise. We propose CLIQ (Clustered Instruction Querying), a query-efficient distillation framework designed for extracting knowledge from quantized on-device LLMs. CLIQ explicitly models the semantic structure of the instruction space by clustering queries and generating a compact set of cluster-aware, representative instructions, thereby improving semantic coverage while reducing redundancy. Extensive experiments on quantized Qwen-family models under INT8 and INT4 settings show that, under identical query budgets, CLIQ consistently outperforms original query sampling across BERTScore, BLEU, and ROUGE metrics. Our results demonstrate that structured, semantically representative supervision is critical for effective distillation of edge-oriented language models.
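The abstract describes clustering the instruction space and selecting a compact set of representative queries. A minimal sketch of that idea, assuming instruction embeddings are available, is to run k-means over the embeddings and keep the query nearest to each centroid; the function name and procedure here are illustrative assumptions, not CLIQ's actual algorithm, which additionally generates cluster-aware instructions.

```python
import numpy as np

def select_representative_queries(embeddings, k, iters=20, seed=0):
    """Hypothetical helper: cluster query embeddings with simple k-means
    and return, for each cluster, the index of the query closest to the
    centroid. This only sketches the selection step; CLIQ's full method
    (cluster-aware instruction generation) is not reproduced here."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # initialize centroids from k distinct queries
    centroids = embeddings[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each embedding to its nearest centroid
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=-1
        )
        labels = dists.argmin(axis=1)
        # update each centroid to the mean of its members
        for j in range(k):
            members = embeddings[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # representative query = member closest to its cluster centroid
    reps = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if idx.size:
            d = np.linalg.norm(embeddings[idx] - centroids[j], axis=1)
            reps.append(int(idx[d.argmin()]))
    return sorted(reps)
```

Under a fixed query budget, querying the on-device model only with such representatives (rather than the full pool) is the kind of redundancy reduction the abstract attributes to CLIQ.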
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: quantization, distillation, data-efficient training, LLM efficiency, NLP in resource-constrained settings
Contribution Types: NLP engineering experiment, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 5385