CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM

CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM

ACL ARR 2025 May Submission465 Authors

13 May 2025 (modified: 03 Jul 2025)ACL ARR 2025 May SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: We present CIFLEX (Contextual Instruction FLow with EXecution), a novel execution system for efficient sub-task handling in multi-turn interactions with a single on-device large language model (LLM). As LLMs become increasingly capable, a single model is expected to handle diverse sub-tasks that more effectively and comprehensively support answering user requests. Naive approach reprocesses the entire conversation context when switching between main and sub-tasks (e.g., query rewriting, summarization), incurring significant computational overhead. CIFLEX mitigates this overhead by reusing the key-value (KV) cache from the main task and injecting only task-specific instructions into isolated side paths. After sub-task execution, the model rolls back to the main path via cached context, thereby avoiding redundant prefill computation. To support sub-task selection, we also develop a hierarchical classification strategy tailored for small-scale models, decomposing multi-choice decisions into binary ones. Experiments show that CIFLEX significantly reduces computational costs without degrading task performance, enabling scalable and efficient multi-task dialogue on-device.

Paper Type: Long

Research Area: Efficient/Low-Resource Methods for NLP

Research Area Keywords: On-device,Multi-turn conversation,Efficient KV cache,Side-tasking

Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency, Data resources

Languages Studied: English

Keywords: Edge device LLM, KV Cache, Multi Turn Conversation, Efficient Multi Task Processing

Submission Number: 465

Loading