Keywords: Large Language Model, Memory, Context Compression, In-Context Learning, Fine-Tuning
Abstract: Prompt-based in-context learning (ICL) and parameter fine-tuning are two dominant paradigms for incorporating external information into large language models (LLMs), but they incur either high inference cost or expensive retraining. Context-to-parameter mapping bridges this gap by converting prompts into temporary adapter weights. However, we identify a critical failure mode in existing methods: hidden-state collapse, where the adapter-injected model's internal states diverge sharply from those of the full-context oracle in deeper layers. We trace this failure to two coupled gaps: Input-Selection and Supervision-Signal. To address them, we propose SADA (State-Aligned Distillation Adapters). We justify the attention-block output as a principled feature interface and introduce state-alignment distillation, which aligns hidden states between the adapter-injected model and the full-context oracle. Experiments on long-context language modeling (PG19) and downstream NLU and summarization benchmarks show that SADA consistently improves over StreamAdapter and GenerativeAdapter where applicable, achieving performance comparable to ICL while reducing memory footprint and latency. We further analyze when parameterized context compression is effective and when explicit context retention remains preferable. Our code is available at https://anonymous.4open.science/r/SADA-F924
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: Large Language Model, Memory, Context Compression, In-Context Learning, Fine-Tuning
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 1760