Track: long paper (up to 10 pages)
Keywords: residual stream, context compression, retrieval-augmented generation, activation injection, prompt compression, training-free, context amortization, long-context inference
Abstract: We propose Residual Stream Context Encoding (RSCE), a training-free method that
eliminates redundant long-context prefill costs in retrieval-augmented generation.
Given a context document ctx, RSCE extracts a vector C ∈ R^{d_M} by mean-
pooling residual stream activations at a calibrated intermediate layer f(M), then
injects it as an additive shift at query time, replacing the O(|T(ctx)|) attention prefill
with an O(1) operation and zero per-query context forward pass. For tasks
requiring factual precision, we pair C with a compact explicit fact block F, forming
a dual-channel representation amortized across N ≥ 2 queries. We evaluate
five decoder-only architectures (7B–70B) on multi-document QA (LongBench,
n = 108) and six architectures on cross-file code completion (RepoBench-C),
comparing against LongLLMLingua and EHPC. A key mechanistic finding: vector
injection alone suppresses parametric recall below the question-only baseline—a
dual-pathway interference effect absent in behavioral steering that motivates the
dual-channel design. At extreme compression (∼99% token reduction), RSCE
Vec+F is competitive with EHPC on smaller architectures (LLaMA-8B F1 0.333
vs. EHPC 0.334; DeepSeek-14B both 0.214) while both substantially outperform
LongLLMLingua (0.209, 0.172). On larger models, EHPC’s capacity-scaling
token selection widens the gap, reaching F1 0.539 vs. RSCE 0.365 on LLaMA-
70B—a finding we explain through model capacity scaling of in-context reasoning.
On RepoBench-C, LongLLMLingua substantially improves over baseline via
compression-as-retrieval; RSCE is the only method achieving 81% compression
at 100% operational reliability.
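To make the mechanism described above concrete, the following is a minimal illustrative sketch, not the authors' implementation: a one-time pass that mean-pools the residual stream at an intermediate layer to obtain C, and a query-time pass that adds C as a shift to the residual stream. It assumes a LLaMA-style HuggingFace model whose decoder blocks are exposed as model.model.layers and are called with hidden states as their first positional argument; the model name, the midpoint layer standing in for the calibrated layer f(M), the file path, and the hook details are placeholders.

```python
# Illustrative sketch of the RSCE idea (assumptions noted above; not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder; any decoder-only LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# Stand-in for the calibrated intermediate layer f(M); here simply the midpoint.
k = len(model.model.layers) // 2


@torch.no_grad()
def encode_context(ctx: str) -> torch.Tensor:
    """One-time pass over ctx: mean-pool the layer-k residual stream into C in R^{d_M}."""
    captured = {}

    def hook(_module, _inputs, output):
        # LLaMA-style decoder blocks return a tuple; element 0 is the hidden states.
        captured["h"] = output[0]

    handle = model.model.layers[k].register_forward_hook(hook)
    ids = tok(ctx, return_tensors="pt").input_ids
    model(ids)
    handle.remove()
    return captured["h"].mean(dim=1)  # shape (1, d_M): mean over token positions


@torch.no_grad()
def answer_with_injection(prompt: str, C: torch.Tensor, max_new_tokens: int = 64) -> str:
    """Query-time pass: add C to the residual stream entering layer k (O(1) w.r.t. |ctx|)."""

    def pre_hook(_module, args, kwargs):
        shift = C.to(device=args[0].device, dtype=args[0].dtype)
        return (args[0] + shift, *args[1:]), kwargs  # additive shift at every position

    handle = model.model.layers[k].register_forward_pre_hook(pre_hook, with_kwargs=True)
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    handle.remove()
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)


# Amortized once per document, reused across N >= 2 queries.
C = encode_context(open("doc.txt").read())
# In the Vec+F variant, a compact explicit fact block F would simply be prepended
# to the question text here, alongside the injected vector C.
print(answer_with_injection("Who wrote the report?", C))
```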
Presenter: ~Eric_Xu2
Format: No, the presenting author is unable to, or unlikely to be able to, attend in person.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 198