Shorter is Better: Extreme Compression Outperforms Medium Prompts, and RLHF Causes 598% Constraint Degradation
Keywords: instruction following, prompt compression, constraint compliance, RLHF, benchmark, LLM evaluation, alignment, robustness, AI safety
TL;DR: We show that extreme compression (~2 words) outperforms medium-length prompts (~27 words) by 52% while costing 93% less, and that RLHF-trained helpfulness drives a 598% degradation in constraint compliance, revealing a fundamental efficiency-safety paradox in deployed LLMs.
Abstract: Large language models (LLMs) exhibit degraded performance under prompt compression, but the mechanisms remain poorly understood. We introduce the Compression-Decay Comprehension Test (CDCT), a benchmark that independently measures constraint compliance (CC) and semantic accuracy (SA) across compression levels. We evaluate 9 frontier LLMs across 8 concepts using 5 compression levels from extreme (c=0.0, ~2 words) to none (c=1.0, ~135 words). A three-judge LLM jury achieves almost perfect inter-rater agreement on CC (Fleiss' κ=0.90).
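As a minimal sketch of how the reported jury agreement could be computed, the snippet below estimates Fleiss' kappa from three judges' binary CC verdicts; the vote matrix, label encoding, and use of statsmodels are illustrative assumptions, not the paper's released evaluation code.

    # Hypothetical sketch: Fleiss' kappa over three LLM judges' binary
    # constraint-compliance (CC) verdicts. Vote data is invented for illustration.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Rows = evaluated responses, columns = judges; 1 = compliant, 0 = violation.
    votes = np.array([
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 0],
        [1, 1, 1],
    ])

    # Convert per-rater labels into per-item category counts, then compute kappa.
    counts, _ = aggregate_raters(votes)
    print(f"Fleiss' kappa = {fleiss_kappa(counts):.2f}")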
We observe a near-universal U-shaped pattern in constraint compliance (97.2% prevalence), with violations peaking at medium compression (c=0.5, ~27 words). Counterintuitively, models perform better at extreme compression than at medium lengths. The two dimensions are statistically orthogonal (r=0.193, p=0.084), with constraint effects 2.9× larger than semantic effects.
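As a hedged illustration of the orthogonality check, the correlation between the two dimensions can be computed as a Pearson r over paired per-condition scores; the arrays below are placeholders, not the study's data.

    # Hypothetical sketch: test whether constraint compliance (CC) and semantic
    # accuracy (SA) are correlated across conditions. Scores here are invented.
    from scipy.stats import pearsonr

    cc_scores = [0.95, 0.40, 0.55, 0.90, 0.85, 0.60]  # per-condition CC
    sa_scores = [0.80, 0.75, 0.70, 0.85, 0.78, 0.72]  # per-condition SA

    r, p = pearsonr(cc_scores, sa_scores)
    print(f"r = {r:.3f}, p = {p:.3f}")  # paper reports r = 0.193, p = 0.084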
An RLHF-ablation experiment supports our constraint-salience hypothesis: removing "helpfulness" signals improves CC by 598% on average (71 of 72 trials, p<0.001), with 79% achieving perfect compliance. This indicates that RLHF-trained helpfulness behaviors are the dominant cause of constraint violations at medium compression. Separately, reasoning models outperform efficient models by 27.5% (Cohen's d=0.96).
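For the ablation result, the 71-of-72 improvement rate can be checked with a simple two-sided sign (binomial) test; this sketch assumes a 50/50 null and is not the paper's analysis code.

    # Hypothetical sketch: sign test for "CC improved when helpfulness signals
    # were removed" in 71 of 72 ablation trials, against a 50/50 null.
    from scipy.stats import binomtest

    result = binomtest(k=71, n=72, p=0.5, alternative="two-sided")
    print(f"p = {result.pvalue:.2e}")  # far below 0.001, consistent with p < 0.001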
Our findings reveal a fundamental tension between RLHF alignment and instruction-following, providing actionable guidelines for improving deployed systems.
Submission Number: 21