Instruction Bleed: A Theory-Anchored Benchmark for Cross-Module Interference in Prompt-Composed Agents

Published: 25 May 2026, Last Modified: 25 May 2026 | CTB@ICML 2026 | CC BY 4.0
Keywords: theory-anchored benchmarks, architectural inductive biases, compositional generalization, calibrated empirical evaluation, agent reliability
TL;DR: We formalize compositional behavioral leakage (CBL), i.e., cross-module behavioral interference in prompt-composed agents, propose a three-channel benchmark, and show that semantically irrelevant edits shift unrelated scores, isolating coverage-bounded composition as the mechanism.
Abstract: Transformer self-attention computes global pairwise interactions across its input, leaving no architectural isolation between concatenated prompt modules. Three architectural inductive biases (proactive interference, coverage-bounded compositional generalization, and format sensitivity) jointly predict cross-module behavioral interference that is not derivable from per-module testing, yet no current agent benchmark measures it. We contribute a theory-anchored benchmark protocol whose three perturbation channels (volume, content, form) each isolate one of the predicted mechanisms, with paired effect sizes and bootstrap CIs as the calibrated readout. On a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials), only the content channel produces a detectable effect (Cohen's d = 0.63, bootstrap 95% CI [+0.03, +0.31], excluding zero); the volume and form CIs include zero, which localizes the interference to coverage-bounded composition. We formalize compositional behavioral leakage (CBL) and derive falsifiable predictions that frame a multi-system replication program.
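The readout named in the abstract (a paired effect size plus a bootstrap CI) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the paired variant of Cohen's d (mean of the paired differences divided by their sample standard deviation), a percentile bootstrap on the mean paired difference, and hypothetical function names and toy scores chosen only for illustration.

```python
import random
import statistics


def paired_differences(baseline, perturbed):
    """Per-trial score differences (perturbed minus baseline)."""
    if len(baseline) != len(perturbed):
        raise ValueError("paired samples must have equal length")
    return [p - b for b, p in zip(baseline, perturbed)]


def paired_cohens_d(baseline, perturbed):
    """Assumed convention: mean paired difference / sample SD of the differences."""
    diffs = paired_differences(baseline, perturbed)
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci_mean_diff(baseline, perturbed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (an assumed choice)."""
    rng = random.Random(seed)
    diffs = paired_differences(baseline, perturbed)
    n = len(diffs)
    # Resample the differences with replacement and record each resample mean.
    means = sorted(
        statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy scores, invented for illustration only (not data from the paper).
baseline_scores = [3.0, 3.5, 4.0, 2.5, 3.0, 4.5]
perturbed_scores = [3.5, 3.5, 4.5, 3.0, 3.0, 5.0]

d = paired_cohens_d(baseline_scores, perturbed_scores)
ci = bootstrap_ci_mean_diff(baseline_scores, perturbed_scores)
print(f"paired d = {d:.3f}, 95% CI for mean difference = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Under these assumptions, a channel would count as producing a detectable effect when its interval excludes zero, which is the criterion the abstract applies to the content channel.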
Paper Type: Tiny (2 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 153