Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Published: 20 May 2026, Last Modified: 20 May 2026DMP 2026 OralEveryoneRevisionsBibTeXCC BY 4.0
Keywords: prompt injection, LLM security, agent safety, multi-agent systems, red-teaming, kill-chain analysis, indirect prompt injection
TL;DR: Stage-level canary tracking across 950 runs and 5 frontier LLMs reframes prompt injection as a pipeline-architecture problem: every model is fully exposed, and safety is determined by write-node placement, not model identity.
Abstract: Multi-agent LLM systems are entering production—processing documents, managing workflows, acting on behalf of users—yet their resilience to prompt injection is still evaluated with a single binary: did the attack succeed? This leaves architects without the diagnostic information needed to harden real pipelines. We introduce a kill-chain canary methodology that tracks a cryptographic token through four stages (Exposed → Persisted → Relayed → Executed) across 950 runs, five frontier LLMs, six attack surfaces, and five defense conditions. The results reframe prompt injection as a pipeline-architecture problem: every model is fully exposed, yet outcomes diverge downstream—Claude blocks all injections at memory-write (0/164 ASR), GPT-4o-mini propagates at 53%, and DeepSeek exhibits 0%/100% across surfaces from the same model. Three findings matter for deployment: (1) write-node placement is the highest-leverage safety decision—routing writes through a verified model eliminates propagation; (2) all four defenses fail on at least one surface due to channel mismatch alone, no adversarial adaptation required; (3) invisible whitefont PDF payloads match or exceed visible-text ASR, meaning rendered-layer screening is insufficient. These dynamics apply directly to production: institutional investors and financial firms already run NLP pipelines over earnings calls, SEC filings, and analyst reports—the document-ingestion workflows now migrating to LLM agents. Code, run logs, and tooling are publicly released (link withheld for anonymous review).
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 6
Loading