Non-Reasoning transfers to Reasoning: Agentic Prompt Injection Defense

Non-Reasoning transfers to Reasoning: Agentic Prompt Injection Defense

ACL ARR 2026 January Submission4611 Authors

05 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: robustness, defense, security, injection, reasoning, secure, training, alignment, agent

Abstract: Prompt injections are a critical issue limiting adoption of LLMs for interacting with insecure data. This particularly limits the ability of agents interacting with the outside world. We combat this limitation by introducing Reasoning SecAlign, a training approach specifically targeted at training robustness into reasoning LLMs. By leveraging the connection between reasoning and non-reasoning mode, we are able to harden reasoning LLMs by training on their non-reasoning distribution. Training based interventions incur no inference time overhead compared to test time scaling and have efficiency and flexibility improvements over system based methods. We maintain benchmark utility across a wide range of evaluations, and reduce indirect prompt injection attack success rates to 0 or near 0.

Paper Type: Short

Research Area: Safety and Alignment in LLMs

Research Area Keywords: Language Modeling,NLP Applications,

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 4611

Loading