Keywords: Large Language Models, Reflective Reasoning, Steering Vector, Activation Injection, Biomedical Reasoning, Commonsense Reasoning
Abstract: Large language models (LLMs) excel at reasoning tasks, but achieving stable reflective reasoning remains a challenge.
Existing approaches, such as prompt engineering and multi-turn prompting, often cause over-reflection, produce unstable outputs, and rely heavily on manually designed prompts.
In response to these limitations, we propose Reflection Trigger, a novel vector-based mechanism that dynamically injects a reflection vector into an LLM's activations during inference without modifying model parameters.
These vectors are learned in the model's latent representation space and trained to encode reflection signals.
By training a module to generate input-specific reflection vectors, our method provides a controllable and stable mechanism for adjusting the model's internal reflection tendencies (see the sketch after the abstract).
Experiments on biomedical and commonsense benchmarks demonstrate that the Reflection Trigger improves reasoning accuracy and reduces over-reflection.
These results suggest that the Reflection Trigger enhances the stability of LLM reasoning and that reflective reasoning can be treated as a learnable, controllable capability.
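The abstract does not specify implementation details, so the following is only a minimal PyTorch sketch of the general activation-injection idea it describes: a small trainable module produces an input-specific vector that is added to a hidden layer's activations via a forward hook, leaving the base model's weights untouched. The `ReflectionTrigger` architecture, the choice of `gpt2` as the base model, the injection layer index, and the mean-pooled conditioning are all hypothetical stand-ins, not the authors' method.

```python
# Hedged illustration of input-conditioned activation injection.
# Not the paper's implementation; names and choices below are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class ReflectionTrigger(nn.Module):
    """Hypothetical module mapping a pooled prompt representation to an
    input-specific reflection vector (its training loop is omitted here)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.proj(pooled)  # shape: (batch, hidden_size)

model_name = "gpt2"  # stand-in; any causal LM with accessible blocks works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

trigger = ReflectionTrigger(model.config.hidden_size)
inputs = tok("A patient presents with persistent fatigue and ...",
             return_tensors="pt")

# Condition the reflection vector on the prompt: mean-pool the final
# hidden states of a plain forward pass (pooling choice is an assumption).
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
v = trigger(hidden.mean(dim=1))  # (1, hidden_size)

# Inject v additively into one decoder block's output. Only the hook
# modifies activations; no model parameters are changed.
layer = model.transformer.h[6]  # GPT-2 block; path is model-specific

def inject(module, module_inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    steered = output[0] + v.unsqueeze(1)  # broadcast over all token positions
    return (steered,) + output[1:]

handle = layer.register_forward_hook(inject)
out_ids = model.generate(**inputs, max_new_tokens=32)
handle.remove()  # removing the hook restores the unmodified base model
print(tok.decode(out_ids[0], skip_special_tokens=True))
```

The property this sketch shares with the abstract's claim is that steering happens purely at inference time through added activations, so the intervention is reversible and leaves the pretrained parameters intact.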
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16913