Keywords: Large Language Models (LLMs), Contextual Hallucination, Attention Modulation, PID Control Feedback Loop, Factual Accuracy, Real-Time Intervention, Cross-Model Evaluation, Cross-Dataset Generalization, LLaMA, Mistral, Qwen
TL;DR: We steer LLMs away from hallucinations by dynamically amplifying context-sensitive attention heads.
Abstract: Large language models (LLMs) increasingly operate as autonomous agents that reason, plan, and interact with humans and external tools. Yet these agentic systems often exhibit ungrounded or inconsistent behavior, undermining trust and alignment. We introduce COMPASS (Context-Modulated PID Attention Steering System), a lightweight and interpretable control framework for enhancing reliability in agentic LLMs. COMPASS embeds a model-based feedback loop directly within decoding, using a transparent metric, the Context Reliance Score (CRS), to quantify how strongly attention heads ground decisions in contextual evidence. A PID controller adjusts internal attention distributions in real time, steering the model toward factually and semantically consistent reasoning without retraining or multi-pass inference. Across reasoning and retrieval-augmented benchmarks (HotpotQA, XSum, HaluEval, RAGTruth), COMPASS reduces hallucination rates by 2.8–5.8% absolute while exposing interpretable control signals that reveal which attention heads drive trustworthy behavior. By coupling interpretability with control-theoretic feedback, COMPASS provides a pathway toward verifiable, steerable, and value-aligned agentic LLMs, advancing the goals of trustworthy autonomous AI.
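Illustrative sketch: the feedback loop described in the abstract can be pictured roughly as follows. This is not the authors' implementation. It assumes, purely for illustration, that the CRS of a head is the share of its attention mass that falls on context tokens, and that the PID output is applied as a multiplicative gain on those positions before renormalisation. The names `PIDController`, `context_reliance`, and `steer`, as well as the gains and target value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Discrete PID controller with unit time step (illustrative gains)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Accumulate the integral term and take a first-difference derivative.
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def context_reliance(attn_row, context_mask):
    """Assumed CRS: fraction of one head's attention mass on context tokens."""
    total = sum(attn_row)
    ctx = sum(a for a, m in zip(attn_row, context_mask) if m)
    return ctx / total if total > 0 else 0.0


def steer(attn_row, context_mask, gain):
    """Scale context positions by (1 + gain), then renormalise to sum to 1."""
    factor = max(0.0, 1.0 + gain)  # never flip the sign of attention weights
    scaled = [a * factor if m else a for a, m in zip(attn_row, context_mask)]
    z = sum(scaled)
    return [s / z for s in scaled] if z > 0 else list(attn_row)


# One decoding step for a single head (toy numbers).
attn = [0.1, 0.1, 0.4, 0.4]           # attention over four positions
mask = [True, True, False, False]     # first two positions are context
target_crs = 0.6                      # hypothetical set point

pid = PIDController(kp=1.0, ki=0.1, kd=0.0)
crs_before = context_reliance(attn, mask)
gain = pid.step(target_crs - crs_before)
steered = steer(attn, mask, gain)
crs_after = context_reliance(steered, mask)
```

In this toy setting the controller sees a CRS below the set point, produces a positive gain, and the steered distribution places more mass on context positions. In an actual decoder the same update would run per head and per generated token, with the integral and derivative terms carrying state across steps.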
Submission Number: 60