COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation

AAAI 2026 Workshop TrustAgent Submission 60 Authors

Published: 20 Nov 2025, Last Modified: 09 Mar 2026
Venue: AAAI 2026 TrustAgent Workshop Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Large Language Models (LLMs), Contextual Hallucination, Attention Modulation, PID Control Feedback Loop, Factual Accuracy, Real-Time Intervention, Cross-Model Evaluation, Cross-Dataset Generalization, LLaMA, Mistral, Qwen
TL;DR: We steer LLMs away from hallucinations by dynamically amplifying context-sensitive attention heads.
Abstract: Large language models (LLMs) increasingly operate as autonomous agents that reason, plan, and interact with humans and external tools. Yet these agentic systems often exhibit ungrounded or inconsistent behavior, undermining trust and alignment. We introduce COMPASS (Context-Modulated PID Attention Steering System), a lightweight and interpretable control framework for enhancing reliability in agentic LLMs. COMPASS embeds a model-based feedback loop directly within decoding, using a transparent metric, the Context Reliance Score (CRS), to quantify how attention heads ground decisions in contextual evidence. A PID controller dynamically adjusts internal attention distributions in real time, steering the model toward factually and semantically consistent reasoning without retraining or multi-pass inference. Across reasoning and retrieval-augmented benchmarks (HotpotQA, XSum, HaluEval, RAGTruth), COMPASS reduces hallucination rates by 2.8–5.8% absolute while exposing interpretable control signals that reveal which attention heads drive trustworthy behavior. By coupling interpretability with control-theoretic feedback, COMPASS provides a pathway for verifiable, steerable, and value-aligned agentic LLMs, advancing the goals of trustworthy autonomous AI.
Submission Number: 60
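
Illustrative sketch (not the authors' implementation): the abstract describes a PID loop that reads a Context Reliance Score and adjusts attention during decoding. The toy below shows one way such a loop could look on a single attention row. Everything in it is an assumption made for illustration: the CRS is taken to be the fraction of attention mass on context tokens, the intervention is a multiplicative boost of context-token weights followed by renormalization, and the names `context_reliance_score`, `PID`, `steer`, and `run_demo`, as well as the gains and target, are hypothetical. The paper's actual metric, head selection, and update rule may differ.

```python
import numpy as np


def context_reliance_score(attn, context_mask):
    """Assumed CRS: share of attention mass that falls on context tokens."""
    attn = np.asarray(attn, dtype=float)
    return float(attn[context_mask].sum() / attn.sum())


class PID:
    """Textbook discrete PID controller (unit time step)."""

    def __init__(self, kp, ki, kd, target):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement):
        error = self.target - measurement
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def steer(attn, context_mask, gain):
    """Assumed intervention: scale context weights by exp(gain), renormalize."""
    attn = np.asarray(attn, dtype=float)
    scaled = np.where(context_mask, attn * np.exp(gain), attn)
    return scaled / scaled.sum()


def run_demo(steps=40):
    # Toy attention row: three context tokens, one non-context token.
    raw = np.array([0.1, 0.1, 0.1, 0.7])
    mask = np.array([True, True, True, False])
    pid = PID(kp=1.0, ki=1.0, kd=0.1, target=0.6)  # hypothetical gains/target

    attn = raw
    history = []
    for _ in range(steps):
        crs = context_reliance_score(attn, mask)  # measure
        history.append(crs)
        gain = pid.update(crs)                    # control signal
        attn = steer(raw, mask, gain)             # apply to the next step
    return history


if __name__ == "__main__":
    scores = run_demo()
    print(f"initial CRS: {scores[0]:.3f}, final CRS: {scores[-1]:.3f}")
```

In this toy the attention row is fixed, so the controller simply drives the measured score from its starting value toward the target. In a real decoder the raw attention changes at every token and the loop would run per head inside the forward pass, which this sketch does not attempt to model.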