Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
Keywords: LLM Agent
Abstract: Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and specialized-role coordination in complex tasks. However, reliable deployment is hindered by a systemic failure mode: broken hierarchical compliance under instruction conflicts (system↔user, peer↔peer), where agents misprioritize system-level rules when faced with competing demands. Widely used macro-level metrics (e.g., pass@k) obscure these micro-level violations and offer little actionable guidance for remediation. This work presents a full-stack, three-stage framework: (1) Diagnose — Contextualized Role Adherence Score (CRAS), a query-wise, context-aware metric that decomposes role adherence into four measurable dimensions; (2) Localize — attention drift analysis revealing that instruction conflicts are resolved by attention heads largely concentrated in middle layers; (3) Align — Surgical Alignment of Instruction Layers (SAIL), which installs low-rank adapters only on the localized focal layers and optimizes a token-weighted DPO-style preference objective that credits tokens by their focal attentional contribution. Across standard benchmarks and MAS frameworks, this surgical approach improves instruction-hierarchy compliance (e.g., +5.60% with AutoGen on MedQA) without full-model fine-tuning.
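The token-weighted DPO-style objective mentioned in stage (3) can be illustrated with a minimal sketch. This is an assumption-based reconstruction, not the paper's implementation: the function name, argument layout, and the choice to weight per-token log-ratios by (hypothetical) normalized focal-attention scores before applying the standard DPO logistic loss are all illustrative.

```python
import math

def token_weighted_dpo_loss(logp_w, logp_ref_w, logp_l, logp_ref_l,
                            w_w, w_l, beta=0.1):
    """Token-weighted DPO-style preference loss (illustrative sketch).

    logp_*      : per-token log-probs under the policy (chosen / rejected)
    logp_ref_*  : per-token log-probs under the frozen reference model
    w_*         : per-token weights, e.g. normalized focal-attention
                  contributions (hypothetical weighting scheme)
    beta        : DPO temperature
    """
    # Weighted policy-vs-reference log-ratio for the chosen response
    r_w = sum(w * (lp - lr) for w, lp, lr in zip(w_w, logp_w, logp_ref_w))
    # Weighted log-ratio for the rejected response
    r_l = sum(w * (lp - lr) for w, lp, lr in zip(w_l, logp_l, logp_ref_l))
    # Standard DPO logistic objective applied to the weighted margin
    return -math.log(1.0 / (1.0 + math.exp(-beta * (r_w - r_l))))
```

With uniform weights this reduces to the vanilla sequence-level DPO loss; non-uniform weights let tokens that focal middle-layer heads attend to dominate the preference gradient.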
Submission Number: 7