QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

Yiliu Yang; Yilei Jiang; Qunzhong Wang; Yingshui Tan; Xiaoyong Zhu; Bo Zheng; Sherman S. M. Chow; Xiangyu Yue

QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

Yiliu Yang, Yilei Jiang, Qunzhong Wang, Yingshui Tan, Xiaoyong Zhu, Bo Zheng, Sherman S. M. Chow, Xiangyu Yue

05 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0

Keywords: AI Safety, Multi-agent System

TL;DR: We propose a Multi-agent System based method to safeguard other agents and systems.

Abstract: Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context dependent, so they map poorly to machine-checkable rules and runtime enforcement is unreliable. Expressing safety policies as sequents, we propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that compiles these policies into machine-checkable rules built from predicates over observable state and enforces them online. Referee logic plus an efficient top-$k$ predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives. Against single-agent baselines such as ShieldAgent (ICML '25), it yields better overall safety control. Near-term deployments can adopt this pattern without modifying core agents by keeping policies separate and machine-checkable.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 2418

Loading