Keywords: Risk Assessment, LLM Safety, Multi-Agent Debate, Task Planning, Cognitive Collaboration
Abstract: Large Language Models (LLMs) exhibit impressive reasoning capabilities but often suffer from Embodied Semantic Hallucinations: generating plans that are semantically fluent but physically unsafe due to a lack of grounded common sense. Existing safety alignment methods, such as RLHF or naive safety prompting, typically fall into a Safety-Utility Trade-off, resulting in severe over-rejection of benign household instructions. To address this, we propose MADRA (Multi-Agent Deliberation for Risk Awareness), a training-free cognitive architecture that mimics System-2 deliberation. MADRA introduces a meta-cognitive Critical Agent that evaluates peer debates using a structured argumentation framework derived from the Toulmin Model, effectively mitigating the "herd mentality" in multi-agent systems. We also introduce SafeAware-VH, a benchmark featuring adversarial yet safe instructions designed to probe agents' sensitivity to physical risks. Extensive experiments demonstrate that MADRA breaks through the safety-utility Pareto frontier, achieving over 90% rejection of unsafe tasks while maintaining high utility and significantly outperforming standard Chain-of-Thought and single-agent reflection baselines.
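Since the abstract ships no code, the following is only a minimal illustrative sketch of the deliberation loop it describes: debate agents argue over a household task, and a meta-cognitive Critical Agent judges the transcript by its Toulmin structure (claim, grounds, warrant) rather than by majority vote. Every identifier here (llm, debater, critical_agent, ToulminArgument, the role names) is a hypothetical assumption, not the authors' API.

```python
from dataclasses import dataclass

@dataclass
class ToulminArgument:
    """One argument decomposed per the Toulmin Model (assumed fields)."""
    claim: str    # e.g. "placing the towel on the stove is unsafe"
    grounds: str  # evidence cited from the plan or scene
    warrant: str  # common-sense rule connecting grounds to claim

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion backend; replace with a real client."""
    raise NotImplementedError("plug in an LLM API here")

def debater(role: str, task: str, transcript: list[str]) -> str:
    """One debate agent: argues for or against executing the task."""
    history = "\n".join(transcript) or "(no prior turns)"
    return llm(
        f"You are the {role}. Household task: {task}\n"
        f"Debate so far:\n{history}\n"
        "State your position as a claim, its grounds, and its warrant."
    )

def critical_agent(task: str, transcript: list[str]) -> str:
    """Meta-cognitive judge: scores each argument's Toulmin structure
    instead of counting votes, which is meant to counter herd mentality."""
    history = "\n".join(transcript)
    return llm(
        f"Task: {task}\nArguments:\n{history}\n"
        "For each argument, check whether its grounds are observable and its "
        "warrant is a valid physical rule. Output EXECUTE or REJECT with reasons."
    )

def deliberate(task: str, rounds: int = 2) -> str:
    """Run a fixed number of debate rounds, then let the critic decide."""
    transcript: list[str] = []
    for _ in range(rounds):
        for role in ("risk advocate", "utility advocate"):
            transcript.append(debater(role, task, transcript))
    return critical_agent(task, transcript)
```

The two-advocate, fixed-round setup is one plausible reading of "peer debates"; the actual number of agents, roles, and stopping rule are not specified in the abstract.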
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Language Modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1823