AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making

Published: 10 Jan 2026, Last Modified: 10 Jan 2026LaMAS 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLM, Multi-Agent Debate (MAD), Multi-Agent Systems (MAS), AI Reasoning, Courtroom Simulations, Juvenile Recidivism, eXplainable AI (XAI)
TL;DR: A role-structured multi-agent debate framework for transparent LLM reasoning on tabular data that improves prediction stability and provides full auditability through courtroom-style interactions, benchmarked on high-stakes recidivism prediction.
Abstract: We introduce AgenticSimLaw, a role-structured, multi-agent debate framework that provides transparent and controllable test-time reasoning for high-stakes tabular decision-making tasks. Unlike black-box approaches, our courtroom-style orchestration explicitly defines agent roles (prosecutor, defense, judge), interaction protocols (7-turn structured debate), and private reasoning strategies, creating a fully auditable decision-making process. We benchmark this framework on young adult recidivism prediction using the NLSY97 dataset, comparing it against traditional chain-of-thought (CoT) prompting across almost 90 unique combinations of models and strategies. Our results demonstrate that structured multi-agent debate provides more stable and generalizable performance compared to single-agent reasoning, with stronger correlation between accuracy and F1-score metrics. Beyond performance improvements, AgenticSimLaw offers fine-grained control over reasoning steps, generates complete interaction transcripts for explainability, and enables systematic profiling of agent behaviors. While we instantiate this framework in the criminal justice domain to stress-test reasoning under ethical complexity, the approach generalizes to any deliberative, high-stakes decision task requiring transparency and human oversight. This work addresses key LLM-based multi-agent system challenges: organization through structured roles, observability through logged interactions, and responsibility through explicit non-deployment constraints for sensitive domains. Data, results, and code will be available on github.com under the MIT license.
Submission Number: 36
Loading