A learning-driven multi-agent approach to metadata extraction from legal contracts

16 Sept 2025 (modified: 30 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: multi-agent, legal contracts, metadata, reinforcement learning, orchestration policy optimisation
Abstract: High-quality metadata is essential for downstream legal retrieval applications, yet its generation remains challenging due to the need to balance accuracy, scalability, and computational cost. We present a reinforcement learning (RL)–based orchestration framework that coordinates specialized LLM agents to optimize the cost–quality trade-off in contract metadata extraction. The orchestrator is trained with a goal-conditioned reward balancing field-level extraction quality against token usage, and instantiated with two learners: a goal-conditioned Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). We evaluate performance in two settings of varying complexity: (1) a uniform enterprise contract-summary dataset and (2) full-text CUAD contracts. On CUAD, a quality-first PPO policy improves average quality by about 2% over an agentic LLM with rigid orchestration, while an efficiency-first policy reduces token usage by 35% with only a modest quality trade-off. On enterprise summaries, DQN achieves 65% token savings at a 28% quality trade-off, making it a strong option under strict budget constraints. We further introduce a multi-pass variant that preserves baseline quality while substantially reducing cost. Overall, RL-based orchestration offers a practical path toward scalable and economically sustainable legal metadata generation.
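The abstract describes a goal-conditioned reward that balances field-level extraction quality against token usage, with quality-first and efficiency-first policies as special cases. The paper's exact reward formulation is not given here, so the following is a minimal illustrative sketch assuming a linear trade-off conditioned on a goal weight `alpha` (a hypothetical parameter: `alpha` near 1 corresponds to a quality-first policy, near 0 to an efficiency-first one):

```python
def orchestration_reward(quality: float, tokens_used: int,
                         token_budget: int, alpha: float) -> float:
    """Goal-conditioned reward sketch: quality vs. token cost.

    quality      -- field-level extraction quality score in [0, 1]
    tokens_used  -- tokens consumed by the chosen agent pipeline
    token_budget -- normalizing budget so cost is roughly in [0, 1]
    alpha        -- goal weight; higher values favor quality over savings
    """
    cost = tokens_used / token_budget
    # Linear trade-off; the actual paper may use a different functional form.
    return alpha * quality - (1.0 - alpha) * cost
```

Under this sketch, a quality-first goal (`alpha` close to 1) rewards an expensive high-quality extraction over a cheap mediocre one, while an efficiency-first goal reverses that ranking, which is the behavior the reported PPO and DQN policies exhibit.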
Primary Area: reinforcement learning
Submission Number: 6448