DataGovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows

ACL ARR 2026 January Submission 3810 Authors

04 Jan 2026 (modified: 07 Jun 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0

Keywords: Data Governance, LLM Agents, Benchmark

Abstract: Data governance ensures data quality, security, and compliance through policies and standards, and it is a critical foundation for scaling modern AI development. Large Language Models (LLMs) have recently emerged as a promising way to automate data governance by translating user intent into executable transformation code. However, existing benchmarks for automated data science emphasize snippet-level coding or high-level analytics and fail to capture the distinctive challenge of data governance: ensuring the correctness and quality of the data itself. To bridge this gap, we introduce DataGovBench, a benchmark of 150 diverse tasks grounded in real-world scenarios and built on data from actual cases. DataGovBench employs a novel "reversed-objective" methodology to synthesize realistic noise and uses rigorous metrics to assess end-to-end pipeline reliability. Our analysis on DataGovBench reveals that current models struggle with complex, multi-step workflows and lack robust error-correction mechanisms. We therefore propose DataGovAgent, a framework built on a Planner-Executor-Evaluator architecture that integrates constraint-based planning, retrieval-augmented generation, and sandboxed, feedback-driven debugging. Experimental results show that DataGovAgent raises the Average Task Score (ATS) on complex tasks from 39.7 to 54.9 and reduces debugging iterations by over 77.9% compared to general-purpose baselines.
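The abstract does not spell out how the "reversed-objective" methodology works. One plausible reading, offered here only as an assumption, is that noise is synthesized by starting from a clean table that already satisfies a governance objective and then deliberately violating each of its constraints. Under that reading, the clean table doubles as the reference output for scoring. The sketch below illustrates that idea. The table, the objective (lowercase emails, ISO dates, unique ids), and the names `CLEAN`, `inject_noise`, `govern`, and `cell_accuracy` are all hypothetical. `cell_accuracy` is a toy stand-in, not the paper's ATS metric.

```python
import random

# Hypothetical clean "ground truth" table that already satisfies a
# made-up governance objective: lowercase emails, ISO dates, unique ids.
CLEAN = [
    {"id": 1, "email": "ana@example.com", "joined": "2024-01-05"},
    {"id": 2, "email": "bo@example.com", "joined": "2023-11-20"},
    {"id": 3, "email": "cy@example.com", "joined": "2022-07-14"},
]


def inject_noise(rows, seed=0):
    """Synthesize dirty input by violating each constraint of the objective."""
    rng = random.Random(seed)
    noisy = []
    for row in rows:
        r = dict(row)
        r["email"] = r["email"].upper()          # break "lowercase emails"
        y, m, d = r["joined"].split("-")
        r["joined"] = f"{d}/{m}/{y}"             # break "ISO dates"
        noisy.append(r)
        if rng.random() < 0.5:
            noisy.append(dict(r))                # break "unique ids"
    rng.shuffle(noisy)
    return noisy


def govern(rows):
    """Reference cleaning pipeline, standing in for agent-generated code."""
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: r["id"]):
        if row["id"] in seen:
            continue                             # drop duplicate ids
        seen.add(row["id"])
        d, m, y = row["joined"].split("/")
        out.append({"id": row["id"],
                    "email": row["email"].lower(),
                    "joined": f"{y}-{m}-{d}"})
    return out


def cell_accuracy(pred, gold):
    """Share of gold cells reproduced exactly, matched by id (toy metric, not ATS).

    Note: rows present in pred but absent from gold are not penalized.
    """
    pred_by_id = {r["id"]: r for r in pred}
    total = correct = 0
    for g in gold:
        p = pred_by_id.get(g["id"], {})
        for key, value in g.items():
            total += 1
            correct += p.get(key) == value
    return correct / total


noisy = inject_noise(CLEAN, seed=1)
print(cell_accuracy(noisy, CLEAN))          # untouched input scores 1/3
print(cell_accuracy(govern(noisy), CLEAN))  # cleaned output scores 1.0
```

The appeal of generating noise this way is that the expected output is known exactly, without manual labeling. Each injected violation corresponds to a constraint a correct pipeline must restore, so a score below 1.0 points to a specific unmet requirement.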

Paper Type: Long

Research Area: AI/LLM Agents

Research Area Keywords: evaluation methodologies, code models, applications, LLM/AI agents

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: English

Submission Number: 3810