OpenDataBench: A Real-World Benchmark for Table Insight Generation and Question Answering Over Open Data
Keywords: Data Analytics Benchmark, LLM Agent, LLM Evaluation, Table Question Answering, Table Insight Generation, Open Data
TL;DR: A challenging benchmark for table insight generation and question answering over open data, accompanied by proposed task-specific LLM agents
Abstract: The promise of Large Language Models (LLMs) for data analysis is hindered by benchmarks that inadequately reflect real-world complexities, such as reasoning over multiple large tables and incorporating external knowledge. Moreover, existing benchmarks focus mainly on fact retrieval via Question Answering (QA) and overlook the critical task of exploratory insight generation. To address these gaps, we introduce OpenDataBench, a benchmark built from governmental open data that captures these practical challenges. It features two types of tasks: multifaceted Table QA tasks, which require answering complex, decomposable questions with either text or graphs, and Table Insight tasks, which challenge models to generate expert-level findings through exploratory data analysis.
We evaluate state-of-the-art LLMs and our proposed agentic solution on OpenDataBench. Our experimental results indicate that even top-performing models struggle with both task types, highlighting a significant gap between current model capabilities and the demands of realistic data analysis. OpenDataBench thus serves as a rigorous benchmark for advancing research on LLM-driven data analysis systems capable of both reactive question answering and proactive insight discovery. Code and sample data are available at https://anonymous.4open.science/r/opendatabench-8AFA/.
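For concreteness, a task instance in a benchmark of this shape might be represented as in the following minimal sketch. This is not the actual OpenDataBench schema; every type and field name here (TableQATask, TableInsightTask, answer_type, etc.) is an assumption made purely for illustration, and the linked repository should be consulted for the real data format.

```python
# Hypothetical sketch of benchmark records for the two task families
# described in the abstract. All names and values are illustrative
# assumptions, NOT the actual OpenDataBench schema.
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class TableQATask:
    """A complex, decomposable question over one or more open-data tables."""
    task_id: str
    table_paths: List[str]                 # tasks may span multiple large tables
    question: str                          # may require external knowledge to answer
    answer_type: Literal["text", "graph"]  # answers can be text or graphs
    reference_answer: str                  # gold answer text or graph specification


@dataclass
class TableInsightTask:
    """An open-ended exploratory analysis over a collection of tables."""
    task_id: str
    table_paths: List[str]
    reference_insights: List[str] = field(default_factory=list)  # expert-level findings


# Placeholder usage showing how a QA instance might be constructed:
qa = TableQATask(
    task_id="qa-0001",
    table_paths=["open_data/budget.csv", "open_data/census.csv"],
    question="How did per-capita spending change between the two most recent years?",
    answer_type="text",
    reference_answer="<gold answer here>",
)
```

Under this reading, the two task types differ mainly in supervision: QA tasks pair a question with a checkable reference answer, while insight tasks provide only the tables and a set of reference findings against which generated insights are judged.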
Primary Area: datasets and benchmarks
Submission Number: 15035