GRI-QA: a Comprehensive Benchmark for Table Question Answering over Environmental Data

ACL ARR 2025 February Submission1458 Authors

13 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Assessing corporate environmental sustainability with Table Question Answering systems is challenging due to complex tables, specialized terminology, and the variety of questions they must handle. In this paper, we introduce GRI-QA, a benchmark designed to evaluate Table QA approaches in the environmental domain. Using GRI standards, we extract and annotate tables from non-financial corporate reports, generating question-answer pairs through a hybrid LLM-human approach. The benchmark includes eight datasets, categorized by the types of operations required, including operations on multiple tables from multiple documents. Our evaluation reveals a significant gap between human and model performance, particularly in multi-step reasoning, highlighting the relevance of the benchmark and the need for further research in domain-specific Table QA. Code and benchmark datasets are available at https://anonymous.4open.science/r/gri_qa-EA6F/.

Paper Type: Long

Research Area: Question Answering

Research Area Keywords: multihop QA, generalization, reasoning, math QA, table QA, question generation

Contribution Types: Data resources, Data analysis

Languages Studied: English

Submission Number: 1458
