Keywords: automatic table verification, large language model, ai4science
TL;DR: We propose a new task, automatic table verification across multiple tables using a large language model, and a benchmark for it.
Abstract: Without accurate transcription of numerical data in scientific documents, a scientist
cannot draw accurate conclusions. Unfortunately, the process of copying numerical
data from one paper to another is prone to human error. In this paper, we propose to
meet this challenge through the novel task of automatic table verification (AutoTV),
in which the objective is to verify the accuracy of numerical data in tables by
cross-referencing cited sources. To support this task, we propose a new benchmark,
arXiVeri, which comprises tabular data drawn from open-access academic papers
on arXiv. We introduce metrics to evaluate the performance of a table verifier in
two key areas: (i) table matching, which aims to identify the source table in a cited
document that corresponds to a target table, and (ii) cell matching, which aims to
locate shared cells between a target and source table and identify their row and
column indices accurately. By leveraging the flexible capabilities of modern large
language models (LLMs), we propose simple baselines for table verification. Our
findings highlight the complexity of this task, even for state-of-the-art LLMs like
OpenAI’s GPT-4. The code and benchmark are publicly available.
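The abstract names the two evaluation areas but does not define the exact metrics. The following is a minimal sketch under stated assumptions, not the benchmark's official scoring. It assumes table matching is scored by whether the correct source table is ranked first, and that cell matching is scored by precision, recall and F1 over predicted (target cell, source cell) pairs, where each cell is a (row, column) index. The helper names `table_matching_top1` and `cell_matching_prf` are hypothetical.

```python
from typing import Sequence

# Assumption: a cell is identified by its (row_index, column_index) in a table.
Cell = tuple[int, int]
# Assumption: a match links a cell in the target table to one in the source table.
CellPair = tuple[Cell, Cell]


def table_matching_top1(ranked_source_ids: Sequence[str], true_source_id: str) -> float:
    """Hypothetical table-matching score: 1.0 if the correct source table is ranked first."""
    if ranked_source_ids and ranked_source_ids[0] == true_source_id:
        return 1.0
    return 0.0


def cell_matching_prf(
    predicted: set[CellPair], ground_truth: set[CellPair]
) -> tuple[float, float, float]:
    """Hypothetical cell-matching score: precision, recall and F1 over cell pairs.

    A predicted pair counts as correct only if the row and column indices of
    both the target cell and the source cell are exactly right.
    """
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative example with made-up indices: one of two predictions is correct,
# and one of three ground-truth pairs is recovered.
gt_pairs = {((1, 2), (3, 4)), ((1, 3), (3, 5)), ((2, 2), (4, 4))}
pred_pairs = {((1, 2), (3, 4)), ((1, 3), (3, 6))}
print(table_matching_top1(["t3", "t1"], "t3"))
print(cell_matching_prf(pred_pairs, gt_pairs))
```

Requiring an exact match on both the target and source indices reflects the abstract's emphasis on identifying row and column indices accurately; a partial-credit scheme would be an equally plausible alternative.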
Submission Track: Attention
Submission Number: 21