A Relation Extraction Dataset for Knowledge Extraction from Web Tables

Siffi Singh, Alham Fikri Aji, Gaurav Singh, Christos Christodoulopoulos

Published: 2022, Last Modified: 21 May 2025COLING 2022EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Relational web-tables are significant sources of structural information that are widely used for relation extraction and population of facts into knowledge graphs. To transform the web-table data into knowledge, we need to identify the relations that exist between column pairs. Currently, there are only a handful of publicly available datasets with relations annotated against natural web-tables. Most datasets are constructed using synthetic tables that lack valuable metadata information, or are limited in size to be considered as a challenging evaluation set. In this paper, we present REDTab, the largest natural-table relation extraction dataset. We have annotated ~9K tables and ~22K column pairs using crowd sourced annotators from MTurk, which has 50x larger number of column pairs than the existing human-annotated benchmark. Our test set is specially designed to be challenging as observed in our experiment results using TaBERT. We publicly release REDTab as a benchmark for the evaluation process in relation extraction.