AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry

Yannis Katsis; Saneem Ahmed Chemmengath; Vishwajeet Kumar; Samarth Bharadwaj; Mustafa Canim; Michael Glass; Alfio Gliozzo; Feifei Pan; Jaydeep Sen; Karthik Sankaranarayanan; Soumen Chakrabarti

08 Jun 2021 (modified: 26 May 2025). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone

Keywords: tableQA, zero-shot, question-answering, dataset, benchmark, complex-tables

TL;DR: A real-world Table QA dataset of 515 questions on 116 tables extracted from public U.S. SEC filings of major airline companies for the fiscal years 2017-2019. The tables have a complex layout, numerical cells, and domain-specific terms.

Abstract: Recent advances in transformers have enabled Table Question Answering (Table QA) systems to achieve high accuracy and SOTA results on open-domain datasets like WikiTableQuestions and WikiSQL. Such transformers are frequently pre-trained on open-domain content such as Wikipedia, so they effectively encode the questions and corresponding Wikipedia tables found in these Table QA datasets. However, web tables in Wikipedia are notably flat in their layout, with the first row as the sole column header. This layout lends itself to a relational view of tables in which each row is a tuple. In contrast, tables in domain-specific business or scientific documents often have a much more complex layout, including hierarchical row and column headers, in addition to specialized vocabulary from that domain. To address this problem, we introduce the domain-specific Table QA dataset AIT-QA (Airline Industry Table QA). The dataset consists of 515 questions authored by human annotators on 116 tables extracted from public U.S. SEC filings (publicly available at https://www.sec.gov/edgar.shtml) of major airline companies for the fiscal years 2017-2019. We also provide annotations on the nature of the questions, marking those that require hierarchical headers, domain-specific terminology, and paraphrased forms. Our zero-shot baseline evaluation of three transformer-based SOTA Table QA methods - TaPAS (end-to-end), TaBERT (semantic parsing-based), and RCI (row-column encoding-based) - clearly exposes the limitations of these methods in this practical setting, with the best accuracy at just 51.8% (RCI). We also present pragmatic table pre-processing steps used to pivot and project these complex tables into a layout suitable for the SOTA Table QA models.
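To make the idea of projecting a hierarchical layout onto a flat one concrete, the following is a minimal sketch of one possible way to collapse stacked column-header rows into a single header row. It is not the authors' pre-processing pipeline, which this page does not describe in detail. The function name `flatten_headers`, the ` | ` separator, and the toy header values are all assumptions made for illustration.

```python
from typing import List, Sequence


def flatten_headers(header_rows: Sequence[Sequence[str]], sep: str = " | ") -> List[str]:
    """Collapse several stacked header rows into one header string per column.

    Hypothetical helper for illustration only. An empty cell inherits the
    nearest non-empty value to its left in the same row, which approximates
    a header cell that spans several columns.
    """
    filled = []
    for row in header_rows:
        last = ""
        new_row = []
        for cell in row:
            cell = cell.strip()
            if cell:
                last = cell
            new_row.append(last)
        filled.append(new_row)

    flat = []
    for column in zip(*filled):
        parts = []
        for part in column:
            # Skip empty levels and consecutive repeats of the same label.
            if part and (not parts or parts[-1] != part):
                parts.append(part)
        flat.append(sep.join(parts))
    return flat


# Toy two-level header (made-up values, not taken from the dataset).
example_headers = [
    ["", "Year ended December 31", "", ""],
    ["", "2019", "2018", "2017"],
]
print(flatten_headers(example_headers))
# ['', 'Year ended December 31 | 2019', 'Year ended December 31 | 2018', 'Year ended December 31 | 2017']
```

With a single header per column, each row can be treated as a tuple, which is the flat, relational layout the abstract says Wikipedia-style tables already have. Hierarchical row headers would need a similar treatment along the other axis.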

Supplementary Material: zip

URL: https://github.com/IBM/AITQA

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/ait-qa-question-answering-dataset-over/code)
