QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data

Simone Papicchio; Paolo Papotti; Luca Cagliero

Published: 26 Sept 2023, Last Modified: 10 Jan 2024

Venue: NeurIPS 2023 Datasets and Benchmarks Poster

Keywords: Table Representation Learning, TRL, Benchmarking tool, Checklist, Question Answering, QA, Semantic Parsing, text-to-SQL, Proprietary Data, Tabular Data, Relational Tables, Cross-task performance metrics, Automated testing, Large Language Model, ChatGPT, LLM

TL;DR: QATCH is a toolbox for testing Table Representation Learning models on unseen data, automating checklists for Question Answering and Semantic Parsing tasks, and evaluating performance with cross-task metrics.

Abstract: Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models’ strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP. Checklist generation is driven by a SQL query engine that crafts tests of different complexity. This design makes the checks portable, so they can be reused to test alternative models. We also introduce a set of cross-task performance metrics that assess a TRL model’s output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models including TAPAS, TAPEX, and ChatGPT.
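
To make the idea of query-driven checklist generation concrete, the sketch below builds a few SQL tests of increasing complexity for an input table and scores a model's answer against the query result. It is an illustration only and is not QATCH's actual API: the function names (build_checklist, cell_precision, cell_recall), the three test categories, the sample "sales" table, and the simplified set-based metric definitions are all assumptions introduced here.

```python
import sqlite3


def demo_connection():
    """Create a tiny in-memory table standing in for proprietary data (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("pen", 3), ("ink", 5), ("pen", 2)],
    )
    return conn


def build_checklist(conn, table):
    """Return (category, sql, expected_rows) tests derived from the table schema."""
    # Read column names from the schema so the tests adapt to any input table.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    first = cols[0]
    # Assumed test categories, ordered from simple to more complex.
    queries = [
        ("project", f'SELECT "{first}" FROM "{table}"'),
        ("distinct", f'SELECT DISTINCT "{first}" FROM "{table}"'),
        ("order_by", f'SELECT * FROM "{table}" ORDER BY "{first}" ASC'),
    ]
    # Running each query on the SQL engine yields the ground-truth answer.
    return [(cat, sql, conn.execute(sql).fetchall()) for cat, sql in queries]


def cell_precision(predicted, expected):
    """Fraction of distinct predicted cells that appear in the expected result."""
    pred = {cell for row in predicted for cell in row}
    exp = {cell for row in expected for cell in row}
    return len(pred & exp) / len(pred) if pred else 0.0


def cell_recall(predicted, expected):
    """Fraction of distinct expected cells that appear in the prediction."""
    pred = {cell for row in predicted for cell in row}
    exp = {cell for row in expected for cell in row}
    return len(pred & exp) / len(exp) if exp else 0.0


if __name__ == "__main__":
    for category, sql, expected in build_checklist(demo_connection(), "sales"):
        print(category, sql, expected)
```

Because both metrics compare result cells rather than query text, the same scoring could in principle be applied to a QA model that returns cells directly and to an SP model whose predicted SQL is executed first, which is one way to read the cross-task aspect described in the abstract.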

Submission Number: 810
