Keywords: retrieval-augmented generation, structured data, relational tables, text-to-sql, question answering, fact verification
TL;DR: We introduce TARGET, a benchmark for evaluating TAble Retrieval for GEnerative Tasks, to better contextualize interactions with LLMs over structured data through retrieval-augmented generation.
Abstract: The data landscape is rich with structured data, often of high value to organizations, that drives important applications in data analysis and machine learning. Recent progress in representation learning and generative models for such data has led to the development of natural language interfaces to structured data, including those leveraging text-to-SQL. Contextualizing interactions, including conversational and agentic elements, in structured data through retrieval-augmented generation can provide substantial benefits in the form of freshness, accuracy, and comprehensiveness of answers. The key question, however, is: how do we retrieve the right table(s) for the analytical query or task at hand? To investigate this question, we introduce TARGET: a benchmark for evaluating TAble Retrieval for GEnerative Tasks. We use TARGET to analyze the retrieval performance of different retrievers in isolation, as well as their impact on downstream generators for question answering, fact verification, and text-to-SQL. We find that out-of-the-box embedding-based retrievers far outperform a BM25 baseline, which appears less effective for table retrieval than it is for retrieval over unstructured text. We also surface the sensitivity of retrievers to various metadata (e.g., missing table titles) and illustrate stark variation in retrieval performance across datasets and tasks. TARGET is developed for easy reuse and extension to advance research on retrieval methods and pipelines for relational data through fine-grained, comprehensive, and consistent evaluation. TARGET is available at https://target-benchmark.github.io.
Submission Number: 57