Abstract: Advancements in language understanding by Language Models (LMs) have enabled reasoning over tabular data, primarily through training mechanisms that support direct table modification. However, these approaches are often limited to small tables that fit within the model's context window, raising concerns about the scalability of tabular reasoning. To address this challenge, we propose TabularLens, a Retrieval-Augmented Generation (RAG) framework for LM-based applications that retrieves and structures interpretable table content and scales across multiple tables with different schemas. TabularLens employs a two-stage filtering process and a row-column retrieval strategy to efficiently index and extract relevant table elements before passing them to the LM, significantly reducing input size and improving code-generation precision. Furthermore, unlike existing models that struggle with proper nouns, such as named entities or domain-specific identifiers, which often lack meaningful embeddings, TabularLens introduces a dedicated mechanism to recognize and handle such tokens appropriately. This ensures robust retrieval and reasoning even for semantically sparse or opaque table entries.
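The row-column retrieval idea in the abstract, scoring rows and columns against a query and passing only the relevant sub-table to the LM, can be sketched roughly as follows. This is an illustrative sketch, not the paper's actual method: the `overlap_score` relevance function, the `retrieve_subtable` helper, and the token-overlap scoring are all hypothetical stand-ins for whatever indexing and embedding-based scoring TabularLens actually uses.

```python
# Illustrative sketch of a row-column retrieval strategy (hypothetical,
# NOT the paper's implementation): score each row and column against the
# query, keep the top-k of each, and forward only that sub-table.

def overlap_score(query_tokens, text):
    # Hypothetical relevance score: number of query tokens found in the text.
    return len(query_tokens & set(str(text).lower().split()))

def retrieve_subtable(table, query, k_rows=2, k_cols=2):
    """table: dict mapping column name -> list of cell values."""
    q = set(query.lower().split())
    # Score columns by overlap of the header and its cells with the query.
    col_scores = {c: overlap_score(q, c) + sum(overlap_score(q, v) for v in vals)
                  for c, vals in table.items()}
    cols = sorted(table, key=col_scores.get, reverse=True)[:k_cols]
    # Score rows by overlap across all columns.
    n_rows = len(next(iter(table.values())))
    row_scores = [sum(overlap_score(q, table[c][i]) for c in table)
                  for i in range(n_rows)]
    rows = sorted(range(n_rows), key=lambda i: row_scores[i], reverse=True)[:k_rows]
    # Return the reduced sub-table, preserving original row order.
    return {c: [table[c][i] for i in sorted(rows)] for c in cols}

table = {
    "Country": ["France", "Japan", "Brazil"],
    "Capital": ["Paris", "Tokyo", "Brasilia"],
    "Population": ["67M", "125M", "214M"],
}
sub = retrieve_subtable(table, "capital of Japan", k_rows=1, k_cols=2)
# sub now holds a 1-row, 2-column table mentioning Japan and its capital.
```

In a realistic setting the overlap score would be replaced by dense-embedding similarity, with a dedicated handler for proper nouns whose embeddings carry little semantic signal, as the abstract describes.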
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Table RAG, reasoning, semantic parsing
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Keywords: Table RAG, reasoning, semantic parsing
Submission Number: 1402