Abstract: Advancements in language understanding by Language Models (LMs) have enabled reasoning over tabular data, primarily through training mechanisms that support direct table modification. However, these approaches are often limited to small tables that fit within the model's context window, raising concerns about the scalability of tabular reasoning. To address this challenge, we propose TabularLens, a Retrieval-Augmented Generation (RAG) framework for LM-based applications that retrieves and structures interpretable table content and scales across multiple tables with different schemas. TabularLens employs a two-stage filtering process and a row-column retrieval strategy to efficiently index and extract relevant table elements before passing them to the LM, significantly reducing input size and improving code-generation precision. Furthermore, unlike existing models that struggle with proper nouns, such as named entities or domain-specific identifiers, which often lack meaningful embeddings, TabularLens introduces a dedicated mechanism to recognize and handle such tokens appropriately. This ensures robust retrieval and reasoning even for semantically sparse or opaque table entries.
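The row-column retrieval idea in the abstract, scoring rows and columns against a query and passing only the relevant sub-table to the LM, can be sketched roughly as follows. This is an illustrative sketch, not the paper's actual method: the `overlap_score` relevance function, the `retrieve_subtable` helper, and the token-overlap scoring are all hypothetical stand-ins for whatever indexing and embedding-based scoring TabularLens actually uses.

```python
# Illustrative sketch of a row-column retrieval strategy (hypothetical,
# NOT the paper's implementation): score each row and column against the
# query, keep the top-k of each, and forward only that sub-table.

def overlap_score(query_tokens, text):
    # Hypothetical relevance score: number of query tokens found in the text.
    return len(query_tokens & set(str(text).lower().split()))

def retrieve_subtable(table, query, k_rows=2, k_cols=2):
    """table: dict mapping column name -> list of cell values."""
    q = set(query.lower().split())
    # Score columns by overlap of the header and its cells with the query.
    col_scores = {c: overlap_score(q, c) + sum(overlap_score(q, v) for v in vals)
                  for c, vals in table.items()}
    cols = sorted(table, key=col_scores.get, reverse=True)[:k_cols]
    # Score rows by overlap across all columns.
    n_rows = len(next(iter(table.values())))
    row_scores = [sum(overlap_score(q, table[c][i]) for c in table)
                  for i in range(n_rows)]
    rows = sorted(range(n_rows), key=lambda i: row_scores[i], reverse=True)[:k_rows]
    # Return the reduced sub-table, preserving original row order.
    return {c: [table[c][i] for i in sorted(rows)] for c in cols}

table = {
    "Country": ["France", "Japan", "Brazil"],
    "Capital": ["Paris", "Tokyo", "Brasilia"],
    "Population": ["67M", "125M", "214M"],
}
sub = retrieve_subtable(table, "capital of Japan", k_rows=1, k_cols=2)
# sub now holds a 1-row, 2-column table mentioning Japan and its capital.
```

In a realistic setting the overlap score would be replaced by dense-embedding similarity, with a dedicated handler for proper nouns whose embeddings carry little semantic signal, as the abstract describes.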
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Table RAG, reasoning, semantic parsing
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Keywords: Table RAG, reasoning, semantic parsing
Submission Number: 1402