Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL

Published: 22 Jan 2025 · Last Modified: 02 Mar 2025 · ICLR 2025 Poster · CC BY 4.0
Keywords: Large Language Model, Table Representation Learning, NL2SQL, Multimodal Learning
TL;DR: To bridge the gap between tabular and textual information, we propose TNT, a table-language model that empowers LLMs with the ability to effectively and efficiently extract structure-enriched semantics from tabular data.
Abstract: The rise of Large Language Models (LLMs) has revolutionized numerous domains, yet these models still exhibit weaknesses in understanding structured tabular data. Although growing context windows promise to accommodate a larger volume of table contents, they do not inherently improve the model's ability to understand the underlying structure and semantics of tabular data. To bridge the semantic gap between **T**ext and **T**able, we propose **T**n**T**, a table-language model that features multimodal table representations to empower LLMs to effectively and efficiently abstract structure-enriched semantics from tabular data. **T**n**T** also introduces a scalable and efficient training pipeline, featuring novel self-supervised tasks, to integrate abstract tabular knowledge into the language modality. Extensive experimental results on NL2SQL demonstrate the much stronger table understanding of **T**n**T**, which achieves up to **14.4** points higher execution accuracy compared with traditional text-based table representations.
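To make the setting concrete, the following is a minimal, hypothetical sketch of the NL2SQL task and the execution-accuracy metric cited in the abstract. The "text-based table representation" baseline flattens a table into plain text for the LLM prompt; execution accuracy then checks whether a predicted SQL query returns the same result as the gold query. All table, column, and function names here are illustrative, not part of the paper.

```python
import sqlite3

def serialize_table(name, columns, rows):
    """Flatten a table into plain text -- the traditional text-based representation."""
    header = " | ".join(columns)
    body = "\n".join(" | ".join(str(v) for v in r) for r in rows)
    return f"Table {name}:\n{header}\n{body}"

def execution_match(db, predicted_sql, gold_sql):
    """Execution accuracy: does the predicted query return the same rows as the gold query?"""
    cur = db.cursor()
    return cur.execute(predicted_sql).fetchall() == cur.execute(gold_sql).fetchall()

# Toy database standing in for a benchmark example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
db.executemany("INSERT INTO players VALUES (?, ?, ?)",
               [("Ada", "Red", 7), ("Ben", "Blue", 3)])

# The prompt an LLM would see under the text-based baseline.
prompt = serialize_table("players", ["name", "team", "goals"],
                         [("Ada", "Red", 7), ("Ben", "Blue", 3)])
prompt += "\nQuestion: Who scored the most goals?\nSQL:"

# A model's prediction is scored by execution match against the gold query.
pred = "SELECT name FROM players ORDER BY goals DESC LIMIT 1"
gold = "SELECT name FROM players WHERE goals = (SELECT MAX(goals) FROM players)"
print(execution_match(db, pred, gold))  # True on this toy database
```

TnT replaces the flattened-text input above with learned multimodal table representations; the scoring side (execution match) stays the same.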
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7156