Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL

Published: 22 Jan 2025 · Last Modified: 02 Mar 2025 · ICLR 2025 Poster · CC BY 4.0
Keywords: Large Language Model, Table Representation Learning, NL2SQL, Multimodal Learning
TL;DR: To bridge the gap between tabular and textual information, we propose TNT, a table-language model that empowers LLMs with the ability to effectively and efficiently extract structure-enriched semantics from tabular data.
Abstract: The rise of Large Language Models (LLMs) has revolutionized numerous domains, yet these models still exhibit weaknesses in understanding structured tabular data. Although growing context windows promise to accommodate a larger volume of table contents, they do not inherently improve the model's ability to understand the underlying structure and semantics of tabular data. To bridge the semantic gap between **T**ext and **T**able, we propose **T**n**T**, a table-language model that features multimodal table representations to empower LLMs to effectively and efficiently abstract structure-enriched semantics from tabular data. **T**n**T** also introduces a scalable and efficient training pipeline, featuring novel self-supervised tasks, to integrate abstract tabular knowledge into the language modality. Extensive experimental results on NL2SQL demonstrate the much stronger table understanding of **T**n**T**, which achieves up to **14.4** points higher execution accuracy compared with traditional text-based table representations.
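To make the setting concrete, the following is a minimal, hypothetical sketch of the NL2SQL task and the execution-accuracy metric cited in the abstract. The "text-based table representation" baseline flattens a table into plain text for the LLM prompt; execution accuracy then checks whether a predicted SQL query returns the same result as the gold query. All table, column, and function names here are illustrative, not part of the paper.

```python
import sqlite3

def serialize_table(name, columns, rows):
    """Flatten a table into plain text -- the traditional text-based representation."""
    header = " | ".join(columns)
    body = "\n".join(" | ".join(str(v) for v in r) for r in rows)
    return f"Table {name}:\n{header}\n{body}"

def execution_match(db, predicted_sql, gold_sql):
    """Execution accuracy: does the predicted query return the same rows as the gold query?"""
    cur = db.cursor()
    return cur.execute(predicted_sql).fetchall() == cur.execute(gold_sql).fetchall()

# Toy database standing in for a benchmark example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
db.executemany("INSERT INTO players VALUES (?, ?, ?)",
               [("Ada", "Red", 7), ("Ben", "Blue", 3)])

# The prompt an LLM would see under the text-based baseline.
prompt = serialize_table("players", ["name", "team", "goals"],
                         [("Ada", "Red", 7), ("Ben", "Blue", 3)])
prompt += "\nQuestion: Who scored the most goals?\nSQL:"

# A model's prediction is scored by execution match against the gold query.
pred = "SELECT name FROM players ORDER BY goals DESC LIMIT 1"
gold = "SELECT name FROM players WHERE goals = (SELECT MAX(goals) FROM players)"
print(execution_match(db, pred, gold))  # True on this toy database
```

TnT replaces the flattened-text input above with learned multimodal table representations; the scoring side (execution match) stays the same.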
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7156