Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey

Published: 23 Jun 2024, Last Modified: 23 Jun 2024, Accepted by TMLR
Abstract: Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain. This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized. It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field. It also provides references to relevant code and datasets. Through this comprehensive review, we hope to provide interested readers with pertinent references and insightful perspectives, empowering them with the necessary tools and knowledge to effectively navigate and address the prevailing challenges in the field.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Revisions to incorporate feedback from the reviewers.
Assigned Action Editor: ~Greg_Durrett1
Submission Number: 2273