Tabular Data in Interactive and Conversational AI: A Survey of Foundations, Benchmarks, Systems, and Open Problems

15 Apr 2026 (modified: 11 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Tabular and structured data underlie much of modern analytical work, yet natural language systems for interacting with such data have largely been studied in fragmented subfields. This survey unifies that landscape under the broader problem of \emph{conversational AI over tabular and structured data}: systems that support multi-turn, context-dependent interaction with tables, databases, spreadsheets, and hybrid table--text documents. We first clarify the problem setting by defining tabular data, conversational interaction, and the primary interaction modes that distinguish querying, translating, manipulating, and orchestrating over structured data, while treating exploration as a recurrent interaction pattern rather than a separate category. Using an explicit corpus-construction and evidence policy, we organize 106 unique cited works into five categories: \emph{Foundations}, \emph{Conversational Table Question Answering} (CTabQA), \emph{Conversational Text-to-SQL} (CText2SQL), \emph{Interactive Table Manipulation}, and \emph{Agentic Table Systems}. Across these categories, we compare benchmark datasets, modelling paradigms, and evaluation practices, while tracing how closely related problems have often been studied under different task names, benchmarks, and research communities. Our synthesis shows recurring fragmentation in terminology, benchmark conventions, and modelling assumptions across the surveyed literatures, but we treat that fragmentation as a qualitative finding of the review rather than as a formal bibliometric result. We also find that CText2SQL currently has the most standardized benchmark and modelling pipeline, whereas manipulation and agentic systems more closely reflect real user workflows but remain harder to evaluate rigorously. Beyond category-specific findings, we identify three cross-cutting themes shared across the field: intent disambiguation and clarification, dialogue context tracking, and evaluation.
These reveal a central mismatch between current benchmarks and realistic use: most systems are still optimized for short, clean, single-table interactions rather than long-horizon, ambiguous, multi-source analytical workflows. We conclude by synthesizing the field's main open problems, including unified evaluation, long-dialogue robustness, proactive clarification, interpretability, privacy, domain adaptation, and multi-table reasoning, and argue that progress will depend on moving from narrow task benchmarks toward integrated, user-centered conversational data systems.
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=g0442rVmpH&noteId=g0442rVmpH
Changes Since Last Submission: The previous submission was desk-rejected due to formatting non-compliance. In this resubmission, the manuscript has been fully revised to adhere to the official Transactions on Machine Learning Research template. All font settings, spacing, margins, and overall layout have been corrected to match the required specifications. No changes have been made to the technical content, experiments, results, or conclusions of the paper.
Assigned Action Editor: ~Hsuan-Tien_Lin1
Submission Number: 8443