What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects

ACL ARR 2025 May Submission1553 Authors

17 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: In this work, we revisit the trajectory of table LLM development and highlight emerging challenges in the LLM era, particularly the paradox of choice: the difficulty of attributing performance gains amid diverse base models and training sets. We replicate four table LLMs by instruction-tuning three foundation models on four existing datasets, yielding 12 models. We then evaluate these models across 16 table benchmarks. Our analysis reveals that while training data plays a role, base model selection is important and, in many cases, dominates performance. Generalization and reasoning remain challenging. Based on these findings, we share our thoughts on future directions for table modeling.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: table instruction tuning, table LLMs, generalization, replication, OOD evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: English
Submission Number: 1553