TabX: X-cellent at Complex Tables and Beyond

18 Sept 2025 (modified: 27 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Table Understanding, Table MLLM
Abstract: Recent advances in table understanding have shifted from text-based large language model (LLM) methods to multimodal LLM (MLLM) methods such as Table-LLaVA that process table images directly. Despite these advances, existing table MLLMs still show limited robustness to complex table layouts and poor generalization to unseen tasks. We trace these failings to two fundamental issues in their development pipeline: (1) low-quality instruction-table-answer triplets in the training data and (2) a lack of holistic understanding of table images. This predicament is analogous to a student learning from flawed material with no mechanism for self-correction: true understanding is not attained through passive study alone, but through iterative self-evaluation and the correction of errors under a teacher's guidance. Inspired by this cognitive process, we first curate a new dataset, MMTab-Pro, introducing three challenging tuning tasks that push the model toward a deeper understanding of table content and structure, and applying a reflection-based enhancement to refine low-quality triplets. We further propose a Self-Evolution with Teacher-Tuning (SETT) fine-tuning framework, which lets the model evolve through self-feedback and the guidance of a stronger teacher model, continuously refining both data suitability and model comprehension. Combining these two steps, we present TabX, a robust and generalizable table MLLM. Experiments on the MMTab-eval benchmark show that TabX outperforms existing models, particularly on structurally complex tables and unseen tasks.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11892