Beyond Row-Level Prediction: A Unified Evaluation of Table Representation Methods and Recoverable Table-Level Geometry

02 May 2026 (modified: 03 May 2026); Under review for TMLR; CC BY 4.0
Abstract: Tables exhibit multiple interacting levels of structure, and useful table representations must compose fine-grained local signals into reusable whole-table embeddings. Yet table representation methods are still often assessed through row-level prediction or downstream supervised tasks rather than through the quality of the table-level representations they produce. We introduce a unified evaluation framework for table representation methods built around four practical desiderata: consistency under partial views, discriminability across label granularities, robustness to benign perturbations, and efficiency. Across controlled synthetic families and real open-source corpora, we find a consistent pattern: lightweight schema- and text-based methods often outperform naive mean-pooled embeddings from state-of-the-art tabular foundation models on the practical quality-cost frontier. This suggests that table-level representations are not an automatic byproduct of predictive training, but depend critically on how local tabular signals are composed into a global representation. To test this hypothesis, we freeze the encoder of a tabular foundation model and train lightweight representation heads on top of its outputs. The learned heads substantially improve table-level geometry over naive pooling, showing that useful compositional structure is recoverable from the same encoder states, although bounded by the information ceiling of the frozen backbone. A closed-corpus parent-table retrieval proof of concept mirrors the benchmark trends and again shows simple methods outperforming pooled tabular foundation models. Together, these results position table-level representation as a first-class problem beyond row-level prediction and highlight learned composition as a key ingredient in reusable representations.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chinmay_Hegde1
Submission Number: 8726