Keywords: Tabular data, Graph Neural Networks, Transformer-Graph Models
TL;DR: Augmenting tabular transformers with graph structures improves representation learning and predictive performance.
Abstract: Transformer models have recently advanced tabular prediction, but they usually treat rows as independent, ignoring that similar instances often share outcomes. Graph augmentation introduces an explicit inductive bias by connecting instances or features and refining embeddings with Graph Neural Networks (GNNs).
We present TANGO (Transformers Augmented with Graphs for Tabular Predictions), a large-scale systematic study (to our knowledge, the largest to date) of graph-augmented tabular transformers across 193 datasets (117 classification, 76 regression). Across this benchmark, TANGO not only improves a strong transformer backbone but also surpasses state-of-the-art tabular foundation models (TabPFNv2, TabICL, MITRA) and consistently outperforms classical tree ensembles (CatBoost, XGBoost) in both classification and regression, achieving the most rank-1 wins, lowest average rank, and smallest relative error gaps.
Our analysis yields three insights. (1) Graph augmentation consistently improves a strong transformer backbone across diverse tasks. (2) Static graphs outperform dynamic ones, offering better stability and generalization. (3) Within static graphs, frozen embeddings are the most reliable variant, consistently outperforming alternatives in both classification and regression.
These results overturn the assumption that dynamic graphs or joint training are always superior, showing instead that schema-anchored graph priors drive generalization: static graphs enforce a stable relational structure, while dynamic ones often introduce instability and risk overfitting.
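The static-graph-with-frozen-embeddings recipe described above can be sketched minimally: build a fixed kNN graph over embeddings from a frozen backbone, then refine each instance by aggregating its neighbors. The function names (`knn_graph`, `gnn_refine`) and the single mean-aggregation pass are illustrative assumptions, not the paper's actual architecture; NumPy stands in for a real GNN library.

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a static kNN adjacency over frozen row embeddings X of shape (n, d)."""
    # Pairwise squared Euclidean distances between all rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self from the neighbor search
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    A = np.zeros((len(X), len(X)))
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)             # symmetrize the graph

def gnn_refine(X, A):
    """One mean-aggregation message pass: average each row with its neighbors."""
    A_hat = A + np.eye(len(X))            # add self-loops
    deg = A_hat.sum(1, keepdims=True)     # node degrees for normalization
    return (A_hat @ X) / deg              # degree-normalized neighborhood mean

# Toy usage: 6 instances forming two clusters in a 2-D embedding space.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
A = knn_graph(X, k=2)      # static graph: built once, never updated during training
Z = gnn_refine(X, A)       # refined embeddings, smoothed within each cluster
```

Because the graph is built once from frozen embeddings and never updated, the relational structure stays fixed across training, which is the stability property the abstract attributes to static graphs.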
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11619