ConTextTab: A Semantics-Aware Tabular In-Context Learner

Published: 09 Jun 2025, Last Modified: 09 Jun 2025
Venue: FMSD @ ICML 2025
License: CC BY 4.0
Keywords: tabular learning, semantics, in-context learning, deep learning
TL;DR: A state-of-the-art semantics-enriched TabPFNv2-style in-context learner trained on real-world tabular data
Abstract: Tabular in-context learning (ICL) models such as TabPFN and TabICL have recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. However, because these models are trained exclusively on synthetic data, they do not fully leverage the rich semantics and world knowledge contained in real-world data. Tabular ICL models based on pretrained large language models, such as TabuLa-8B, integrate semantics and world knowledge but can only exploit a small amount of context due to inherent architectural limitations. Aiming to bridge this gap, we introduce ConTextTab, which integrates semantic understanding and alignment into a table-native ICL framework. Using specialized embeddings for different data modalities and training on large-scale real-world tabular data, our model is competitive with the SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark.
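To make the abstract's key ideas concrete, below is a minimal, hypothetical sketch of a table-native in-context learner with modality-specific cell embedders, in the spirit of what the abstract describes. All module names, dimensions, and the cell-pooling scheme are illustrative assumptions and do not reflect the authors' actual architecture; the toy `TextEmbedder` stands in for a semantic encoder such as a frozen language model.

```python
# Illustrative sketch only -- not the ConTextTab implementation.
import torch
import torch.nn as nn


class NumericEmbedder(nn.Module):
    """Embeds scalar numeric cells with a small linear projection."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(1, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (rows, cols)
        return self.proj(x.unsqueeze(-1))  # -> (rows, cols, dim)


class TextEmbedder(nn.Module):
    """Stand-in for a semantic text encoder (assumed: a frozen LM in the
    real system); here just a lookup table over a toy vocabulary."""
    def __init__(self, vocab: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:  # ids: (rows, cols)
        return self.emb(ids)  # -> (rows, cols, dim)


class ToyTabularICL(nn.Module):
    """Embeds each cell by modality, pools cells into row tokens, and
    attends over labeled context rows to predict the query rows in a
    single forward pass (no gradient updates at inference time)."""
    def __init__(self, dim: int = 64, vocab: int = 1000, n_classes: int = 2):
        super().__init__()
        self.num_emb = NumericEmbedder(dim)
        self.txt_emb = TextEmbedder(vocab, dim)
        self.label_emb = nn.Embedding(n_classes + 1, dim)  # +1 = "unknown"
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, num_cells, txt_cells, labels, n_ctx):
        # Modality-specific embedding, then concatenate along the column axis.
        cells = torch.cat(
            [self.num_emb(num_cells), self.txt_emb(txt_cells)], dim=1
        )                                   # (rows, total_cols, dim)
        rows = cells.mean(dim=1)            # pool cells into one row token
        # Context rows see their labels; query rows get the "unknown" token.
        masked = labels.clone()
        masked[n_ctx:] = self.head.out_features  # index of "unknown"
        tokens = rows + self.label_emb(masked)
        hidden = self.encoder(tokens.unsqueeze(0)).squeeze(0)
        return self.head(hidden[n_ctx:])    # logits for query rows only


# Usage: 8 labeled context rows, 2 query rows to predict.
model = ToyTabularICL()
num = torch.randn(10, 3)                  # 3 numeric columns
txt = torch.randint(0, 1000, (10, 2))     # 2 categorical/text columns
y = torch.randint(0, 2, (10,))
logits = model(num, txt, y, n_ctx=8)      # -> (2, n_classes)
```

The design point this sketch tries to surface is the one the abstract emphasizes: cells of different modalities get different embedders before entering a shared table-native transformer, rather than everything being serialized into one text stream as in LLM-based approaches like TabuLa-8B.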
Submission Number: 34