Towards Benchmarking Foundation Models for Tabular Data With Text

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 | FMSD @ ICML 2025 | CC BY 4.0
Keywords: Benchmark, Tabular Models, Text Embeddings, Feature Selection, Dimensionality Reduction
TL;DR: We introduce a new benchmark for text in tabular data, evaluate embedding techniques, highlight their limitations, and analyze performance.
Abstract: Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we manually curate a collection of real-world tabular datasets with meaningful textual features and use it to benchmark how well state-of-the-art tabular foundation models handle textual data. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.
Submission Number: 103