End-to-End Compression for Tabular Foundation Models

Published: 30 Apr 2026, Last Modified: 24 Jun 2026ICML 2026 spotlightEveryoneRevisionsBibTeXCC BY 4.0
Abstract: The long-standing dominance of gradient-boosted decision trees for tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in one forward pass without parameter updates by leveraging the training data as context for predicting on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture based on the attention mechanism has quadratic complexity regarding dataset size, which in turn increases the overhead on training and inference time, and limits the capacity of the models to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97% less memory compared to the state-of-the-art tabular Transformer architecture, all while retaining performance without significant degradation. Lastly, our method not only scales better with increased dataset sizes, but it also achieves better performance compared to other baselines.
Lay Summary: Tabular foundation models are a recent class of AI that make predictions on data organized in rows and columns. They work through in-context learning: rather than being retrained for each new task, the model conditions on a set of training examples and produces predictions in a single pass. However, these models are built on the transformer architecture, whose computational and memory costs grow quadratically with the number of training examples. This limits their use on large datasets and makes inference slow and memory intensive. We propose TACO, which addresses this bottleneck by compressing the training data into a compact latent representation before prediction. The compressor and predictor are trained jointly, end to end, so the compressed representation retains the information most useful for accurate prediction. Across benchmark datasets, TACO achieves up to a 94-fold speedup in inference and up to 98% lower memory use, with no statistically significant loss in accuracy. This enables tabular foundation models to scale to datasets of millions of rows, previously beyond their reach.
Link To Code: https://github.com/machinelearningnuremberg/TACO
Primary Area: Deep Learning->Foundation Models
Keywords: tabular data, tabular compression
Originally Submitted PDF: pdf
Submission Number: 9955
Loading