Compact Approximation of Redundant Blocks in Tabular Foundation Models

07 May 2026 (modified: 09 May 2026) · ICML 2026 Workshop CoLoRAI Submission · CC BY 4.0
Keywords: Tabular Foundation Models, Model Compression, Transformer Redundancy, Training-Free Compression, Efficient Inference, Linear Approximation
TL;DR: We show that up to 94% of blocks in tabular transformers can be replaced with a closed-form linear translator while largely preserving downstream performance.
Abstract: In-context-learning tabular foundation models (TFMs) are powerful tools for zero-shot tabular tasks, requiring no gradient updates on target data. However, their architectures consist of 12--16 transformer blocks that demand GPU inference, severely limiting their deployment in compute-constrained, on-premise environments. While simpler alternatives such as Gradient-Boosted Decision Trees (GBDTs) run efficiently on CPUs, they require manual feature engineering and per-dataset hyperparameter tuning. In this paper, we show that TFMs are vastly over-parameterized and can be radically compressed. By substituting up to $\sim$94\% of the transformer blocks with a closed-form linear translator, we largely preserve downstream performance while requiring minimal compute. We demonstrate this extreme compressibility across eleven diverse datasets, including three TabZilla controls and medical datasets (e.g., MIMIC-III and eICU-CRD), spanning binary classification, multi-class classification, and regression tasks. Our findings reveal that the vast majority of TFM depth is linearly redundant, opening a pathway to lightweight foundation model inference.
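The abstract does not spell out how the closed-form linear translator is obtained. A minimal sketch, assuming it amounts to a ridge-regularized least-squares map fitted between the hidden states entering and leaving the span of blocks being replaced (function and variable names below are illustrative, not taken from the paper):

```python
import numpy as np

def fit_linear_translator(h_in: np.ndarray, h_out: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Closed-form (ridge) least-squares map W such that h_in @ W ~= h_out.

    h_in  : (n_tokens, d) hidden states entering the span of blocks to be replaced.
    h_out : (n_tokens, d) hidden states exiting that span.
    lam   : ridge regularizer for numerical stability.
    """
    d = h_in.shape[1]
    # Standard ridge solution: W = (H_in^T H_in + lam * I)^{-1} H_in^T H_out
    gram = h_in.T @ h_in + lam * np.eye(d)
    return np.linalg.solve(gram, h_in.T @ h_out)

def apply_translator(h_in: np.ndarray, W: np.ndarray) -> np.ndarray:
    """At inference, the replaced transformer blocks collapse to one matrix multiply."""
    return h_in @ W
```

Under this reading, the translator is "training-free" in the sense that it requires only a single pass to collect activations and a closed-form solve, with no gradient-based fine-tuning.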
Submission Number: 68