Sparse Feature Routing for Tabular Learning

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Sparsity, feature experts, tabular representation learning
TL;DR: We show that decomposing tabular learning into independent per-feature experts, selected sparsely and instance-wise by a learned router, is a simple and powerful foundation for accurate and transparent models.
Abstract: The landscape of high-performance tabular learning is often framed as a choice between the efficiency of gradient-boosted trees and the performance of deep architectures, which increasingly rely on heavy, monolithic backbones to model feature interactions. We argue that this monolithic design overlooks a critical inductive bias: the inherent sparsity and modularity of tabular data. To address this, we introduce the Sparse Feature Routing Network (SFR Net), an architecture that decomposes computation into independent feature experts controlled by an entropy-regularized router, coupled with a low-rank module to capture non-additive dependencies. We evaluate SFR Net across 14 heterogeneous benchmarks, including standard datasets, high-dimensional multiclass tasks, and regression problems. Empirically, SFR Net demonstrates predictive performance competitive with, and often superior to, state-of-the-art deep tabular models and gradient-boosted ensembles. Beyond raw performance, SFR Net offers three distinct structural advantages: (1) efficiency, requiring up to $24\times$ fewer parameters and training $30\times$ faster than tabular Transformers; (2) intrinsic sparsity, dynamically activating only a small fraction of features per instance; and (3) faithful interpretability, where deletion tests confirm that the learned routing weights serve as reliable, causal instance-level attributions. These results position sparse feature routing as a lightweight, transparent, and high-performance alternative to dense tabular foundation models.
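For intuition, the sketch below shows one way the abstract's description could translate into code, assuming a PyTorch-style implementation: independent per-feature experts, a softmax router whose entropy is penalized to encourage sparse instance-wise selection, and a low-rank term for non-additive dependencies. The class name, layer sizes, and the entropy-penalty weight are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFeatureRoutingNet(nn.Module):
    """Illustrative sketch: per-feature experts + instance-wise router + low-rank interaction term."""

    def __init__(self, num_features: int, hidden_dim: int = 16, rank: int = 4):
        super().__init__()
        # One small, independent expert per input feature.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))
            for _ in range(num_features)
        ])
        # Instance-wise router: maps the raw feature vector to routing logits.
        self.router = nn.Linear(num_features, num_features)
        # Low-rank module intended to capture non-additive dependencies.
        self.low_rank = nn.Linear(num_features, rank, bias=False)
        self.interaction_head = nn.Linear(rank, 1)

    def forward(self, x: torch.Tensor):
        # x: (batch, num_features)
        weights = F.softmax(self.router(x), dim=-1)  # instance-wise routing weights
        expert_out = torch.cat(
            [expert(x[:, i:i + 1]) for i, expert in enumerate(self.experts)], dim=-1
        )  # (batch, num_features), one scalar output per feature expert
        additive = (weights * expert_out).sum(dim=-1, keepdim=True)
        interaction = self.interaction_head(torch.tanh(self.low_rank(x)))
        # Entropy of the routing distribution; penalizing it pushes toward sparse routing.
        entropy = -(weights * weights.clamp_min(1e-9).log()).sum(dim=-1).mean()
        return additive + interaction, entropy

A training loop under these assumptions would add the (hypothetical) entropy penalty to the task loss, e.g. loss = F.mse_loss(pred.squeeze(-1), target) + 0.01 * route_entropy, where pred, route_entropy = model(x); the routing weights themselves can then be read off as instance-level feature attributions.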
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20814