Sparse Feature Routing for Tabular Learning

ICLR 2026 Conference Submission 20814 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Sparsity, feature experts, tabular representation learning
TL;DR: We show that decomposing tabular learning by learning a sparse, instance-wise selection over independent, per-feature experts is a simple and powerful foundation for building accurate and transparent models.
Abstract: The landscape of high-performance tabular learning is defined by a difficult compromise between the opaque ensembles of gradient-boosted trees and deep models that rely on elaborate pre-training to adapt ill-suited, monolithic backbones. We argue this compromise stems from a fundamental architectural mismatch. We propose a more principled path forward with a decomposed architecture that performs instance-wise selection over independent feature experts. Our model, the Sparse Feature Routing Network (SFR Net), assigns a small expert to each feature and uses a sparse router to dynamically compose expert results into an instance-specific representation, while a low-rank module captures higher-order interactions. This design yields native instance-level attributions and remains computationally efficient. A comprehensive empirical study validates these advantages. Across diverse benchmarks, SFR Net consistently outperforms strong specialized baselines, including Transformer-based models. Furthermore, it remains highly competitive with powerful self-supervised learning methods, despite being trained end-to-end without the pre-training step. Our ablation studies rigorously quantify the contribution of each architectural module, proving that the performance gains stem from the principled decomposition and dynamic routing, not brute-force capacity. These results position Sparse Feature Routing as a transparent, efficient, and powerful foundation for deep tabular learning.
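The abstract describes three components: independent per-feature experts, an instance-wise sparse router that composes their outputs, and a low-rank module for higher-order interactions. The following is a minimal NumPy sketch of that decomposition for a single instance; all parameter names and shapes here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_embed, top_k, rank = 8, 4, 3, 2

# Hypothetical parameters (illustrative only): one tiny linear "expert"
# per feature, a linear router, and low-rank interaction factors.
expert_w = rng.normal(size=(n_features, d_embed))    # per-feature expert weights
expert_b = rng.normal(size=(n_features, d_embed))    # per-feature expert biases
router_w = rng.normal(size=(n_features, n_features)) # router logits from raw input
U = rng.normal(size=(n_features, rank))              # low-rank interaction factors
V = rng.normal(size=(rank, d_embed))

def sfr_forward(x):
    """Sketch of sparse feature routing for one instance x of shape (n_features,)."""
    # Each expert embeds its own feature independently of the others.
    expert_out = x[:, None] * expert_w + expert_b     # (n_features, d_embed)
    # Instance-wise sparse router: keep only the top-k experts,
    # softmax-normalize their gate weights; the rest get weight zero.
    logits = router_w @ x                             # (n_features,)
    top = np.argsort(logits)[-top_k:]
    gate = np.zeros(n_features)
    gate[top] = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()
    routed = gate @ expert_out                        # (d_embed,)
    # Low-rank module capturing higher-order feature interactions.
    interaction = (x @ U) @ V                         # (d_embed,)
    return routed + interaction, gate

x = rng.normal(size=n_features)
z, gate = sfr_forward(x)
# Exactly top_k gate entries are nonzero, giving native per-instance attributions.
```

The nonzero gate vector doubles as an instance-level attribution: it names exactly which features (experts) were used for this prediction, which is the transparency property the abstract claims.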
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20814