TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Yuchen Zeng; Wonjun Kang; Andreas C Mueller

Published: 10 Oct 2024, Last Modified: 30 Oct 2024. Venue: TRL @ NeurIPS 2024 Poster. License: CC BY 4.0

Keywords: Tabular Classification, Transformer, State-Space Models, Linear Attention, Scalability

TL;DR: TabFlex: A linear attention-based model for scalable tabular classification, outperforming existing methods in speed and maintaining good performance on both small and large datasets.

Abstract: Recent advances in the field of in-context learning (ICL) have demonstrated impressive performance for tabular classification, exemplified by TabPFN's success on small datasets. However, the quadratic complexity of the attention mechanism limits its applicability to larger datasets. To address this issue, we conduct a comprehensive comparison of popular scalable attention alternatives, including state-space models (SSMs) and linear attention mechanisms, revealing that the inherent causality of SSMs hinders ICL performance for large datasets, while linear attention preserves effectiveness. Leveraging these insights, we introduce TabFlex, a model based on linear attention that supports thousands of features and hundreds of classes, capable of handling datasets with millions of samples. Extensive experiments demonstrate that TabFlex is significantly faster than most existing methods while achieving top-two performance on small datasets among 25 baselines, with a 2$\times$ speedup over TabPFN and a 1.5$\times$ speedup over XGBoost. On large datasets, TabFlex remains efficient (e.g., approximately 5 seconds on the `poker-hand` dataset, which consists of millions of samples), while achieving relatively solid performance.
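The complexity argument in the abstract can be illustrated with a minimal NumPy sketch of non-causal linear attention. This is not the TabFlex implementation: the `elu(x) + 1` feature map, the function names, and the tensor sizes are assumptions chosen for illustration. The sketch shows that reordering the matrix products avoids forming the n-by-n attention matrix, and that the result does not depend on the order of the context samples, unlike a causal recurrence.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map, elu(x) + 1; the source does not specify one.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    # Cost is O(n * d * d_v): the (n, n) attention matrix is never materialized.
    q_f, k_f = feature_map(q), feature_map(k)
    kv = k_f.T @ v                    # (d, d_v) summary of all keys and values
    k_sum = k_f.sum(axis=0)           # (d,) normalizer summary
    return (q_f @ kv) / (q_f @ k_sum)[:, None]

def quadratic_reference(q, k, v):
    # Same quantity computed the O(n^2) way, for comparison only.
    q_f, k_f = feature_map(q), feature_map(k)
    scores = q_f @ k_f.T              # (n, n) pairwise similarities
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 512, 16                        # hypothetical sizes
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out = linear_attention(q, k, v)
ref = quadratic_reference(q, k, v)

# Shuffling the context rows (keys and values together) leaves the output unchanged.
perm = rng.permutation(n)
out_shuffled = linear_attention(q, k[perm], v[perm])
```

Because the key and value summaries are plain sums over samples, every query sees the full context regardless of row order, which is the property a causal model lacks.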

Submission Number: 61
