Keywords: In-context learning, TabPFN, Mamba, Tabular data
TL;DR: We compare the performance of different auto-regressive architectures in the framework of TabPFN/prior-data fitted networks, finding that Mamba does not perform as well as Transformer models.
Abstract: We explore different auto-regressive model architectures, trained in a manner similar to TabPFN, for in-context learning on tabular datasets.
Namely, we compare transformer-based models with a structured state-space model architecture (Mamba) and a hybrid architecture (Jamba) that mixes transformer and Mamba layers.
We find that auto-regressive transformer models perform similarly to the original TabPFN transformer architecture, albeit at the cost of a doubled context length.
Mamba performs worse than similarly sized transformer models, while hybrid models show promise in harnessing some advantages of state-space models, such as support for long input contexts and fast inference.
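The short sketch below is our own illustration (not the submission's code) of one plausible reason the auto-regressive encoding roughly doubles the context length: if each training example is serialized as a feature position followed by a label position, the sequence is about twice as long as in the original TabPFN encoding, where features and label share a single position. All names in the sketch are hypothetical.

```python
# Minimal sketch, assuming features and labels occupy separate positions
# in the auto-regressive context; names are illustrative only.
import numpy as np

def make_autoregressive_context(X_train, y_train, X_test):
    """Interleave features and labels so an auto-regressive model can
    predict each label from everything that precedes it.

    Each training row contributes two positions (features, then label),
    which is why the context is roughly twice as long as an encoding
    that packs features and label into one position per row.
    """
    tokens = []
    for x, y in zip(X_train, y_train):
        tokens.append(("features", x))
        tokens.append(("label", y))
    for x in X_test:
        tokens.append(("features", x))
        tokens.append(("label", None))  # position where the model predicts
    return tokens

# Usage: 3 labeled rows and 1 query row -> 8 positions instead of 4.
X_train = np.random.rand(3, 5)
y_train = np.array([0, 1, 0])
X_test = np.random.rand(1, 5)
context = make_autoregressive_context(X_train, y_train, X_test)
print(len(context))  # 8
```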
Submission Number: 44