Submission Type: Short paper (4 pages)
Keywords: tabular foundation models, autoregressive generation, transformers
TL;DR: We propose an architecture extension to existing tabular foundation models that generates joint predictive samples ~20x faster, with minimal added training overhead and minimal drop in performance.
Abstract: Transformer-based tabular foundation models excel at single-pass marginal prediction, yet many applications require coherent joint distributions across predictions. Purely autoregressive architectures capture dependencies but forgo the flexible set-conditioning used in meta-learning; deploying set-based models autoregressively forces re-encoding the augmented context at each step. We introduce a causal autoregressive buffer: the context is encoded once and cached, while generated targets are placed in a causally masked buffer. Targets attend to the cache and the visible buffer prefix, enabling efficient batched autoregressive generation and one-pass joint log-likelihoods. A unified training scheme (masked attention with a buffer-size curriculum) covers both modes with minimal overhead. On a small tabular foundation model, the buffer matches joint estimates from existing approaches while delivering up to $20\times$ faster joint sampling.
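The attention pattern described in the abstract (targets attend to the cached context plus the causal buffer prefix) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the function name and the boolean-mask convention (True = attention allowed) are hypothetical.

```python
import torch

def buffer_attention_mask(n_context: int, n_buffer: int) -> torch.Tensor:
    """Boolean attention mask for buffered targets (illustrative sketch).

    Rows index buffer queries, columns index [context | buffer] keys.
    Each buffer position may attend to every cached context position
    and to the buffer prefix up to and including itself (causal).
    """
    # Full access to the cached context, which is encoded only once.
    ctx = torch.ones(n_buffer, n_context, dtype=torch.bool)
    # Lower-triangular (causal) attention within the buffer of generated targets.
    buf = torch.tril(torch.ones(n_buffer, n_buffer, dtype=torch.bool))
    return torch.cat([ctx, buf], dim=1)

# Example: 4 cached context rows, 3 buffered targets.
mask = buffer_attention_mask(4, 3)
# mask has shape (3, 7): row i allows columns 0..3 (context) and 4..4+i (buffer prefix),
# so all buffered targets can be scored in one batched pass without re-encoding the context.
```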
Relevance Comments: We extend existing architectural approaches for tabular foundation models to enable fast autoregressive generation and likelihood evaluation.
Published Venue And Year: NA
Submission Number: 21