Keywords: relational foundation models, pretraining, relational databases
Abstract: Few-shot adaptation has been crucial to the success of foundation models for language and vision, but foundation models for structured relational data still require thousands of labeled examples to perform well. We show that, when pretrained at scale with the right recipe, a Relational Transformer (RT) becomes a strong few-shot predictor across diverse databases. Our recipe has three ingredients: THE JOIN, the largest open relational pretraining corpus to date, with 6,255 forecasting tasks across 650 real-world databases; a pretraining procedure that mixes context sizes, masks many cells per window, and fills the context via random-walk retrieval; and test-time compute scaling via context ensembling and per-task context tuning. On RelBench, the resulting model matches the strongest in-context learning pipelines (RDBLearn + TabICLv2 and an LLM Agent + TabICLv2) using 23–32× fewer in-context labels, and exceeds the prior fully supervised state of the art (RelGNN) without any task-specific training. Ablations show that schema semantics, multi-cell masking, random-walk retrieval, and mixed-task pretraining each contribute materially to performance in this few-shot regime.
Submission Number: 147