Large-Scale Pretraining unlocks Few-Shot Prediction for Relational Data

Published: 25 May 2026, Last Modified: 29 May 2026
Venue: FMSD @ ICML 2026 Poster
License: CC BY 4.0
Keywords: relational foundation models, pretraining, relational databases
Abstract: Few-shot adaptation has been crucial to the success of foundation models for language and vision, but foundation models for structured relational data still require thousands of labeled examples to perform well. We show that, when pretrained at scale with the right recipe, a Relational Transformer (RT) becomes a strong few-shot predictor across diverse databases. Our recipe has three ingredients: (1) THE JOIN, the largest open relational pretraining corpus to date, with 6,255 forecasting tasks across 650 real-world databases; (2) a pretraining procedure that mixes context sizes, masks many cells per window, and fills the context via random-walk retrieval; and (3) test-time compute scaling via context ensembling and per-task context tuning. On RelBench, the resulting model matches the strongest in-context learning pipelines (RDBLearn + TabICLv2 and an LLM Agent + TabICLv2) using 23–32× fewer in-context labels, and exceeds the prior fully supervised state of the art (RelGNN) without any task-specific training. Ablations show that schema semantics, multi-cell masking, random-walk retrieval, and mixed-task pretraining each contribute materially to few-shot performance.
Submission Number: 147
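
Illustrative sketch (not the authors' implementation): the abstract names random-walk retrieval and context ensembling but gives no algorithmic detail, so the Python below shows only one plausible reading of the two ideas. Everything in it is an assumption made for illustration: the toy foreign-key graph `FK_GRAPH`, the function names `random_walk_context` and `ensemble_predict`, the walk length, the context budget, and the stand-in `toy_predict`, which merely scores a context and does not represent the Relational Transformer.

```python
import random
from statistics import mean

# Hypothetical toy database: rows are nodes, foreign-key links are edges.
FK_GRAPH = {
    "user:1": ["order:1", "order:2"],
    "order:1": ["user:1", "item:1"],
    "order:2": ["user:1", "item:2"],
    "item:1": ["order:1"],
    "item:2": ["order:2"],
}


def random_walk_context(neighbors, seed, budget, walk_len=3, rng=None, max_walks=1000):
    """Gather up to `budget` distinct rows via short random walks restarted at `seed`."""
    rng = rng or random.Random(0)
    context, seen = [seed], {seed}
    for _ in range(max_walks):
        if len(context) >= budget:
            break
        node = seed  # every walk restarts at the row being predicted
        for _ in range(walk_len):
            nbrs = neighbors.get(node, [])
            if not nbrs:
                break
            node = rng.choice(nbrs)
            if node not in seen:
                seen.add(node)
                context.append(node)
                if len(context) >= budget:
                    break
    return context


def ensemble_predict(predict, neighbors, seed, budget, n_contexts=8, base_seed=0):
    """Average a model's predictions over several independently sampled contexts."""
    preds = []
    for i in range(n_contexts):
        rng = random.Random(base_seed + i)  # a different context per ensemble member
        ctx = random_walk_context(neighbors, seed, budget, rng=rng)
        preds.append(predict(ctx))
    return mean(preds)


def toy_predict(context):
    """Stand-in for a model: the fraction of context rows that are orders."""
    return sum(row.startswith("order:") for row in context) / len(context)


if __name__ == "__main__":
    print(random_walk_context(FK_GRAPH, "user:1", budget=4))
    print(ensemble_predict(toy_predict, FK_GRAPH, "user:1", budget=4))
```

In this reading, the number of sampled contexts (`n_contexts`) is the knob that trades extra inference compute for a lower-variance prediction; how the paper actually samples, weights, or tunes contexts per task is not stated in the abstract.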