Keywords: relational databases, in-context learning, foundation models, prior-fitted networks
TL;DR: We motivate, develop, and analyze an ICL-based foundation model for predictive modeling on relational databases without any pre-training data collection requirements.
Abstract: Relational databases (RDBs) contain vast amounts of heterogeneous tabular information that can be exploited for predictive modeling. Because the space of potential prediction targets across enterprise settings is enormous, it is preferable to avoid learning a new model each time a new estimation task arises. Foundation models based on in-context learning (ICL) offer a convenient alternative, but so far they remain largely restricted to single-table settings, a presumed impediment being the difficulty of collecting or generating adequate RDB pre-training data. But is there any practical way around this bottleneck? We answer in the affirmative by demonstrating how already-existing single-table foundation models can be repurposed for RDBs when combined with a suitable class of relational encoder, such that no further pre-training or data collection is required. This is possible because theoretical and empirical evidence suggests that ICL-specific encoder compression of variably-sized RDB neighborhoods should be constrained to operate within high-dimensional RDB columns, where all entities share units and roles, rather than across columns, where the relevance of heterogeneous data types cannot be determined without label information. Conditioned on this restriction, encoder expressiveness is not compromised by excluding learnable parameters that would otherwise necessitate RDB data collection for pre-training. Practically, we develop scalable SQL primitives to implement the encoder stage within an open-source toolbox that achieves SOTA performance on new RDBs out of the box.
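The within-column compression idea described in the abstract can be sketched as follows. This is a hypothetical illustration only, not the paper's actual encoder or its SQL primitives: the function `compress_neighborhood` and its aggregation choices (mean/min/max/count) are assumptions standing in for whatever parameter-free, per-column statistics the real encoder computes. The key property it demonstrates is that a variably-sized set of neighbor rows is reduced to a fixed-size feature set by aggregating values only within each column, never mixing heterogeneous columns.

```python
from statistics import mean

def compress_neighborhood(rows):
    """Compress a variably-sized list of neighbor rows (dicts) into a
    fixed-size feature dict. Aggregation happens WITHIN each column,
    where all values share units and roles; no cross-column mixing and
    no learnable parameters are involved. Hypothetical sketch only."""
    if not rows:
        return {}
    out = {}
    for col in rows[0]:
        # Collect the column's non-missing values across the neighborhood.
        vals = [r[col] for r in rows if r.get(col) is not None]
        if not vals:
            continue
        out[f"{col}_mean"] = mean(vals)
        out[f"{col}_min"] = min(vals)
        out[f"{col}_max"] = max(vals)
        out[f"{col}_count"] = len(vals)
    return out

# Two neighbor rows compress to one fixed-size feature dict regardless
# of how many neighbors an entity has.
neighbors = [{"amount": 10.0, "qty": 1},
             {"amount": 30.0, "qty": 3}]
features = compress_neighborhood(neighbors)
# features["amount_mean"] == 20.0, features["qty_count"] == 2
```

Downstream, such fixed-size per-column features could be appended to the target entity's row and passed to an off-the-shelf single-table ICL model, which is the repurposing strategy the abstract describes.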
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 86