Keywords: Relational deep learning, graph learning, tabular learning
Abstract: Much of the world’s most valuable data is stored in relational databases, where data is organized into tables connected by primary-foreign key relationships. Building machine learning models on this data is challenging because existing algorithms cannot directly learn from multiple connected tables. Current methods require manual feature engineering, which involves joining and aggregating tables into a single table, a labor-intensive and error-prone process.
We introduce Relational Deep Learning (RDL), an end-to-end learning framework that eliminates the need for manual feature engineering by representing relational databases as temporal, heterogeneous graphs. In this representation, rows become nodes, and primary-foreign key links define edges. Graph Neural Networks (GNNs) are then used to learn representations from all available data.
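As an illustration of the row-to-node and key-to-edge mapping described above, here is a minimal sketch in plain Python. It is not the RDL implementation: the toy tables, the `foreign_keys` schema, the `timestamp` column, and the helper names `build_graph` and `incoming_before` are all hypothetical, and the time filter is only one plausible reading of what "temporal" means here.

```python
from collections import defaultdict

# Hypothetical toy database: each table maps primary key -> row.
tables = {
    "customers": {
        1: {"name": "Ada", "timestamp": 0},
        2: {"name": "Bo", "timestamp": 1},
    },
    "orders": {
        10: {"customer_id": 1, "amount": 30.0, "timestamp": 5},
        11: {"customer_id": 1, "amount": 12.5, "timestamp": 7},
        12: {"customer_id": 2, "amount": 8.0, "timestamp": 9},
    },
}

# Hypothetical schema: (child table, foreign-key column) -> parent table.
foreign_keys = {("orders", "customer_id"): "customers"}


def build_graph(tables, foreign_keys):
    """Turn rows into typed nodes and primary-foreign key links into typed edges."""
    # Node id is (table name, primary key); the table name acts as the node type.
    nodes = {
        (table, pk): row
        for table, rows in tables.items()
        for pk, row in rows.items()
    }
    # Edge type is (child table, column, parent table), which keeps the graph heterogeneous.
    edges = defaultdict(list)
    for (child, column), parent in foreign_keys.items():
        for pk, row in tables[child].items():
            edges[(child, column, parent)].append(((child, pk), (parent, row[column])))
    return nodes, dict(edges)


def incoming_before(nodes, edges, target, cutoff):
    """Rows linked to `target` whose timestamp is at most `cutoff`.

    Assumption: filtering by time keeps later rows from leaking into a prediction.
    """
    return sorted(
        src
        for pairs in edges.values()
        for src, dst in pairs
        if dst == target and nodes[src]["timestamp"] <= cutoff
    )


nodes, edges = build_graph(tables, foreign_keys)
visible = incoming_before(nodes, edges, ("customers", 1), cutoff=6)
```

In this sketch a GNN would then pass messages along the typed edges; the table name serves as the node type, so each table can keep its own feature columns.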
We benchmark RDL on RelBench, evaluating 30 predictive tasks across seven relational databases, and demonstrate superior performance compared to traditional methods. In a user study, RDL significantly outperforms an experienced data scientist’s manual feature engineering approach, reducing human effort by more than 90%. These results highlight the potential of deep learning for predictive tasks in relational databases.
Submission Number: 27