Relational Deep Learning: Graph Representation Learning on Relational Databases

Joshua Robinson; Rishabh Ranjan; Weihua Hu; Kexin Huang; Jiaqi Han; Alejandro Dobles; Matthias Fey; Jan Eric Lenssen; Yiwen Yuan; Zecheng Zhang; Xinwei He; Jure Leskovec

Published: 10 Oct 2024, Last Modified: 10 Oct 2024

Venue: TRL @ NeurIPS 2024 Poster

License: CC BY 4.0

Keywords: Relational deep learning, graph learning, tabular learning

Abstract: Much of the world’s most valuable data is stored in relational databases, where data is organized into tables connected by primary-foreign key relationships. Building machine learning models on this data is challenging because existing algorithms cannot directly learn from multiple connected tables. Current methods require manual feature engineering, which involves joining and aggregating tables into a single format, a labor-intensive and error-prone process. We introduce Relational Deep Learning (RDL), an end-to-end learning framework that eliminates the need for manual feature engineering by representing relational databases as temporal, heterogeneous graphs. In this representation, rows become nodes, and primary-foreign key links define edges. Graph Neural Networks (GNNs) are then used to learn representations from all available data. We benchmark RDL on RelBench, evaluating 30 predictive tasks across seven relational databases, and demonstrate superior performance compared to traditional methods. In a user study, RDL significantly outperforms an experienced data scientist’s manual feature engineering approach, reducing human effort by more than 90%. These results highlight the potential of deep learning for predictive tasks in relational databases.
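
The graph construction described in the abstract (rows become nodes, primary-foreign key links define edges) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy tables, the names `TABLES`, `PRIMARY_KEYS`, `FOREIGN_KEYS`, `build_graph`, and `visible_neighbors` are all hypothetical, and the time filter reflects only one assumed reading of "temporal" (a node is visible only if its timestamp is not later than the prediction time). The GNN itself is left out.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> list of rows (dicts).
TABLES = {
    "customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Grace"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 25.0, "timestamp": 3},
        {"order_id": 11, "customer_id": 1, "amount": 40.0, "timestamp": 7},
        {"order_id": 12, "customer_id": 2, "amount": 15.0, "timestamp": 5},
    ],
}
PRIMARY_KEYS = {"customers": "customer_id", "orders": "order_id"}
# (child table, foreign-key column) -> parent table
FOREIGN_KEYS = {("orders", "customer_id"): "customers"}


def build_graph(tables, primary_keys, foreign_keys):
    """Build a heterogeneous graph from tables.

    nodes: {(table, pk_value): row}, one node per row, typed by its table.
    edges: {(child_table, fk_column, parent_table): [(child_node, parent_node)]},
           one edge per primary-foreign key link, typed by the relationship.
    """
    nodes = {}
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            nodes[(table, row[pk])] = row

    edges = defaultdict(list)
    for (child, fk_col), parent in foreign_keys.items():
        child_pk = primary_keys[child]
        for row in tables[child]:
            src = (child, row[child_pk])
            dst = (parent, row[fk_col])
            if dst in nodes:  # skip dangling foreign keys
                edges[(child, fk_col, parent)].append((src, dst))
    return nodes, dict(edges)


def visible_neighbors(node, edges, nodes, as_of):
    """Neighbors of `node` whose timestamp, if present, is <= as_of.

    Assumption: rows without a timestamp are treated as always visible.
    """
    out = []
    for pairs in edges.values():
        for src, dst in pairs:
            if dst == node:
                other = src
            elif src == node:
                other = dst
            else:
                continue
            ts = nodes[other].get("timestamp")
            if ts is None or ts <= as_of:
                out.append(other)
    return out


nodes, edges = build_graph(TABLES, PRIMARY_KEYS, FOREIGN_KEYS)
```

In this toy setup, `visible_neighbors(("customers", 1), edges, nodes, as_of=5)` returns only the order placed at time 3, since the order at time 7 lies in the future relative to the prediction time. A GNN would then aggregate features over such time-respecting neighborhoods in place of hand-written joins and aggregations.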

Submission Number: 27
