TabGraphs: new benchmark and insights for learning on graphs with tabular features

21 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: graph benchmarks, tabular machine learning, graph machine learning, tabular features, graph neural networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce a new benchmark of graph datasets with heterogeneous tabular features, evaluate various tabular and graph machine learning methods and provide insights for researchers and practitioners.
Abstract: Tabular machine learning is important for both industry and science. Table rows are typically treated as independent data samples. However, additional information about the relations between these samples is often available, and leveraging it may improve predictive performance. Since such relational information can be naturally modeled with a graph, tabular machine learning can borrow methods from graph machine learning. However, graph models are typically evaluated on datasets with homogeneous features, such as word embeddings or bag-of-words representations, which have little in common with the heterogeneous mixture of numerical and categorical features characteristic of tabular data. This critical difference between the data used in tabular and graph machine learning studies makes it unclear how well graph methods transfer to tabular data. In this work, we aim to bridge this gap. First, we create a benchmark of diverse graphs with heterogeneous tabular node features and realistic prediction tasks. Then, we evaluate a wide range of methods on this benchmark, analyze their performance, and provide insights and tips for researchers and practitioners in both the tabular and graph machine learning fields.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3364