Broadening the Scope of Graph Regression: Introducing A Novel Dataset with Multiple Representation Settings

TMLR Paper3937 Authors

10 Jan 2025 (modified: 25 Mar 2025) · Rejected by TMLR · CC BY 4.0
Abstract: Graph regression is a vital task across various domains; however, the majority of publicly available graph regression datasets are concentrated in chemistry, drug discovery, and bioinformatics. This narrow dataset coverage restricts the development and application of predictive models in other important areas. Here, we introduce a novel graph regression dataset tailored to software performance prediction, specifically estimating the execution time of source code. Accurately predicting execution time is crucial for developers, as it provides early insight into a program's complexity and facilitates better decision-making during code optimization and refactoring. Source code can be represented syntactically as trees and semantically as graphs that capture the relationships between code components. In this work, we integrate these two perspectives into a unified graph representation of source code. We present two versions of the dataset: RelSC (Relational Source Code), which incorporates node features, and Multi-RelSC (Multi-Relational Source Code), which treats the graphs as multi-relational, allowing nodes to be connected by multiple edges, each representing a distinct semantic relationship. Finally, we apply various Graph Neural Network models to assess their performance on this relatively unexplored task. Our findings demonstrate the potential of these datasets to advance graph regression, particularly in the context of software performance prediction.
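To make the multi-relational idea concrete, the following is a minimal sketch of how a source file can be turned into a graph with typed edges, using only Python's standard-library `ast` module. The paper's exact RelSC/Multi-RelSC construction is not reproduced here; the relation names (`ast_child`, `next_sibling`) and the node feature (the AST node type) are illustrative assumptions, not the dataset's actual schema.

```python
import ast

def code_to_multirel_graph(source: str):
    """Parse Python source and return (nodes, edges), where each edge is a
    (src_id, relation, dst_id) triple -- one relation for syntactic structure
    and one for sibling ordering, as a stand-in for richer semantic relations."""
    tree = ast.parse(source)
    nodes, edges, ids = [], [], {}

    # First pass: assign an integer id to every AST node; the node "feature"
    # here is simply the AST node type name.
    for node in ast.walk(tree):
        ids[id(node)] = len(nodes)
        nodes.append(type(node).__name__)

    # Second pass: add one edge type per relationship.
    for node in ast.walk(tree):
        children = list(ast.iter_child_nodes(node))
        for child in children:
            edges.append((ids[id(node)], "ast_child", ids[id(child)]))   # syntax tree edge
        for a, b in zip(children, children[1:]):
            edges.append((ids[id(a)], "next_sibling", ids[id(b)]))       # ordering edge

    return nodes, edges

nodes, edges = code_to_multirel_graph("def f(x):\n    return x + 1\n")
```

A multi-relational GNN (e.g. an R-GCN-style model) would then learn a separate message-passing weight per relation name, which is what distinguishes Multi-RelSC from the single-relation RelSC variant.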
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Christopher_Morris1
Submission Number: 3937
