Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research.

TMLR Paper 2366 Authors

11 Mar 2024 (modified: 04 Jul 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Difficulties in replicating and reproducing empirical evidence in machine learning research have become a prominent topic in recent years. Ensuring that machine learning results are sound and reliable requires reproducibility: the verification of research findings using the same code and data. Reproducibility promotes open and accessible research, robust experimental workflows, and the rapid integration of new findings. Evaluating the degree to which research publications support these different aspects of reproducibility is one goal of the present work. To this end, we introduce an ontology of reproducibility in machine learning and apply it to methods for graph neural networks. A further objective of this study is to identify hidden effects that influence model performance. We therefore employ the aforementioned ontology to control for a broad selection of such sources and turn our attention to another critical challenge in machine learning: the curse of dimensionality, which complicates data collection, representation, and analysis, makes it harder to find representative data, and impedes training and inference. The closely linked concept of geometric intrinsic dimension is employed to investigate the extent to which the machine learning models under consideration are influenced by the intrinsic dimension of the data sets on which they are trained.
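The abstract's notion of geometric intrinsic dimension can be made concrete with an estimator. The paper's own estimation procedure is not shown on this page; the following is a minimal sketch assuming the widely used TwoNN estimator of Facco et al. (2017), in which the ratio of each point's second- to first-nearest-neighbor distance follows a Pareto law whose exponent equals the intrinsic dimension. The function name `twonn_intrinsic_dim` and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dim(X: np.ndarray) -> float:
    """Sketch of a TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    The ratio mu = r2 / r1 of second- to first-nearest-neighbor distances
    is Pareto-distributed with exponent d, the intrinsic dimension; the
    maximum-likelihood estimate is d = N / sum(log(mu)).
    """
    # Query 3 neighbors: each point itself plus its two nearest neighbors.
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]
    # Drop duplicate points (r1 == 0), which would make the ratio undefined.
    mask = r1 > 0
    mu = r2[mask] / r1[mask]
    return mask.sum() / np.log(mu).sum()

# Example: data on a 2-D linear subspace embedded in 10 ambient dimensions.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
X = np.hstack([Z, np.zeros((2000, 8))]) @ rng.normal(size=(10, 10))
print(twonn_intrinsic_dim(X))  # approximately 2, despite the 10-D embedding
```

The example illustrates the point the abstract relies on: the intrinsic dimension of a data set can be far smaller than the dimension of the space in which it is represented, and it is this quantity, rather than the ambient dimension, that the study relates to model performance.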
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Responded to the reviewers' comments and revised the text.
Assigned Action Editor: ~Jean_Barbier2
Submission Number: 2366