A Large-Scale Database for Graph Representation Learning

Published: 17 Aug 2021, Last Modified: 20 Oct 2024, NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Keywords: graph representation learning, graph classification, dataset, database, graphs
TL;DR: A large-scale graph representation learning database offering over 1.2 million graphs, averaging 15k nodes and 35k edges per graph
Abstract: With the rapid emergence of graph representation learning, the construction of new large-scale datasets is necessary to distinguish model capabilities and accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify three critical components for advancing the field of graph representation learning: (1) large graphs, (2) many graphs, and (3) class diversity. To date, no single graph database offers all of these properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of malicious software function call graphs. MalNet contains over 1.2 million graphs, averaging over 15k nodes and 35k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x more classes. We provide a detailed analysis of MalNet, discussing its properties and provenance, along with an evaluation of state-of-the-art machine learning and graph neural network techniques. The unprecedented scale and diversity of MalNet offer exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.
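As a rough illustration of how summary figures like those in the abstract (graph count, average nodes and edges per graph, number of types and families) can be aggregated, the sketch below walks a directory of function call graphs. It assumes, hypothetically, that each graph is stored as a whitespace-separated edge list at `<root>/<type>/<family>/<graph>.edgelist`; that layout, the `.edgelist` extension, and the names `read_edgelist`, `summarize`, and `GraphStats` are illustrative assumptions, not the official MalNet loader. Families are counted by distinct directory name, which is also an assumption about how the hierarchy is laid out.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GraphStats:
    num_graphs: int
    avg_nodes: float
    avg_edges: float
    num_types: int
    num_families: int


def read_edgelist(path: Path) -> tuple[int, int]:
    """Return (node count, edge count) for one edge-list file.

    Hypothetical format: one directed edge "src dst" per line; extra columns
    are ignored, blank lines and lines starting with '#' are skipped, and
    duplicate edges are counted once.
    """
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()
    with path.open() as fh:
        for line in fh:
            parts = line.split()
            if len(parts) < 2 or parts[0].startswith("#"):
                continue
            src, dst = parts[0], parts[1]
            nodes.update((src, dst))
            edges.add((src, dst))
    return len(nodes), len(edges)


def summarize(root: Path) -> GraphStats:
    """Aggregate statistics over root/<type>/<family>/<graph>.edgelist (assumed layout)."""
    node_total = edge_total = count = 0
    types: set[str] = set()
    families: set[str] = set()  # distinct family directory names
    for path in sorted(root.glob("*/*/*.edgelist")):
        family_dir = path.parent
        types.add(family_dir.parent.name)
        families.add(family_dir.name)
        n, e = read_edgelist(path)
        node_total += n
        edge_total += e
        count += 1
    if count == 0:
        return GraphStats(0, 0.0, 0.0, 0, 0)
    return GraphStats(count, node_total / count, edge_total / count,
                      len(types), len(families))
```

For example, `summarize(Path("graphs"))` would return one `GraphStats` record for a hypothetical `graphs/` directory. Deduplicating edges with a set keeps the counts stable if a file repeats an edge, at the cost of holding one graph's edges in memory at a time.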
URL: https://mal-net.org/
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/a-large-scale-database-for-graph/code)