IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation

Published: 15 Mar 2026, Last Modified: 15 Mar 2026
Accepted by DMLR
License: CC BY-SA 4.0
Abstract: Knowledge Graph Embedding (KGE) models are used to learn continuous representations of entities and relations, and are commonly trained to predict missing links between entities. However, Knowledge Graphs are not just sets of links; they also have complex semantics underlying their structure. Semantics plays a crucial role in several downstream tasks, such as query answering and reasoning. Recognizing this, our work goes beyond simple link prediction to focus on inferred knowledge that adheres to rich semantics. Specifically, 1) we introduce the \emph{subgraph inference} task, where a model is required to generate novel subgraphs that are logically consistent with background knowledge; 2) we propose \emph{IntelliGraphs}, a set of five new datasets that contain subgraphs with logical rules that express complex semantics for evaluating subgraph inference models; and 3) we design four baseline models, which include three models based on traditional KGEs, and show empirically that the KGE-based baselines cannot capture complex semantics. We believe that IntelliGraphs will encourage the development of machine learning models that focus on semantic understanding.
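To make the idea of a subgraph being "logically consistent with background knowledge" concrete, here is a minimal Python sketch. It is only an illustration: the entity types, the `born_in` relation, the two rules, and the `is_consistent` helper are all hypothetical and are not taken from the IntelliGraphs datasets or codebase.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical background knowledge: a type for each entity.
# These names are invented for illustration, not drawn from IntelliGraphs.
ENTITY_TYPES: Dict[str, str] = {
    "alice": "Person",
    "bob": "Person",
    "paris": "City",
    "rome": "City",
}


def is_consistent(subgraph: Set[Triple]) -> bool:
    """Check a subgraph against two toy rules for the `born_in` relation.

    Rule 1 (typing): the head must be a Person and the tail must be a City.
    Rule 2 (functionality): a Person has at most one birthplace.
    """
    birthplaces: Dict[str, str] = {}
    for head, relation, tail in subgraph:
        if relation != "born_in":
            continue  # the toy rules only constrain `born_in`
        if ENTITY_TYPES.get(head) != "Person" or ENTITY_TYPES.get(tail) != "City":
            return False
        # setdefault records the first birthplace seen; a different one later violates Rule 2.
        if birthplaces.setdefault(head, tail) != tail:
            return False
    return True


valid = {("alice", "born_in", "paris"), ("bob", "born_in", "rome")}
invalid = {("alice", "born_in", "paris"), ("alice", "born_in", "rome")}
```

Under these assumed rules, `is_consistent(valid)` holds while `is_consistent(invalid)` does not, even though every individual triple in `invalid` looks plausible on its own. A check that scores triples one at a time cannot detect this kind of violation, which is the gap the subgraph inference task is meant to expose.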
Certifications: Dataset Certification, Reproducibility Certification
Keywords: Knowledge Graph, Benchmark Datasets, Subgraph Inference, Semantic Evaluation
Changes Since Last Submission: We have made the following updates to the manuscript:
- Fixed the random baseline model: We fixed the mistakes in the random baseline model, resulting in better compression values for the random baseline.
- Updated Tables 2 and 3: The numbers reflecting the performance of the random baseline model have been corrected and updated accordingly.
- Revised the description of the random baseline: We made necessary adjustments to the main body (Section 4) and the appendix (Section 7.1) to accurately describe the performance and implementation of the improved random baseline.
Changes Since Previous Publication: N/A
Code: https://github.com/thiviyanT/IntelliGraphs
Assigned Action Editor: ~Mykola_Pechenizkiy1
Submission Number: 59