Revisiting Evaluation of Knowledge Base Completion Models

Feb 14, 2020 Blind Submission
  • Keywords: Knowledge Graph Completion, Link prediction, Calibration, Triple Classification
  • TL;DR: We study the shortcomings of link prediction evaluation and provide a new task based on triple classification
  • Subject Areas: Knowledge Representation, Semantic Web and Search, Information Extraction, Machine Learning
  • Abstract: Representing knowledge graphs (KGs) by learning embeddings for entities and relations has produced accurate models on existing KG completion benchmarks. Although KG completion has been studied extensively, the open-world assumption of existing KGs forces previous studies to rely on ranking metrics and triple classification with negative samples for evaluation, so they cannot directly assess models on the actual goal of the task: completion. In this paper, we first study the shortcomings of these evaluation metrics. Specifically, we demonstrate that they 1) are unreliable for estimating calibration, 2) make strong assumptions that are often violated, and 3) do not sufficiently or consistently differentiate embedding methods from simple approaches or from each other. To address these issues, we provide a semi-complete KG built from a randomly sampled subgraph of the validation and test data of YAGO3-10, allowing us to compute accurate triple classification accuracy on this data. Through thorough experiments on existing models, we provide new insights and directions for KG completion research.
  • Archival Status: Archival
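To make the two evaluation paradigms the abstract contrasts concrete, here is a minimal sketch of how ranking-based link-prediction metrics (MRR, Hits@10) and triple classification accuracy are typically computed from model scores. The ranks, scores, labels, and threshold below are made-up illustrative values, not data or results from the paper.

```python
# Hypothetical sketch contrasting two KG completion evaluation styles:
# ranking metrics (MRR, Hits@k) vs. triple classification accuracy.

def mrr_and_hits(ranks, k=10):
    """ranks: rank of the true entity among all candidates, per query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits_at_k

def triple_classification_accuracy(scores, labels, threshold):
    """Classify each triple as true iff its model score meets the threshold."""
    predictions = [s >= threshold for s in scores]
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Illustrative values (assumptions, not from the paper):
ranks = [1, 3, 12, 2]                # rank of the correct entity per query
scores = [0.9, 0.2, 0.7, 0.4]        # model scores for candidate triples
labels = [True, False, True, False]  # gold truth values of those triples

mrr, hits10 = mrr_and_hits(ranks)
accuracy = triple_classification_accuracy(scores, labels, threshold=0.5)
```

Note that the classification accuracy depends on gold truth labels for negatives, which the open-world assumption normally withholds; the semi-complete KG the paper constructs is what makes such labels available.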