Keywords: KGE, KG, Knowledge Graph, Knowledge Graph Embeddings, Behavioral Testing, RotatE, DistMult, ComplEx, HAKE, HyperKG, LineaRE, symmetry, hierarchy
TL;DR: We introduce behavioral testing for Knowledge Graph Embedding models
Abstract: Knowledge graph embedding (KGE) models are often used to encode knowledge graphs in order to predict new links within the graph. Their accuracy is typically evaluated by computing an averaged accuracy metric on a held-out test set. This approach, however, does not reveal \emph{where} the models systematically fail or succeed. To address this challenge, we propose a new evaluation framework that builds on the idea of (black-box) behavioral testing, a software engineering practice that enables users to detect system failures before deployment.
With behavioral tests, we can target and evaluate the behavior of KGE models on capabilities deemed important in the context of a particular use case. To this end, we leverage existing knowledge graph schemas to design behavioral tests for the link prediction task. In an extensive set of experiments, we run and analyze these tests for several KGE models. Crucially, we find, for example, that a model ranked second to last on the original test set actually performs best when tested for a specific capability. Such insights allow users to better choose which KGE model is most suitable for a particular task. The framework is extensible to additional behavioral tests, and we hope to inspire fellow researchers to join us in growing it collaboratively.
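To make the idea of a capability-level test concrete, the following is a minimal illustrative sketch (not the framework or code from the paper): a behavioral test for a symmetry capability could compare a model's scores for a triple and its inverted counterpart. The `score` function, tolerance threshold, and toy triples below are hypothetical stand-ins for a trained KGE model's scoring interface.

```python
# Illustrative sketch of a behavioral test for a "symmetry" capability.
# `score(head, relation, tail)` is a hypothetical stand-in for a trained
# KGE model's scoring function (e.g. as exposed by a RotatE or DistMult
# implementation); it is not the paper's actual API.
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]

def symmetry_test(
    score: Callable[[str, str, str], float],
    symmetric_triples: Iterable[Triple],
    tolerance: float = 0.05,  # hypothetical threshold for "similar" scores
) -> float:
    """Return the fraction of symmetric triples (h, r, t) for which the
    model scores the inverted triple (t, r, h) within `tolerance`."""
    passed, total = 0, 0
    for h, r, t in symmetric_triples:
        total += 1
        if abs(score(h, r, t) - score(t, r, h)) <= tolerance:
            passed += 1
    return passed / total if total else 0.0

# Toy usage with a dictionary standing in for a trained model's scores.
toy_scores = {
    ("alice", "marriedTo", "bob"): 0.90,
    ("bob", "marriedTo", "alice"): 0.88,
}
pass_rate = symmetry_test(
    lambda h, r, t: toy_scores.get((h, r, t), 0.0),
    [("alice", "marriedTo", "bob")],
)
print(f"symmetry pass rate: {pass_rate:.2f}")
```

Analogous tests could be written for other capabilities named in the keywords (e.g. hierarchy), with the set of test triples derived from the knowledge graph schema rather than hand-picked as in this toy example.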
Subject Areas: Knowledge Representation, Semantic Web and Search
Archival Status: Archival