IMDb30: A Multi-relational Knowledge Graph Dataset of IMDb Movies

Wenying Feng, Daren Zha, Lei Wang, Xiaobo Guo

Published: 2022, Last Modified: 05 Nov 2025. Venue: KSEM (1) 2022. License: CC BY-SA 4.0

Abstract: Most knowledge graph embedding (KGE) models are trained and evaluated on common benchmark datasets such as WN18 and FB15k. However, these datasets cover the general domain and have been used as link prediction benchmarks for many years. In addition, some of them suffer from test leakage and thus cannot evaluate KGE models effectively. To provide a new link prediction benchmark for a domain-specific knowledge graph without test leakage, we propose a new dataset called IMDb30, which incorporates knowledge about IMDb (Internet Movie Database) movies. We construct IMDb30 from the public relational data released on the IMDb website. The complete IMDb30 contains more than 6 million triplets, and we also construct a subset of IMDb30 for experiments. The IMDb30 subset contains 115080 triplets formed by 31343 entities and 30 relations. We conduct link prediction experiments with 3 convolutional neural network KGE models on the subset, and the results show that IMDb30 can effectively train and evaluate KGE models. The complete dataset and the construction process are made publicly available.
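
The test leakage mentioned in the abstract can be illustrated with a minimal sketch. It assumes that leakage means the inverse-relation form commonly reported for WN18 and FB15k, where a test triplet (h, r, t) can be answered trivially because the training set already contains some triplet from t to h. The function name `inverse_leakage` and the example relation names are hypothetical and are not taken from IMDb30 or its construction process.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def inverse_leakage(train: Iterable[Triple], test: Iterable[Triple]) -> Set[Triple]:
    """Return test triplets (h, r, t) for which training holds a triplet from t to h."""
    # For each training triplet (a, r, b), store the swapped pair (b, a).
    reversed_pairs = {(tail, head) for head, _, tail in train}
    # A test triplet (h, r, t) is flagged when (h, t) equals some swapped pair,
    # i.e. when training contains a triplet with head t and tail h.
    return {(h, r, t) for h, r, t in test if (h, t) in reversed_pairs}


# Hypothetical toy data; entity and relation names are illustrative only.
train = [("movie_1", "directed_by", "person_1"), ("movie_2", "has_genre", "genre_1")]
test = [("person_1", "director_of", "movie_1"), ("movie_3", "has_genre", "genre_1")]
leaked = inverse_leakage(train, test)
```

In this toy example only the `director_of` test triplet is flagged, because the training set already links the same two entities in the opposite direction.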

External IDs: dblp:conf/ksem/FengZWG22