Keywords: Python, scikit-learn, Knowledge Graph, Background Knowledge, Data Mining
TL;DR: The Python kgextension package makes background knowledge from public knowledge graphs usable in scikit-learn pipelines.
Abstract: Python is currently the most widely used platform for data science and machine learning. At the same time, public knowledge graphs have been identified as a valuable source of background knowledge in many data science tasks. In this paper, we introduce the kgextension package for Python, which allows knowledge graphs to be used in data science pipelines built in Python. The demo shows how data from public knowledge graphs such as DBpedia and Wikidata can be used in data mining pipelines based on the popular Python package scikit-learn. We demonstrate the package's utility by showing that the prediction accuracy on a popular Kaggle task can be significantly increased by using background knowledge from DBpedia.