Keywords: Incremental Schema Discovery, RDF Data, Big Data, Clustering
Abstract: The lack of a descriptive schema for RDF datasets has motivated several research works on automatic schema discovery. These approaches aim to derive a structural schema for a given RDF dataset from its existing instances. However, as new instances are added, the generated schema may become inconsistent with the dataset. It is therefore necessary to incrementally update the schema as the dataset changes over time. In this paper, we propose an incremental schema discovery approach for massive RDF data. It is based on a scalable, incremental density-based clustering algorithm that propagates the changes occurring in the dataset into the clusters corresponding to the classes of the schema. Our approach is implemented using big data technologies to scale up to massive data while providing high-quality clustering results. We present experiments that demonstrate the efficiency of our proposal on both synthetic and real datasets.
First Author Is Student: Yes
Subtrack: Semantic Data Management, Querying and Distributed Data