Abstract: Shifting from Big Data to Big Knowledge requires systems that can cope with both the volume and the velocity of data in a scalable, inference-enabled manner. In this work, we focus on stream processing and reasoning over the graph-based RDF data model. We explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams. To do so, we consider queries expressed in a positive fragment of a temporal logic framework based on Answer Set Programming, and we propose solutions for processing such queries based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution, named BigSR, and conduct a series of experiments with 15 queries from 4 different datasets. Our experiments show that BigSR achieves a throughput of more than one million triples per second on a rather small cluster of machines.