Abstract: This chapter presents novel solutions for storing and querying large knowledge graphs of mobility data represented in RDF. Such knowledge graphs are generated and updated daily from incoming positional information of moving entities, possibly linked with contextual information and weather data. Coping with the massive size of these knowledge graphs requires addressing several challenges related to distributed storage and parallel query processing. To this end, the chapter presents the design and implementation of a parallel processing engine for spatiotemporal RDF data built on top of Apache Spark. The engine comprises two layers. The storage layer stores deliberately encoded spatiotemporal RDF triples together with a dictionary of mappings between integer identifiers and RDF resources, and uses property tables and a columnar storage layout for improved performance. The processing layer consists of a query parsing component, a logical query builder, and a physical query constructor, which together produce execution plans that efficiently handle spatiotemporal constraints along with SPARQL processing. The performance of our engine is demonstrated by means of experiments over large knowledge graphs of real-life mobility data.