Keywords: RDF compression, Graph compression, Benchmarking
Abstract: Data compression for RDF knowledge graphs is used in an increasing number of settings. In parallel, several grammar-based graph compression algorithms have been developed to reduce the size of graphs. We port gRePair, a state-of-the-art grammar-based graph compression algorithm, to RDF and name the result RDFRePair. We compare this promising technique with the state-of-the-art RDF compression approaches HDT, HDT++ and OFR, as well as with an improved implementation of a $k^2$-tree-based RDF compression approach. We run an extensive evaluation on 40 datasets. Our results suggest that RDFRePair achieves significantly better compression ratios and runtimes than gRePair. However, it is outperformed by $k^2$ trees, which achieve the overall best compression ratio on real-world datasets. This better compression ratio comes at the cost of time, as $k^2$ trees are clearly outperformed by OFR with respect to compression and decompression time. A pairwise Wilcoxon signed-rank test suggests that while OFR is significantly more time-efficient than HDT and $k^2$ trees, there is no significant difference between the compression ratios achieved by $k^2$ trees and OFR. In addition, we point out future directions for research. All code and datasets are available at https://github.com/dice-group/GraphCompression and https://hobbitdata.informatik.uni-leipzig.de/rdfrepair/evaluation_datasets/, respectively.
First Author Is Student: No
Subtrack: Knowledge Graphs (understanding, creating, and exploiting)
Negative Results Paper: This is a negative results paper.