Differential Optimization Testing of Gremlin-Based Graph Database Systems

Yingying Zheng, Wensheng Dou, Lei Tang, Ziyu Cui, Jiansen Song, Ziyue Cheng, Wei Wang, Jun Wei, Hua Zhong, Tao Huang

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, ICST 2024, CC BY-SA 4.0

Abstract: Graph database systems (GDBs) allow users to efficiently create, modify, and retrieve graph data in a graph database. To accelerate graph queries, GDBs usually adopt a variety of complex optimization strategies. However, incorrect optimizations in GDBs can introduce optimization bugs, which cause a graph query to compute an incorrect result, e.g., omitting a vertex in a graph database. In this paper, we propose Differential Optimization Testing (DOT), an effective and automated approach to detect optimization bugs in GDBs that adopt Gremlin as their query language. The main idea of DOT is that, given a Gremlin query $Q$, we execute it on the target GDB under two different optimization configurations and then verify whether both executions compute the same result for $Q$. Any inconsistency between the two results indicates an optimization bug in the target GDB. To improve the efficiency of differential testing in DOT, we further propose an optimization-guided approach that aims to explore more optimization strategies and more graph database features. We evaluate DOT on six popular and widely used GDBs, i.e., Neo4j, OrientDB, JanusGraph, HugeGraph, TinkerGraph, and ArcadeDB. In total, we have found 28 unique optimization bugs, 16 of which have been confirmed as previously unknown bugs.
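The core check described in the abstract, running one query under two optimization configurations and comparing the results, can be illustrated with a toy model. The sketch below is not the authors' implementation and does not use any real GDB or Gremlin API. The in-memory `AGES` table, the two plan functions, and the deliberately injected boundary bug are all hypothetical, and comparing results as unordered multisets is an assumption that would not suit queries with an explicit ordering.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List

Result = List[Hashable]

# Hypothetical toy data set: vertex id -> age property.
AGES: Dict[str, int] = {"v1": 29, "v2": 27, "v3": 32}


def scan_plan(min_age: int) -> Result:
    """Stand-in for the unoptimized configuration: full scan, then filter."""
    return [v for v, age in AGES.items() if age >= min_age]


def buggy_index_plan(min_age: int) -> Result:
    """Stand-in for the optimized configuration.

    Contains a deliberately injected bug: the strict comparison drops
    vertices whose age equals the threshold.
    """
    by_age = sorted(AGES.items(), key=lambda item: item[1])
    return [v for v, age in by_age if age > min_age]


def differential_check(run_a: Callable[[], Result],
                       run_b: Callable[[], Result]) -> bool:
    """Return True when both runs yield the same multiset of results."""
    return Counter(run_a()) == Counter(run_b())


def find_inconsistencies(thresholds: List[int]) -> List[int]:
    """Return the thresholds for which the two configurations disagree."""
    return [
        t for t in thresholds
        if not differential_check(lambda: scan_plan(t),
                                  lambda: buggy_index_plan(t))
    ]


if __name__ == "__main__":
    # Each reported threshold is a query that exposes the injected bug.
    print(find_inconsistencies([20, 27, 29, 30, 32]))
```

In this toy setting the disagreement appears only when the threshold coincides with a stored age, which hints at why the choice of generated queries matters: a query that never exercises the faulty optimization path cannot reveal the bug.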