A Query-Aware Enormous Database Generator For System Performance Evaluation

Published: 2025, Last Modified: 11 Nov 2025. Venue: SIGMOD Conference Companion 2025. License: CC BY-SA 4.0
Abstract: In production settings, simulating a real application without exposing private data is essential for database benchmarking and performance debugging. A rich body of query-aware database generators (QAGs) has been proposed for this purpose. However, the complex data dependencies hidden behind queries cause previous work to suffer from critical deficiencies in supporting complex operators with high simulation accuracy. To fill the gap between existing QAGs and these urgent demands, we implement Mirage, a data generator that reproduces applications from their queries, even those with complex operators, and has a theoretical error of zero. Specifically, Mirage leverages Query Rewriting and Set Transforming Rules to decouple the implicit dependencies from queries, which greatly simplifies the generation problem; it presents a uniform representation of various join types and formulates key population as a Constraint Programming (CP) problem, which can be solved well by an off-the-shelf CP solver. In this demonstration, users can explore the core features of Mirage in generating synthetic databases; compared to related work, Mirage offers the widest operator support and the best simulation fidelity.
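To make the idea of treating key population as a constraint problem more concrete, the following is a minimal sketch under stated assumptions, not Mirage's actual formulation. It fills a foreign-key column so that two hypothetical cardinality constraints hold: the size of an equi-join with a filtered parent table, and the size of the corresponding semi-join. The function name, its parameters, and the two constraints are illustrative assumptions, and a plain exhaustive search stands in for the off-the-shelf CP solver mentioned in the abstract.

```python
from itertools import product


def populate_foreign_keys(parent_keys, filtered_keys, n_child,
                          join_card, matched_parents):
    """Return a foreign-key assignment for n_child rows, or None if infeasible.

    Hypothetical constraints, chosen only for illustration:
      - exactly `join_card` child rows reference a key in `filtered_keys`
        (the output size of an equi-join with the filtered parent rows);
      - exactly `matched_parents` distinct filtered keys are referenced
        (the output size of the corresponding semi-join).
    """
    filtered = set(filtered_keys)
    # Exhaustive search over all assignments; a real CP solver would prune
    # this space instead of enumerating it. Only viable for tiny inputs.
    for assignment in product(parent_keys, repeat=n_child):
        hits = [fk for fk in assignment if fk in filtered]
        if len(hits) == join_card and len(set(hits)) == matched_parents:
            return list(assignment)
    return None


# Four parent keys, two of which pass the query's filter; five child rows.
fks = populate_foreign_keys(
    parent_keys=[1, 2, 3, 4],
    filtered_keys=[1, 2],
    n_child=5,
    join_card=3,
    matched_parents=2,
)
print(fks)
```

Any assignment the function returns satisfies both cardinalities exactly, which is the sense in which a constraint-based formulation can reach zero error on the constraints it encodes. An infeasible combination, such as one joined row that must cover two distinct parents, yields `None`.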