JoinGym: An Efficient Join Order Selection Environment

Junxiong Wang; Kaiwen Wang; Yueying Li; Nathan Kallus; Immanuel Trummer; Wen Sun

Published: 15 May 2024, Last Modified: 14 Nov 2024, Venue: RLC 2024, License: CC BY 4.0

Keywords: Reinforcement Learning Environment, Database Query Optimization, Join Order Selection

TL;DR: We present JoinGym, a lightweight and realistic join order selection environment for RL.

Abstract: Join order selection (JOS), the ordering of join operations to minimize query execution cost, is a core NP-hard combinatorial optimization problem in database query optimization. We present JoinGym, a lightweight and easy-to-use reinforcement learning (RL) environment that captures both left-deep and bushy variants of the JOS problem. Compared to prior works that execute queries online, JoinGym has much higher throughput and efficiently simulates the cost of joins offline by looking up each intermediate table's cardinality in a pre-computed dataset. We provide such a cardinality dataset for $3300$ queries based on real IMDb workloads, which is the largest suite of its kind and may be of independent interest. We extensively benchmark several RL algorithms and find that the best policies are competitive with or better than Postgres, a strong non-learning baseline. However, the learned policies can still fail catastrophically on a small fraction of queries, which motivates future research using JoinGym to improve generalization and safety in long-tailed, partially observed, combinatorial optimization problems.
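
To make the offline lookup idea concrete, here is a minimal sketch of a left-deep join ordering environment in Python. It is not JoinGym's actual API: the class `ToyLeftDeepJoinEnv`, the helpers `plan_cost` and `best_order`, the table names, and the cardinality values are all hypothetical. The cost model (sum of intermediate result cardinalities) is likewise an assumption made for illustration.

```python
from itertools import permutations

# Hypothetical pre-computed cardinalities: set of joined tables -> row count.
# In an offline setup, these numbers are looked up rather than measured by
# executing the query.
CARDINALITIES = {
    frozenset({"A", "B"}): 1000,
    frozenset({"A", "C"}): 10,
    frozenset({"B", "C"}): 500,
    frozenset({"A", "B", "C"}): 50,
}


class ToyLeftDeepJoinEnv:
    """Toy left-deep JOS environment; each action adds one table to the plan."""

    def __init__(self, tables, cardinalities):
        self.tables = frozenset(tables)
        self.cardinalities = cardinalities
        self.joined = frozenset()

    def reset(self):
        self.joined = frozenset()
        return self.joined

    def step(self, table):
        if table not in self.tables or table in self.joined:
            raise ValueError(f"invalid action: {table!r}")
        self.joined = self.joined | {table}
        # Assumed cost model: the first table is free, every later join
        # costs the looked-up cardinality of its intermediate result.
        cost = self.cardinalities[self.joined] if len(self.joined) > 1 else 0
        done = self.joined == self.tables
        return self.joined, -cost, done


def plan_cost(env, order):
    """Total cost of a complete left-deep join order."""
    env.reset()
    total = 0
    for table in order:
        _, reward, _ = env.step(table)
        total -= reward
    return total


def best_order(env):
    """Exhaustive search; feasible only for a handful of tables."""
    return min(permutations(sorted(env.tables)), key=lambda o: plan_cost(env, o))


env = ToyLeftDeepJoinEnv(["A", "B", "C"], CARDINALITIES)
print(plan_cost(env, ["A", "B", "C"]))  # joins A with B first
print(plan_cost(env, ["A", "C", "B"]))  # joins A with C first
print(best_order(env))
```

Because every step is a dictionary lookup, an episode costs microseconds rather than the time needed to run a query, which is the source of the throughput advantage the abstract describes. The exhaustive `best_order` search grows factorially with the number of tables, which is why learned or heuristic policies are of interest for larger queries.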

Submission Number: 14
