Abstract: Finding the optimal join order for a multijoin query is an old, yet very important, problem for relational database systems. It has been studied for decades and proven to be NP-hard. The mainstream techniques, first proposed in System R, are based on dynamic programming and are widely adopted by commercial database systems. However, it is well known that such approaches take exponential time to find the optimal join order for most queries, except simple ones such as linear queries. A query optimizer must therefore settle for a suboptimal join order when the number of tables is large. This paper proposes SAM, which departs from current practice in two ways: (1) SAM orders the joining attributes before ordering the tables; (2) SAM sorts the tables by comparing selectivities for “table blocks”. This approach reduces the exponential time complexity of the optimization; in particular, it finds, in polynomial time, the optimal ordering for clique queries, which take exponential time to optimize by dynamic programming. Experiments on real data comparing SAM to the query optimizers in MySQL and PostgreSQL show that its performance is similar for small queries but much better for large queries.
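To make the exponential cost of the dynamic-programming baseline concrete, the following is a minimal sketch of a System R-style search over left-deep join orders. It is not SAM and not the optimizer of any particular system. The function name `best_left_deep_order`, the cost model (sum of intermediate result sizes), and the assumption of independent pairwise selectivities are all illustrative choices, not taken from the paper.

```python
from itertools import combinations


def best_left_deep_order(card, sel):
    """Hypothetical helper: cheapest left-deep join order by subset DP.

    card: dict mapping table name -> row count.
    sel:  dict mapping frozenset({a, b}) -> join selectivity; a missing
          pair is treated as a cross product (selectivity 1.0).
    Assumed cost model: sum of the sizes of all intermediate results.
    Returns (cost, final_size, order).
    """
    tables = sorted(card)
    # best[S] = (cost, result_size, order) for each subset S of tables.
    best = {frozenset([t]): (0.0, float(card[t]), (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        # Every subset of size k is visited, so the table of partial
        # plans grows to 2^n - 1 entries: the exponential part.
        for subset in combinations(tables, k):
            s = frozenset(subset)
            for t in subset:  # t is the last table joined in
                rest = s - {t}
                cost, size, order = best[rest]
                new_size = size * card[t]
                for u in rest:
                    new_size *= sel.get(frozenset((u, t)), 1.0)
                new_cost = cost + new_size
                if s not in best or new_cost < best[s][0]:
                    best[s] = (new_cost, new_size, order + (t,))
    return best[frozenset(tables)]


if __name__ == "__main__":
    card = {"A": 1000, "B": 10, "C": 100}
    sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.01}
    print(best_left_deep_order(card, sel))
```

For n tables the loop fills one entry per non-empty subset, which is why this style of search becomes impractical as n grows, even though each individual step is cheap.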