PreQR: Pre-training Representation for SQL Understanding

SIGMOD Conference 2022 (modified: 17 Nov 2022)
Abstract: Recently, learning-based models have been shown to outperform conventional methods on many database tasks such as cardinality estimation, join order selection, and performance tuning. However, most existing learning-based methods adopt one-hot encoding for SQL query representation, which fails to capture complicated semantic context, e.g., the structure of the query, the database schema definition, and the distribution variance of columns. To address this problem, we propose a novel pre-trained SQL representation model, called PreQR, which extends the language representation approach to SQL queries. We propose an automaton to encode query structures and apply a graph neural network to encode database schema information conditioned on the query. A new SQL encoder is then established by adopting the attention mechanism to support on-the-fly query-aware schema linking. Experimental results on real datasets show that replacing one-hot encoding with our query representation significantly improves the performance of existing learning-based models on several database tasks.
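To make the attention-based, query-aware schema linking step more concrete, the following is a minimal illustrative sketch, not the authors' PreQR implementation. It assumes SQL token embeddings and schema-node embeddings (e.g., the outputs of a graph neural network over tables and columns) are already available; the module name `QueryAwareSchemaLinking` and all hyperparameters are hypothetical.

```python
# Hypothetical sketch of query-aware schema linking via cross-attention.
# NOT the PreQR authors' code; shapes and names are illustrative only.
import torch
import torch.nn as nn


class QueryAwareSchemaLinking(nn.Module):
    """Attend from SQL token embeddings to schema-node embeddings."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens: torch.Tensor, schema_nodes: torch.Tensor):
        # query_tokens: (batch, n_tokens, dim) -- embeddings of SQL tokens
        # schema_nodes: (batch, n_nodes, dim)  -- embeddings of tables/columns
        linked, weights = self.attn(query_tokens, schema_nodes, schema_nodes)
        # Residual connection keeps the original token signal alongside
        # the schema-conditioned information.
        return self.norm(query_tokens + linked), weights


if __name__ == "__main__":
    # Toy usage with random embeddings standing in for real encoders.
    dim = 64
    module = QueryAwareSchemaLinking(dim)
    sql_emb = torch.randn(2, 12, dim)    # 12 SQL tokens per query
    schema_emb = torch.randn(2, 7, dim)  # 7 schema nodes (tables/columns)
    out, attn = module(sql_emb, schema_emb)
    print(out.shape, attn.shape)  # (2, 12, 64) and (2, 12, 7)
```

The attention weights give, for each SQL token, a distribution over schema nodes, which is one plausible way to realize the on-the-fly schema linking described in the abstract.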