Cracking SQL Barriers: An LLM-based Dialect Translation System

Published: 2025, Last Modified: 05 Nov 2025 | Proc. ACM Manag. Data 2025 | CC BY-SA 4.0
Abstract: Automatic dialect translation reduces the complexity of database migration, which is crucial for applications that interact with multiple database systems. However, rule-based translation tools (e.g., SQLGlot, jOOQ, SQLines) are labor-intensive to develop and often (1) fail to translate certain operations, (2) produce incorrect translations due to rule deficiencies, and (3) generate translations that are compatible with some database versions but not others. In this paper, we investigate the problem of automating dialect translation with large language models (LLMs). There are three main challenges. First, queries often involve lengthy content (e.g., excessive column values) and multiple syntax elements that require translation, which increases the risk of LLM hallucination. Second, database dialects have diverse syntax trees and specifications, which makes cross-dialect syntax matching difficult. Third, dialect translation often involves complex many-to-one relationships between source and target operations, making it impractical to translate each operation in isolation. To address these challenges, we propose CrackSQL, an automatic dialect translation system. First, we propose Functionality-based Query Processing, which segments the query by functionality syntax trees and simplifies it via (i) customized function normalization and (ii) translation-irrelevant query abstraction. Second, we design a Cross-Dialect Syntax Embedding Model that generates embeddings from syntax trees and version-specific specifications, enabling accurate query syntax matching. Third, we propose a Local-to-Global Dialect Translation strategy, which restricts LLM-based translation and validation to the operations that cause local failures and iteratively extends these operations until translation succeeds. Experiments show that CrackSQL significantly outperforms existing methods (e.g., by up to 77.42%). The code is available at https://github.com/weAIDB/CrackSQL.
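To make the Local-to-Global idea concrete, here is a minimal sketch, assuming that the failing operation and its chain of enclosing syntax-tree nodes are already known. `Node`, `translate`, and `validate` are hypothetical stand-ins (for a parsed scope, an LLM call, and a target-database check); they are not CrackSQL's API, and the real system's widening policy may differ. The loop starts at the smallest failing scope and widens to the enclosing operation only when validation fails. This widening is what lets many-to-one mappings succeed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    """Hypothetical syntax-tree node; `sql` is the source text of this scope."""
    sql: str
    parent: Optional["Node"] = None


def local_to_global(
    failing: Node,
    translate: Callable[[str], str],
    validate: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Translate the smallest failing scope first, widening to ancestors on failure.

    Returns (source_span, translated_span) for the first scope whose
    translation validates, or None if even the outermost scope fails.
    """
    node: Optional[Node] = failing
    while node is not None:
        candidate = translate(node.sql)  # hypothetical LLM call on this scope only
        if validate(candidate):          # hypothetical check, e.g. parse on target DB
            return node.sql, candidate
        node = node.parent               # widen to the enclosing operation
    return None


# Toy many-to-one case: Oracle's ADD_MONTHS(d, n) has no core PostgreSQL
# function, so the bare function name cannot be translated on its own.
call = Node("ADD_MONTHS(hire_date, 1)")
fn_name = Node("ADD_MONTHS", parent=call)

TOY_RULES = {"ADD_MONTHS(hire_date, 1)": "hire_date + INTERVAL '1 month'"}


def toy_translate(sql: str) -> str:
    # Mock "LLM": returns the input unchanged when no rule is known.
    return TOY_RULES.get(sql, sql)


def toy_validate(sql: str) -> bool:
    # Mock validator: the target dialect rejects any remaining ADD_MONTHS.
    return "ADD_MONTHS" not in sql


# The bare name fails validation, so the scope widens to the whole call.
print(local_to_global(fn_name, toy_translate, toy_validate))
```

The design choice this illustrates is that the prompt stays small when a local fix suffices, and the scope grows only as far as needed. This limits how much of a long query the LLM must rewrite at once.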