Structural SQL Similarity via Hybrid Graph Matching and AST-Guided Tree Edit Metrics


ACL ARR 2025 July Submission1363 Authors

29 Jul 2025 (modified: 26 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: Assessing semantic similarity between SQL queries is vital for Text-to-SQL evaluation, query clustering, deduplication, and code auditing. Existing metrics either require database access (e.g., Execution Accuracy) or rely on token-level similarity (e.g., CodeBERTScore), and therefore overlook deeper structural and logical equivalence. These limitations hinder their use in schema-less, privacy-sensitive, or real-time settings. In this paper, we propose two structure-aware, execution-free evaluation metrics, AST-TE and Hybrid-GMN, which combine symbolic and neural methods. AST-TE computes a Zhang–Shasha-style tree-edit distance over normalized SQL Abstract Syntax Trees (ASTs) to capture structural and semantic differences. Hybrid-GMN encodes ASTs and Relational Operator Trees (ROTs) into a heterogeneous graph, enabling fine-grained semantic alignment via cross-graph attention. Experiments on Spider, BIRD, and internal subquery datasets generated with a fine-tuned SQLCoder-7B model demonstrate significant improvements over existing symbolic and neural baselines. Specifically, on the Spider dataset, our methods surpass the state-of-the-art CrystalBLEU metric by approximately 23% in ROC AUC and by more than 95% in Spearman correlation. Our findings underscore the limitations of traditional execution-based and token-level metrics and establish AST-TE and Hybrid-GMN as robust, scalable, and schema-agnostic alternatives for evaluating SQL query equivalence.
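The abstract describes AST-TE only as a Zhang–Shasha-style tree-edit distance over normalized SQL ASTs. The following is a minimal sketch of that idea, not the authors' implementation. It assumes unit costs for insert, delete, and relabel; a toy normalization (lowercasing labels and masking literal values); and a similarity score of 1 - d / max(|T1|, |T2|). None of these choices is specified in the abstract. `Node`, `normalize`, `tree_edit_distance`, and `ast_similarity` are hypothetical names, and a real pipeline would build the trees with an SQL parser rather than by hand.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """A labelled, ordered tree node (e.g. one SQL AST node)."""
    label: str
    children: list[Node] = field(default_factory=list)

def normalize(node: Node) -> Node:
    """Hypothetical normalization: lowercase labels and mask literal values."""
    label = node.label.lower()
    if label.startswith("lit:"):
        label = "lit"  # e.g. "Lit:30" and "Lit:40" both become "lit"
    return Node(label, [normalize(c) for c in node.children])

def _postorder(root: Node):
    """Return post-order labels and each node's leftmost-leaf index."""
    labels: list[str] = []
    lmd: list[int] = []

    def visit(node: Node) -> int:
        leftmost = None
        for child in node.children:
            child_lmd = visit(child)
            if leftmost is None:
                leftmost = child_lmd  # leftmost leaf comes from the first child
        labels.append(node.label)
        idx = len(labels) - 1
        lmd.append(idx if leftmost is None else leftmost)
        return lmd[idx]

    visit(root)
    return labels, lmd

def _keyroots(lmd: list[int]) -> list[int]:
    """Keyroots: the highest post-order index for each distinct leftmost leaf."""
    highest: dict[int, int] = {}
    for i, leaf in enumerate(lmd):
        highest[leaf] = i
    return sorted(highest.values())

def tree_edit_distance(t1: Node, t2: Node) -> int:
    """Zhang–Shasha tree edit distance with unit insert/delete/relabel costs."""
    l1, lmd1 = _postorder(t1)
    l2, lmd2 = _postorder(t2)
    td = [[0] * len(l2) for _ in l1]  # subtree-to-subtree distances
    for i in _keyroots(lmd1):
        for j in _keyroots(lmd2):
            li, lj = lmd1[i], lmd2[j]
            m, n = i - li + 2, j - lj + 2
            # fd[x][y]: distance between post-order forests
            # nodes li..li+x-1 of t1 and nodes lj..lj+y-1 of t2.
            fd = [[0] * n for _ in range(m)]
            for x in range(1, m):
                fd[x][0] = fd[x - 1][0] + 1  # delete everything
            for y in range(1, n):
                fd[0][y] = fd[0][y - 1] + 1  # insert everything
            for x in range(1, m):
                for y in range(1, n):
                    i1, j1 = li + x - 1, lj + y - 1
                    if lmd1[i1] == li and lmd2[j1] == lj:
                        # Both forests are whole subtrees: relabel roots.
                        relabel = 0 if l1[i1] == l2[j1] else 1
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + relabel)
                        td[i1][j1] = fd[x][y]
                    else:
                        # Reuse a subtree distance computed for an earlier keyroot.
                        p, q = lmd1[i1] - li, lmd2[j1] - lj
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[i1][j1])
    return td[-1][-1]  # roots are last in post-order

def ast_similarity(t1: Node, t2: Node) -> float:
    """Hypothetical score: 1 - distance / size of the larger normalized tree."""
    a, b = normalize(t1), normalize(t2)
    size = max(len(_postorder(a)[0]), len(_postorder(b)[0]))
    return 1.0 - tree_edit_distance(a, b) / size
```

For instance, two hand-built ASTs for `SELECT name FROM users WHERE age > 30` and the same query with `>=` differ in one operator node. They are therefore at distance 1, and with nine nodes per tree the assumed score is 1 - 1/9. Because literals are masked, changing `30` to `40` would not change the score under this particular normalization.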

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: SQL evaluation, execution-free metrics, graph-based similarity, tree-edit distance, AST, ROT, CodeBLEU, CodeBERTScore, symbolic reasoning, neural alignment

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis, Theory

Languages Studied: English

Submission Number: 1363
