Keywords: Reinforcement Learning, LLMs, Text-to-SQL, Text-to-Cypher, Structural Transfer
TL;DR: Jointly training language models on Text-to-SQL and Text-to-Cypher with reinforcement learning yields structural transfer, improving accuracy on both tasks and on related structured-data reasoning.
Abstract: Large language models (LLMs) can parse natural language into SQL or Cypher, but these abilities remain fragmented: models lack a unified capacity to reason across both relational and graph-structured data. We present STRuCT-LLM, a reinforcement learning framework for cross-domain query understanding. Our approach integrates supervised chain-of-thought traces with topology-aware execution rewards, enabling models to acquire complementary reasoning skills: computational and inter-column analysis from SQL, and graph traversal from Cypher. On noisy real-world datasets (e.g., SEDE), STRuCT-LLM achieves consistent gains over supervised fine-tuning baselines, including ~17\% fewer logical errors and ~20\% fewer data-reference errors, while maintaining robustness under perturbations. Beyond benchmark improvements, we provide a structural analysis of SQL–Cypher equivalence and qualitative case studies showing how unified training resolves errors that single-domain models cannot. These results establish reinforcement learning as a driver of structure-aware generalization across heterogeneous data modalities, paving the way for natural language interfaces to more diverse and unified database systems.
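To make the execution-reward idea concrete, here is a minimal, hedged sketch in Python of the kind of reward signal typically used to train text-to-query models with RL: execute the predicted and gold queries and compare their result multisets. The paper's topology-aware reward is not specified in the abstract, so this is an illustrative simplification; `execution_reward` and `run_query` are hypothetical names introduced here, not the authors' implementation.

```python
# Simplified execution-match reward (illustration, not the paper's exact reward).
# `run_query` is a hypothetical callable wrapping a database backend
# (e.g., sqlite3 for SQL or a Neo4j driver for Cypher).

from collections import Counter
from typing import Callable, Iterable, Tuple

Row = Tuple  # one result row, as a tuple of values


def execution_reward(
    pred_query: str,
    gold_query: str,
    run_query: Callable[[str], Iterable[Row]],
) -> float:
    """Return 1.0 if the predicted query's results match the gold query's, else 0.0."""
    try:
        pred_rows = Counter(tuple(r) for r in run_query(pred_query))
    except Exception:
        return 0.0  # non-executable predictions receive no reward
    gold_rows = Counter(tuple(r) for r in run_query(gold_query))
    return 1.0 if pred_rows == gold_rows else 0.0


# Example with sqlite3 as the relational backend (illustrative only):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])

reward = execution_reward(
    "SELECT name FROM users ORDER BY id",
    "SELECT name FROM users",
    lambda q: conn.execute(q).fetchall(),
)
print(reward)  # 1.0 -- same multiset of rows, order ignored
```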
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 19782