Optimizing Large Language Models for Robust Domain-Specific Text-to-SQL: From Prompting to Preference Alignment

Noah Hampp; Katya Mirylenka; Michael Glass

12 Mar 2026 (modified: 19 May 2026), SwissText 2026 Conference Submission, CC BY 4.0

Track: Scientific Track

Keywords: Text-to-SQL, Large Language Models, RLAIF, ORPO, Preference Alignment, Prompt Engineering, Constrained Decoding, Robustness

TL;DR: We provide a reproducible pipeline for domain-specific Text-to-SQL, demonstrating that monolithic alignment via ORPO avoids the catastrophic collapse of PPO while offering a low-latency, single-pass alternative to complex agentic workflows.

Abstract: This work explores the optimization of Large Language Models (LLMs) for generating SQL queries from natural language (Text-to-SQL, also called NL2SQL), a critical capability for democratizing access to domain-specific data. While LLMs show promising results on recent benchmarks, deployment in real-world analytical processing requires strict adherence to SQL grammar, deep domain understanding, and robustness against out-of-scope queries. We present a comprehensive study of three stages of optimization: (1) advanced prompting strategies, including Chain-of-Thought and multi-turn conversational handling; (2) constrained decoding to enforce syntactic validity; and (3) Reinforcement Learning from AI Feedback (RLAIF). We specifically compare Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO) using a novel reward modeling approach based on execution and semantic principles. Our results reveal that while standard PPO suffers from reward sparsity and catastrophic collapse on 7B models, monolithic alignment via ORPO scales efficiently to 20B-parameter models. The result is a stable, highly reproducible, single-pass pipeline for adapting open-weights models to complex data environments, and a low-latency alternative to expensive inference-time scaling and agentic systems.
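
To make the idea of an execution-based reward concrete, the following is a minimal sketch only and is not the reward model used in the paper, whose details are not given in this abstract. It assumes a binary signal: a predicted query earns 1.0 if it runs and returns the same rows as a reference query, and 0.0 otherwise. The function name `execution_reward`, the example schema `SCHEMA`, and the use of an in-memory SQLite database are all illustrative assumptions.

```python
import sqlite3

# Hypothetical toy schema, used only to illustrate the idea.
SCHEMA = """
CREATE TABLE orders (id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (2, 25.5);
"""


def execution_reward(predicted_sql: str, gold_sql: str, schema_sql: str) -> float:
    """Return 1.0 if both queries yield the same rows, else 0.0.

    Assumption: a binary, order-insensitive comparison of result sets.
    A real reward model would likely be richer than this.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Build a fresh database so queries cannot affect each other.
        conn.executescript(schema_sql)
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            # Queries that fail to parse or execute receive no reward.
            return 0.0
        gold_rows = conn.execute(gold_sql).fetchall()
        # Sort by repr so rows containing NULLs or mixed types stay comparable.
        same = sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)
        return 1.0 if same else 0.0
    finally:
        conn.close()
```

A signal of this kind could, in principle, be used to rank candidate queries into the chosen and rejected pairs that DPO or ORPO train on. Whether the paper constructs its preference data this way is not stated here.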

Submission Number: 11
