Exploring Minimum Bayes Risk Decoding for Text-to-SQL Ensemble

ICLR 2026 Conference Submission20377 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Natural Language Processing, Text-to-SQL, Large language models, Ensemble, Minimum Bayes Risk

Abstract: The task of translating natural language into SQL (NL2SQL or text-to-SQL) enables users to query relational databases without requiring SQL expertise. Although recent large language model (LLM) approaches have advanced the field, achieving robust performance continues to depend on ensemble methods. Existing heuristic-based ensembles such as Minimum Bayes Risk (MBR) and Model-Based MBR (MBMBR) either ignore model-predicted probabilities or allow low-probability candidates to dominate the selection process, and they suffer from prompt sensitivity when estimating candidate likelihoods. We propose a novel heuristic-based ensemble method that directly incorporates each candidate’s own probability into its heuristic score while mitigating prompt sensitivity through marginal probability estimation across diverse prompts. This formulation both improves traditional MBR and stabilizes probability estimation, enabling more accurate candidate selection without the computational overhead of supervised or prompt-based ensembles. Extensive experiments on the SPIDER and BIRD benchmarks demonstrate that our approach consistently outperforms state-of-the-art heuristic methods, achieving higher execution accuracy across fine-tuned and pretrained LLMs. Ablation studies confirm that both the probabilistic scoring function and the marginal probability estimation independently contribute to performance gains, with the full method delivering the strongest results. Our findings establish a new state of the art for heuristic-based ensembles in NL2SQL and highlight the broader potential of probability-aware ensemble strategies for natural language generation tasks.
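The abstract does not give the scoring function, so the following is only an illustrative sketch of the general idea, not the authors' method. It shows classic uniform-weight MBR selection next to a hypothetical probability-aware variant in which a candidate's score is its own probability times its probability-weighted agreement with the other candidates, and in which probabilities are averaged over prompts under an assumed uniform prompt prior. The names `marginal_probs` and `mbr_select`, the toy candidates, the execution-result utility, and the exact form of the score are all assumptions made for illustration.

```python
from collections import defaultdict


def marginal_probs(per_prompt_probs):
    """Average each candidate's probability over prompts.

    Assumption: prompts are weighted uniformly.
    """
    totals = defaultdict(float)
    for probs in per_prompt_probs:
        for cand, p in probs.items():
            totals[cand] += p
    n = len(per_prompt_probs)
    return {cand: total / n for cand, total in totals.items()}


def mbr_select(candidates, utility, probs=None):
    """Pick the candidate with the highest score.

    With probs=None every candidate has weight 1.0, which reduces to
    classic MBR (total agreement with all candidates).
    With probs given, the score is a hypothetical probability-aware
    variant: own probability times probability-weighted agreement.
    """
    if probs is None:
        probs = {c: 1.0 for c in candidates}

    def score(c):
        return probs[c] * sum(probs[o] * utility(c, o) for o in candidates)

    # max() returns the first candidate among ties.
    return max(candidates, key=score)


# Toy data: three SQL candidates and their (pretend) execution results.
candidates = ["A", "B", "C"]
results = {"A": (1,), "B": (1,), "C": (2,)}


def utility(a, b):
    """1.0 if two candidates return the same execution result, else 0.0."""
    return 1.0 if results[a] == results[b] else 0.0


# Candidate probabilities estimated under two different prompts (made up).
per_prompt_probs = [
    {"A": 0.2, "B": 0.1, "C": 0.7},
    {"A": 0.2, "B": 0.1, "C": 0.5},
]
marginal = marginal_probs(per_prompt_probs)

uniform_choice = mbr_select(candidates, utility)
weighted_choice = mbr_select(candidates, utility, marginal)
```

In this toy setup the uniform variant follows the majority execution result, while the weighted variant can prefer a minority candidate that the model considers much more likely. Whether that trade-off helps in practice is exactly what the paper's experiments on SPIDER and BIRD are meant to evaluate.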

Primary Area: foundation or frontier models, including LLMs

Submission Number: 20377
