Towards Robustness of Large Language Models on Text-to-SQL Task: An Adversarial and Cross-Domain Investigation

Published: 2023 (ICANN 2023) · Last Modified: 07 Jan 2026 · License: CC BY-SA 4.0
Abstract: Recent advances in large language models (LLMs) such as ChatGPT have led to impressive results on various natural language processing (NLP) challenges, including the text-to-SQL task, which aims to automatically generate SQL queries from natural language questions. However, these language models remain subject to vulnerabilities such as adversarial attacks, domain shift, and a lack of robustness, which can greatly affect their performance and reliability. In this paper, we conduct a comprehensive evaluation of the robustness of large language models, such as ChatGPT, on text-to-SQL tasks. We assess the impact of adversarial and domain-generalization perturbations on LLMs using seven datasets: five are popular robustness evaluation benchmarks for the text-to-SQL task, and two are synthetic adversarial datasets generated by ChatGPT. Our experiments show that while LLMs show promise as zero-shot text-to-SQL parsers, their performance degrades under adversarial and domain-generalization perturbations, with the degree of robustness varying with the type and level of perturbation applied. We also explore the impact of usage-related factors such as prompt design on the performance and robustness of LLMs. Our study provides insights into the limitations of LLMs on text-to-SQL and other NLP tasks, and into potential directions for future research to enhance their performance and robustness.
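To make the zero-shot setting concrete, the sketch below illustrates how an LLM can be prompted to act as a text-to-SQL parser: a database schema and a natural language question are serialized into a single prompt, and the model's completion is taken as the predicted SQL. This is a minimal, illustrative example only; the schema, question, prompt wording, and the `call_llm` helper are assumptions for demonstration and do not reproduce the paper's actual prompts or benchmarks.

```python
def build_text_to_sql_prompt(schema: str, question: str) -> str:
    """Serialize a database schema and a question into a zero-shot prompt.

    The prompt format here is a common, generic choice; the paper's exact
    prompt designs may differ.
    """
    return (
        "Given the following database schema, write a SQL query that answers "
        "the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API (e.g., a chat model).

    In practice this would send `prompt` to the model and return its text
    completion; the implementation is omitted here.
    """
    raise NotImplementedError


if __name__ == "__main__":
    # Illustrative schema and question, not taken from the paper's datasets.
    schema = "CREATE TABLE employees (id INT, name TEXT, department TEXT, salary INT);"
    question = "What is the average salary in the sales department?"

    prompt = build_text_to_sql_prompt(schema, question)
    # predicted_sql = call_llm(prompt)
    print(prompt)
```

A robustness evaluation in this style would then compare the model's predictions on original questions against predictions on perturbed variants (e.g., paraphrased questions or renamed schema elements) and measure the drop in execution or exact-match accuracy.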