A Skill-Packaged Evaluator for Production Text2SQL Agents

Published: 15 May 2026, Last Modified: 25 May 2026
Venue: AgentSkills 2026 Poster
License: CC BY 4.0
Keywords: Text2SQL, agent skills, evaluation, safety
TL;DR: A reusable Skill for evaluating production Text2SQL agents with routing, structural, execution, diagnostic, and safety-aware signals.
Abstract: Text2SQL evaluation is often reduced to benchmark scoring over generated SQL, but production database agents also route requests, select execution surfaces, condition on runtime context, execute queries, and expose diagnostics. We present a Text2SQL Evaluation Skill that packages this workflow as reusable procedural knowledge. Given natural-language questions, reference SQL, expected route metadata, and database access, the Skill either runs the agent under test or consumes its predictions, executes both the reference and predicted queries, and reports routing, SQL structural, execution, and diagnostic evidence. An anonymized production case study and a portable SQLite demo show that structural and execution signals can materially disagree, motivating multi-signal evaluation with explicit adapter, comparison, and safety policies.
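To make the claim that structural and execution signals can disagree concrete, here is a minimal sketch in the spirit of the portable SQLite demo. It is not the Skill's actual interface. The `Example` record, the `normalize_sql`, `run`, and `evaluate` helpers, the `orders` schema, and the route labels `sql_warehouse` and `vector_search` are all hypothetical. The structural signal is a crude literal-masked string comparison, loosely modeled on benchmark exact-match metrics that ignore values, and the execution signal compares result multisets while ignoring row order.

```python
import re
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record: a question, its reference SQL, and the expected route.
    question: str
    reference_sql: str
    expected_route: str


def normalize_sql(sql: str) -> str:
    # Crude structural signal: mask literals, lowercase, collapse whitespace.
    # A fuller evaluator would compare parsed SQL components instead.
    s = sql.strip().rstrip(";")
    s = re.sub(r"'(?:[^']|'')*'", "?", s)     # mask string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # mask numeric literals
    return " ".join(s.lower().split())


def run(conn: sqlite3.Connection, sql: str):
    # Execute a query; return (rows, None) on success or (None, error text).
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def evaluate(conn: sqlite3.Connection, ex: Example,
             predicted_sql: str, predicted_route: str) -> dict:
    ref_rows, ref_err = run(conn, ex.reference_sql)
    pred_rows, pred_err = run(conn, predicted_sql)
    # Execution signal: compare result multisets, ignoring row order.
    exec_match = (ref_err is None and pred_err is None
                  and Counter(ref_rows) == Counter(pred_rows))
    return {
        "route_match": predicted_route == ex.expected_route,
        "structural_match": normalize_sql(predicted_sql) == normalize_sql(ex.reference_sql),
        "execution_match": exec_match,
        "diagnostic": pred_err or ref_err,  # surfaced error text, if any
    }


def build_demo_db() -> sqlite3.Connection:
    # Tiny in-memory database with a hypothetical schema.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
        INSERT INTO orders (region, amount) VALUES ('EU', 10.0), ('EU', 5.0), ('US', 7.5);
    """)
    return conn


EXAMPLE = Example(
    question="What is the total order amount in the EU?",
    reference_sql="SELECT SUM(amount) FROM orders WHERE region = 'EU';",
    expected_route="sql_warehouse",  # hypothetical route label
)

if __name__ == "__main__":
    conn = build_demo_db()
    predictions = [
        # Different structure, same result: structural miss, execution hit.
        ("SELECT SUM(o.amount) FROM orders AS o WHERE o.region = 'EU'", "sql_warehouse"),
        # Same masked structure, wrong filter value: structural hit, execution miss.
        ("SELECT SUM(amount) FROM orders WHERE region = 'US'", "sql_warehouse"),
        # Invalid column and wrong route: surfaces a diagnostic.
        ("SELECT SUM(amt) FROM orders", "vector_search"),
    ]
    for sql, route in predictions:
        print(evaluate(conn, EXAMPLE, sql, route))
```

The first two predictions move the two signals in opposite directions. An aliased but equivalent query fails the structural check while matching on execution, and a query that differs only in a filter value passes the structural check while returning the wrong total. Reporting each signal separately, rather than collapsing them into one score, keeps both failure modes visible.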
Presentation Mode: Yes, at least one author will attend and present in person.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 79