$\texttt{OmniOData}$: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning

Published: 18 Apr 2026, Last Modified: 22 Apr 2026, Venue: ACL 2026 Industry Track Poster, License: CC BY 4.0
Keywords: text-to-OData; Dataset synthesis; Reinforcement Learning
Abstract: Despite the success of Large Language Models (LLMs) in structured query generation, OData—a critical RESTful protocol for enterprise APIs—remains under-researched due to a lack of high-fidelity, execution-validated datasets. To bridge this gap, we introduce \textsc{OmniOData}, a framework that generates \textsc{SynOData}, the first large-scale OData corpus featuring execution-grounded queries and reasoning traces. Using this corpus, we develop \textsc{OmniOData-R1} (1.5B–3B parameters), a family of models that match or surpass frontier proprietary systems, such as GPT-4o and Gemini 3, on realistic industrial benchmarks. Our results demonstrate that the synergy of execution-verified synthetic data and Reinforcement Learning (RL) effectively unlocks the latent reasoning of Small Language Models (SLMs), providing a high-performance, low-latency solution for specialized enterprise query generation. The code and data will be released under an open-source license.
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 418