$\texttt{OmniOData}$: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning
Keywords: Text-to-OData; Dataset Synthesis; Reinforcement Learning
Abstract: Despite the success of Large Language Models (LLMs) in structured query generation, OData—a critical RESTful protocol for enterprise APIs—remains under-researched due to a lack of high-fidelity, execution-validated datasets.
To bridge this gap, we introduce \textsc{OmniOData}, a framework that generates \textsc{SynOData}, the first large-scale OData corpus featuring execution-grounded queries and reasoning traces.
Using this corpus, we develop \textsc{OmniOData-R1} (1.5B–3B parameters), a family of models that match or surpass frontier proprietary systems, such as GPT-4o and Gemini 3, on realistic industrial benchmarks.
Our results demonstrate that the synergy of execution-verified synthetic data and Reinforcement Learning (RL) effectively unlocks the latent reasoning of Small Language Models (SLMs), providing a high-performance, low-latency solution for specialized enterprise query generation.
The code and data will be released under an open-source license.
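For readers unfamiliar with the target format, the sketch below shows what an OData query (the output of text-to-OData generation) looks like. It composes a request URL from standard OData system query options (`$filter`, `$select`, `$orderby`, `$top`) and percent-encodes their values. It is an illustrative assumption, not part of the OmniOData pipeline. The service root `https://example.com/odata/v4`, the entity set `Products`, its properties (`Name`, `Price`, `Category`), and the helper `build_odata_url` are all hypothetical.

```python
from urllib.parse import quote


def build_odata_url(service_root, entity_set, *, filter_expr=None,
                    select=None, orderby=None, top=None):
    """Compose an OData query URL from system query options.

    Hypothetical helper for illustration only: the service root and
    entity set are assumptions, not endpoints from the paper.
    """
    # Keep quotes, commas, parentheses and slashes literal; spaces become %20.
    safe_chars = "',()/"
    options = []
    if filter_expr:
        options.append("$filter=" + quote(filter_expr, safe=safe_chars))
    if select:
        options.append("$select=" + quote(",".join(select), safe=safe_chars))
    if orderby:
        options.append("$orderby=" + quote(orderby, safe=safe_chars))
    if top is not None:
        options.append("$top=" + str(int(top)))
    url = service_root.rstrip("/") + "/" + entity_set
    return url + ("?" + "&".join(options) if options else "")


if __name__ == "__main__":
    # A natural-language request such as "the five most expensive books,
    # showing name and price" might map to:
    print(build_odata_url(
        "https://example.com/odata/v4", "Products",
        filter_expr="Category eq 'Books'",
        select=["Name", "Price"],
        orderby="Price desc",
        top=5,
    ))
```

Executing such a URL against a live service and comparing the returned records with a reference result is one plausible way to make a query "execution-validated"; the exact validation procedure used for SynOData is described by the authors, not by this sketch.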
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 418