Good-Enough Structured Generation: A Case Study on JSON Schema

Published: 22 Sept 2025, Last Modified: 25 Nov 2025 · DL4C @ NeurIPS 2025 Poster · CC BY 4.0
Keywords: grammar-constrained decoding, structured outputs
Abstract: Grammar-constrained decoding—which masks invalid tokens during generation to guarantee outputs stay within a specified formal language—promises to eliminate structural errors in language model outputs. Yet when tested on JSON Schema (the most common application of grammar-constrained decoding), popular implementations achieve as low as 50\% coverage on real-world schemas. Through experiments on 9,558 real-world JSON schemas, we find that treating validation as an external tool—using validation failures as feedback for runtime alignment—outperforms sophisticated constrained decoding methods, achieving 95\% coverage in exchange for higher latency, typically an additional 1-3 seconds per schema. This gap stems from multiple issues: grammar-constrained decoding is theoretically limited to context-free grammars, real-world schemas often require context-sensitive validation, and even within context-free constraints, implementations struggle with token-boundary misalignment and state explosion. While our analysis focuses specifically on JSON Schema—where language models may excel due to extensive training exposure—it raises questions about whether increasingly complex decoding algorithms are the right approach. As language models improve, treating validation as a separate feedback tool in an agentic loop may prove more practical than embedding constraints into the decoding process itself.
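The validation-as-feedback loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `validate_against_schema` is a hypothetical stand-in for a full JSON Schema validator (e.g. the `jsonschema` library), checking only parseability and required keys, and `model` is any callable that maps a prompt to text.

```python
import json

def validate_against_schema(text, required_keys):
    """Stand-in for a real JSON Schema validator: parse the text,
    then check for required keys. Returns an error message on
    failure, or None if the output is acceptable."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as e:
        return f"invalid JSON: {e.msg}"
    missing = [k for k in required_keys if k not in obj]
    if missing:
        return f"missing required keys: {missing}"
    return None

def generate_with_feedback(model, prompt, required_keys, max_retries=3):
    """Unconstrained generation with validation as an external tool:
    on failure, append the validator's error message to the prompt
    and retry, instead of masking tokens during decoding."""
    for _ in range(max_retries):
        output = model(prompt)
        error = validate_against_schema(output, required_keys)
        if error is None:
            return output
        prompt += f"\nPrevious output was invalid ({error}); please fix it."
    raise ValueError("no valid output within retry budget")
```

Each retry adds a full model call, which is consistent with the reported trade-off: near-complete coverage at the cost of extra seconds of latency per schema.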
Submission Number: 81