Abstract: Reliably generating structured outputs has become a critical capability for modern language model (LM) applications. Constrained decoding has emerged as the dominant technique for enforcing structured outputs during generation. Despite its growing adoption, there has been little systematic evaluation of the behavior and performance of constrained decoding. Constrained decoding frameworks have standardized around JSON Schema as a structured-data format, with most frameworks guaranteeing constraint compliance given a schema. However, how effective these methods are in practice remains poorly understood. We present an evaluation framework to assess constrained decoding approaches across three critical dimensions: efficiency in generating constraint-compliant outputs, coverage of diverse constraint types, and quality of the generated outputs. To facilitate this evaluation, we introduce JSONSchemaBench, a benchmark for constrained decoding comprising 10K real-world JSON schemas that encompass a wide range of constraints with varying complexity. We find that JSONSchemaBench presents a significant challenge for both LMs and constrained decoding frameworks, highlighting ample room for improvement and exposing gaps in existing solutions.
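As a concrete illustration of the compliance dimension mentioned in the abstract, the sketch below shows one plausible way to check whether a generated output conforms to a given JSON schema, using the off-the-shelf `jsonschema` Python library. The schema and outputs are toy placeholders for illustration, not taken from JSONSchemaBench, and this is not the paper's own evaluation code.

```python
# Minimal sketch: checking schema compliance of a model's generated output.
# Uses the standard `jsonschema` library; schema and outputs are hypothetical.
import json
from jsonschema import Draft202012Validator, ValidationError


def is_compliant(output_text: str, schema: dict) -> bool:
    """Return True if the output parses as JSON and satisfies the schema."""
    try:
        instance = json.loads(output_text)
    except json.JSONDecodeError:
        return False  # not even valid JSON
    try:
        Draft202012Validator(schema).validate(instance)
        return True
    except ValidationError:
        return False  # valid JSON, but violates the schema


# Toy schema (placeholder, not from the benchmark).
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

print(is_compliant('{"name": "Ada", "age": 36}', schema))    # True
print(is_compliant('{"name": "Ada", "age": "36"}', schema))  # False: age is a string
```

A compliance rate over a set of schemas could then be computed by averaging this check across generated outputs, which is one simple way to operationalize the "constraint-compliant outputs" dimension described above.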
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 710