Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs
Abstract — Highlights:
• Introducing a novel benchmark for assessing the ability of LLMs to produce structured outputs.
• Presenting a theoretical foundation through analysis of prompt structures and causal graphs.
• Developing the SoEval dataset, covering 20 different subject areas.
• Evaluating major LLMs, such as GPT-4, on the SoEval benchmark and establishing a baseline.