Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs
Abstract — Highlights:
• Introducing a novel benchmark for assessing the ability of LLMs to produce structured outputs.
• Presenting a theoretical foundation through analysis of prompt structures and causal graphs.
• Developing the SoEval dataset, covering 20 different subject areas.
• Evaluating major LLMs, such as GPT-4, on the SoEval benchmark and establishing a baseline.