Self-Spec: Model-Authored Specifications for Reliable LLM Code Generation

Published: 08 Oct 2025, Last Modified: 08 Oct 2025 · Agents4Science · CC BY 4.0
Keywords: Large Language Models, NL-to-Code, Specification-Driven Code Generation.
Abstract: Do large language models (LLMs) code more reliably when they first author a task-specific specification language and then implement strictly from that spec? We introduce Self-Spec, a lightweight, deterministic (T=0) orchestration that prompts a model to (i) design a compact spec schema it prefers, (ii) instantiate that schema from a problem’s docstring and signature, (iii) resolve ambiguities via a minimal Q&A loop, and (iv) generate code only from the confirmed spec. The intuition is distributional: a self-authored spec better aligns with a model’s internal representational bias, reducing docstring drift and format/edge-case mistakes. On HumanEval (pass@1, single sample), Self-Spec improves over direct NL→code for stronger models: GPT-4o 87%→92% (+5) and Claude 3.7 92%→94% (+2); Claude 3.5 dips 90%→89% (-1), and its score returns to baseline once we remove over-defensive guards from the generated code (e.g., replacing raise/assert with no-ops when behavior is unspecified). To our knowledge, this is the first systematic study that lets an LLM design its own spec language for coding. The method is simple (no finetuning), model-agnostic (each model chooses its spec shape), and practical (assumptions are made explicit). We release prompts and code for reproduction. Overall, our results show that Self-Spec works in practice and offers strong potential as a general path to more reliable LLM coding via self-authored specifications.
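To make the four-step orchestration concrete, here is a minimal Python sketch. The `complete` callable stands in for any deterministic (temperature 0) LLM completion API; the prompt wording, the helper name `self_spec_generate`, and the `max_clarify_rounds` cap are illustrative assumptions, not the authors' released prompts.

```python
from typing import Callable


def self_spec_generate(problem: str, complete: Callable[[str], str],
                       max_clarify_rounds: int = 3) -> str:
    """Sketch of Self-Spec: spec schema -> spec instance -> Q&A -> code."""
    # (i) Ask the model to design a compact spec schema it prefers.
    schema = complete(
        "Design a compact specification schema you would like to use "
        "to describe coding tasks. Return only the schema."
    )

    # (ii) Instantiate that schema from the problem's docstring and signature.
    spec = complete(
        f"Fill in this spec schema for the task below.\n"
        f"Schema:\n{schema}\n\nTask:\n{problem}"
    )

    # (iii) Minimal Q&A loop: surface one ambiguity at a time and record an
    #       explicit assumption until the model reports none remain.
    for _ in range(max_clarify_rounds):
        question = complete(
            f"Spec:\n{spec}\n\nList one remaining ambiguity, or say NONE."
        )
        if question.strip().upper().startswith("NONE"):
            break
        answer = complete(
            f"Task:\n{problem}\n\nQuestion:\n{question}\n"
            "Answer with the most reasonable assumption, stated explicitly."
        )
        spec = complete(
            f"Update the spec to record this resolution.\n"
            f"Spec:\n{spec}\nQ: {question}\nA: {answer}"
        )

    # (iv) Implement strictly from the confirmed spec, not the raw docstring.
    return complete(
        "Implement the function strictly from this confirmed spec. "
        f"Return only code.\n\nSpec:\n{spec}"
    )
```

Because code is generated only from the confirmed spec, any unspecified behavior is pinned down as a recorded assumption in step (iii) rather than left to drift during implementation.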
Supplementary Material: zip
Submission Number: 206