ConstrainPrompt: Code-Based Assurance of Prompt-Defined Constraints

ICLR 2026 Conference Submission 11382 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Evaluation methods, Large language models, Prompt analysis
Abstract: Large language models (LLMs) are increasingly used in applications where outputs must satisfy hard, application-critical constraints (e.g., JSON format, lexical inclusion, and length limits). When these constraints are violated, downstream parsers may fail (e.g., on invalid JSON), application behavior can become incorrect or unsafe (e.g., missing required strings or containing forbidden terms), and automation pipelines may halt. Although controlled text generation can mitigate violations, LLM outputs still frequently breach constraints, so post-generation evaluation is essential. Common evaluators, implemented as LLM-as-a-judge prompts or rule-based scripts, under-penalize hard errors and lack robust, fine-grained evaluation control flow. We propose ConstrainPrompt, a verification pipeline that induces semantics-agnostic, code-verifiable constraints from natural-language prompts and compiles them into executable validators. Our method extracts code-verifiable constraints from the prompt, synthesizes a logical evaluation tree that orders global-to-local checks and resolves conditional guards, and finally generates code to validate LLM outputs. On a corpus of real-world prompts paired with LLM outputs, ConstrainPrompt improves Constraint Compliance Accuracy by 24.3% and Violation Rationale by 40.8% over an LLM-as-a-judge baseline across three models.
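To make the pipeline's output concrete, below is a minimal, hand-written sketch of the kind of validator the abstract describes: a global format check runs first, a conditional guard gates the local checks, and each failure returns a rationale. The specific constraints (a "summary" field, the word "deadline", a 50-word limit) are illustrative assumptions, not taken from the paper, and ConstrainPrompt would generate such code automatically rather than by hand.

    # Illustrative sketch only; constraint values are hypothetical.
    import json

    def validate_output(text: str) -> tuple[bool, str]:
        """Return (passed, rationale) for a hypothetical prompt requiring valid JSON
        whose 'summary' field contains the word 'deadline' and has at most 50 words."""
        # Global check: the output must parse as JSON before any local checks run.
        try:
            obj = json.loads(text)
        except json.JSONDecodeError as exc:
            return False, f"Invalid JSON: {exc}"

        # Conditional guard: local checks apply only if the 'summary' field exists.
        summary = obj.get("summary")
        if summary is None:
            return False, "Missing required field 'summary'"

        # Local lexical-inclusion check.
        if "deadline" not in summary.lower():
            return False, "Required term 'deadline' not found in 'summary'"

        # Local length-limit check.
        if len(summary.split()) > 50:
            return False, "'summary' exceeds the 50-word limit"

        return True, "All constraints satisfied"

A hard failure at any step short-circuits the remaining checks, which is what lets a validator of this shape report a precise violation rationale instead of a single aggregate score.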
Primary Area: generative models
Submission Number: 11382