Differentiable Verification for Safe Reinforcement Learning in Verifiable Code Synthesis

ICLR 2026 Conference Submission 25549 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Code Synthesis
Abstract: We propose a framework for safe reinforcement learning (RL) in verifiable code synthesis in which formal verification constraints are integrated as differentiable components of the policy optimization loop. Traditional approaches treat verification as a post-hoc filter or a black-box reward signal, which often leads to sample inefficiency and mismatches between the generated code and its safety guarantees. Our method adds a differentiable verification layer that approximates formal verification steps with smooth surrogate functions, enabling gradient-based optimization of both code generation and satisfaction of safety specifications. This layer computes soft satisfaction scores for safety properties, which are combined with task-completion rewards to optimize the RL policy.
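To make the mechanism in the abstract concrete, the following is a minimal sketch of how a differentiable verification layer could be combined with a policy-gradient objective. All names (soft_satisfaction, combined_loss, the margin representation of safety properties, and the sigmoid temperature) are illustrative assumptions, not the authors' implementation; the task reward is treated as non-differentiable and enters only through the REINFORCE term, while the soft safety scores also pass gradients to the generator directly.

```python
import torch

def soft_satisfaction(margins: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # Hypothetical smooth surrogate: margins > 0 means a safety property
    # holds; a sigmoid relaxes the verifier's hard 0/1 verdict into a
    # differentiable score in (0, 1).
    return torch.sigmoid(margins / temperature)

def combined_loss(log_prob: torch.Tensor,
                  task_reward: torch.Tensor,
                  margins: torch.Tensor,
                  safety_weight: float = 1.0) -> torch.Tensor:
    # Average soft scores over the safety properties of each sampled program.
    safety = soft_satisfaction(margins).mean(dim=-1)
    # Shape the task reward with the soft safety score.
    shaped = task_reward + safety_weight * safety
    # REINFORCE term treats the shaped reward as a detached scalar weight;
    # the surrogate term lets gradients flow through the safety scores.
    return -(log_prob * shaped.detach() + safety_weight * safety).mean()

# Toy usage: a batch of 4 sampled programs, 3 safety properties each.
log_prob = torch.randn(4, requires_grad=True)        # log pi(code | prompt)
task_reward = torch.tensor([1.0, 0.0, 1.0, 0.0])     # e.g., unit tests passed
margins = torch.randn(4, 3, requires_grad=True)      # per-property margins
loss = combined_loss(log_prob, task_reward, margins)
loss.backward()
```

One design note implied by this sketch: because the surrogate is differentiable, the safety signal contributes a low-variance gradient in addition to the high-variance policy-gradient term, which is the efficiency advantage the abstract claims over black-box reward approaches.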
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25549