Governed Self-Improvement for Logical Reasoning: Edit-Time Governance for Developmental Consistency

Published: 01 Apr 2026, Last Modified: 17 Apr 2026
Venue: ICLR 2026 Workshop LLM Reasoning
License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: logical reasoning, large language models, consistency across related questions, governed self improvement, symbolic reasoning, contradiction detection, solver guided validation, temporal consistency, developmental consistency evaluation, reasoning evaluation benchmarks, edit time validation, model self editing, neuro symbolic reasoning, logical consistency metrics, recursive self-improvement, propositional logic reasoning, regression prevention, reasoning trajectories, model governance, reasoning robustness, reasoning stability
TL;DR: Framework for improving LLM reasoning by governing model edits with solver-based validation and cross-question consistency constraints, showing reduced contradictions and stable reasoning trajectories in a controlled logical domain.
Abstract: Self-refinement methods enable large language models to improve without retraining, yet they optimize local answers rather than the future reasoner. In logical reasoning, every answer creates longitudinal commitments: paraphrases, negations, implication chains, and premise permutations must remain jointly consistent across developmental time. We present a governance-oriented framework and evaluation lens with proof-of-concept validation on a controlled propositional-logic domain. (1) We frame self-improvement as a commitment-management problem and show that uncontrolled search can increase contradictions even while raising accuracy. (2) We propose GSI-LR (Governed Self-Improvement for Logical Reasoning), a framework combining branch-diverse proposal search, a temporal contradiction graph (TCG) grounded in AGM-style belief revision, an axiomatic validation cascade using symbolic solvers at edit time, and an explicit edit-rights policy. (3) We introduce Developmental Consistency Evaluation (DCE), a protocol measuring family contradiction rate (FCR; lower means fewer family contradictions), acceptance precision, delayed regression, rollback burden, and maintenance debt over trajectories rather than snapshots. (4) We validate GSI-LR on a Z3-grounded propositional-logic domain (200 questions, 40 families, 50 edit rounds, 5 seeds), demonstrating that governed development occupies a favorable position on the accuracy–consistency Pareto frontier: it reduces FCR by 8.8% relative to static baselines (FCR 0.675 vs. 0.740) while maintaining strict non-regression, whereas unconstrained search achieves perfect accuracy at the cost of increased contradictions (FCR 0.775).
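The family contradiction rate (FCR) described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the paper grounds consistency checks in the Z3 solver, whereas here joint satisfiability within a question family is checked by brute-force truth-assignment enumeration over a handful of propositional variables. The encoding of answers as Python predicates and the function names are assumptions for illustration only.

```python
from itertools import product

# Hypothetical encoding (not the paper's): each answer in a question family
# becomes a propositional formula, represented as a predicate over a truth
# assignment dict mapping variable names to booleans.

def jointly_consistent(formulas, variables):
    """True if some truth assignment satisfies every formula in the family."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(f(assignment) for f in formulas):
            return True
    return False

def family_contradiction_rate(families, variables):
    """FCR: fraction of families whose answers are jointly unsatisfiable.
    Lower is better (fewer contradictory families)."""
    contradictory = sum(
        not jointly_consistent(family, variables) for family in families
    )
    return contradictory / len(families)

# A family asserting both p and not-p is contradictory; the second family
# is satisfied by q = True, so it is consistent.
families = [
    [lambda a: a["p"], lambda a: not a["p"]],
    [lambda a: a["p"] or a["q"], lambda a: a["q"]],
]
print(family_contradiction_rate(families, ["p", "q"]))  # → 0.5
```

In the paper's edit-time setting, this check would run before an edit is accepted: an edit whose answers push a family from consistent to contradictory would be rejected or rolled back, which is what keeps FCR low under governed development.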
Presenter: ~David_Scott_Lewis1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 211