Dynamic Correction of Erroneous Initial Policies via Diffusion-Driven Bayesian Exploration

06 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Social Impact, Policy Error Correction
Abstract: In emergency response and other high-stakes societal applications, early-stage policies critically shape downstream outcomes. Yet these initial policies, often based on limited or biased information, can severely misalign with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (zero transition and no rejuvenation), bootstrap particle filters exhibit Stationarity-Induced Posterior Support Invariance (S-PSI): regions excluded by the initial prior remain permanently unexplorable, so corrections are impossible even when new evidence contradicts current beliefs. While classical perturbations can in principle break this lock-in, they operate in an always-on fashion and can be inefficient. To overcome this, we propose DEPF, a diffusion-driven Bayesian exploration framework that enables principled, real-time correction of early policy errors. Our method expands posterior support via entropy-regularized sampling and covariance-scaled diffusion, and a Metropolis–Hastings check validates proposals and keeps inference adaptive to unexpected evidence. Empirical evaluations on realistic hazardous-gas localization tasks show that our approach matches reinforcement learning and planning baselines when priors are correct and substantially outperforms classical SMC perturbations and RL-based methods under misalignment. We further provide theoretical guarantees that DEPF resolves S-PSI while maintaining statistical rigor.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 2565
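The abstract describes particle rejuvenation via covariance-scaled diffusion validated by a Metropolis–Hastings check. Below is a minimal sketch of that mechanism, not the authors' implementation: the Gaussian observation model, the diffusion scale `beta`, the particle count, and the omission of the entropy-regularized sampling step are all illustrative assumptions.

```python
import numpy as np

def log_likelihood(particles, obs):
    # Hypothetical Gaussian observation model (assumption, not from the paper):
    # the observation is centered on the true source location.
    sigma = 1.0
    sq_dist = np.sum((particles - obs) ** 2, axis=1)
    return -0.5 * sq_dist / sigma**2

def diffusion_mh_step(particles, weights, obs, beta=0.5, rng=None):
    """One exploration step: covariance-scaled diffusion move + MH validation."""
    rng = np.random.default_rng() if rng is None else rng
    dim = particles.shape[1]
    # Scale the diffusion kernel by the weighted particle covariance so proposals
    # can reach beyond the current (possibly collapsed) posterior support.
    cov = np.cov(particles.T, aweights=weights) + 1e-6 * np.eye(dim)
    noise = rng.multivariate_normal(np.zeros(dim), cov, size=len(particles))
    proposals = particles + beta * noise
    # Metropolis-Hastings check: with a symmetric proposal, the acceptance ratio
    # reduces to a likelihood ratio, so only moves consistent with new evidence survive.
    log_alpha = log_likelihood(proposals, obs) - log_likelihood(particles, obs)
    accept = np.log(rng.uniform(size=len(particles))) < log_alpha
    particles[accept] = proposals[accept]
    return particles

# Toy usage: particles drawn from a misaligned prior, evidence located elsewhere.
rng = np.random.default_rng(0)
particles = rng.normal(loc=[-5.0, -5.0], scale=0.3, size=(500, 2))  # wrong prior
weights = np.full(500, 1.0 / 500)
obs = np.array([3.0, 2.0])  # observation contradicting the prior
for _ in range(50):
    particles = diffusion_mh_step(particles, weights, obs, rng=rng)
```

Under a stationary bootstrap filter (no transition noise, no rejuvenation), the particles above could never leave the prior's support; the diffusion move supplies exploration while the MH step keeps accepted moves tied to the likelihood.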