A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching

Published: 09 May 2025, Last Modified: 09 May 2025
Venue: RLC 2025
License: CC BY 4.0
Keywords: Chattering, robustness, value function-based switching, hybrid control
Abstract: We address the challenge of switching among multiple learned policies in reinforcement learning control systems, where conventional value function–based methods can lead to chattering in the presence of small measurement noise. Our goal is to design a switching logic that preserves the asymptotic stability of each individual policy, achieves overall performance that is at least as good as any fixed policy, and maintains a robustness margin so that rapid switching is prevented for any bounded measurement noise. To this end, we propose a timer-based hybrid supervisor whose resettable timer enforces a minimum dwell time on the active policy. This dwell time is adaptively adjusted by predicting the evolution of the system state, ensuring that a switch occurs only when a significantly better alternative is predicted. We derive sufficient conditions under which the hybrid supervisor is guaranteed to exhibit non-Zeno behavior and to render a compact set robustly globally asymptotically stable in the presence of bounded measurement noise. Simulation results on representative decision-making problems demonstrate that our hybrid supervisors maintain performance and robustness under noisy conditions where a conventional switching strategy fails.
Supplementary Material: zip
Submission Number: 301
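
The dwell-time idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' supervisor. It assumes a fixed dwell time and a fixed hysteresis margin, whereas the abstract describes a dwell time that is adapted by predicting the state evolution. The names `DwellTimeSupervisor`, `min_dwell`, `margin`, `noisy_values`, and `count_switches` are hypothetical, and values are treated as cost-to-go estimates (lower is better). The sketch contrasts a naive argmin switch, which chatters when two policies have nearly equal values under small noise, with a timer-gated switch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DwellTimeSupervisor:
    """Illustrative supervisor: holds the active policy for a minimum dwell time."""
    min_dwell: float    # assumed fixed dwell time (the abstract describes an adaptive one)
    margin: float       # assumed hysteresis margin on the value estimates
    active: int = 0     # index of the currently active policy
    timer: float = 0.0  # time elapsed since the last switch

    def step(self, values: Sequence[float], dt: float) -> int:
        """Advance the timer by dt and return the index of the policy to apply.

        values[i] is a (possibly noisy) cost-to-go estimate for policy i.
        """
        self.timer += dt
        best = min(range(len(values)), key=lambda i: values[i])
        timer_expired = self.timer >= self.min_dwell
        clearly_better = values[best] < values[self.active] - self.margin
        if timer_expired and clearly_better:
            self.active = best
            self.timer = 0.0  # reset the timer on every switch
        return self.active


def noisy_values(k: int) -> List[float]:
    """Two policies with equal true value 1.0 and alternating noise of size 0.01."""
    sign = 1.0 if k % 2 == 0 else -1.0
    return [1.0 + 0.01 * sign, 1.0 - 0.01 * sign]


def count_switches(choices: Sequence[int]) -> int:
    """Count how many times the selected policy index changes."""
    return sum(1 for a, b in zip(choices, choices[1:]) if a != b)


# Naive rule: always pick the policy with the smallest noisy estimate.
naive = [min(range(2), key=lambda i: noisy_values(k)[i]) for k in range(20)]

# Timer-gated rule: switch only after the dwell time and by a clear margin.
supervisor = DwellTimeSupervisor(min_dwell=0.5, margin=0.05)
supervised = [supervisor.step(noisy_values(k), dt=0.25) for k in range(20)]

print("naive switches:", count_switches(naive))
print("supervised switches:", count_switches(supervised))
```

In this toy setting the noise amplitude stays below the margin, so the timer-gated rule keeps the initial policy while the naive rule alternates at every step. A clearly better alternative is still adopted once the timer has expired.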