Follow the STARs: Dynamic $\omega$-Regular Shielding of Learned Probabilistic Policies
Keywords: Shielding, $\omega$-regular Games, Strategy Templates, Reinforcement Learning, Run-time Monitoring
Abstract: This paper presents a novel dynamic post-shielding framework that enforces the full class
of $\omega$-regular correctness properties over learned probabilistic policies. This
constitutes a paradigm shift from the predominant setting of safety-shielding -- i.e.,
ensuring that nothing bad ever happens -- to a shielding process that additionally enforces
liveness -- i.e., ensuring that something good eventually happens. At its core, our method
uses \emph{Strategy-Template-based Adaptive Runtime Shields (STARs)}, which leverage
permissive strategy templates to enable post-shielding with minimal interference. As its
main feature, STARs introduce a mechanism to \emph{dynamically control interference},
allowing a tunable enforcement parameter to balance formal obligations and task-specific
behavior \emph{at runtime}. This makes it possible to trigger more aggressive enforcement when needed,
while leaving room for optimized policy choices otherwise.
In addition, STARs support runtime adaptation to changing specifications or actuator failures, making them especially suited for cyber-physical applications. We evaluate STARs on various benchmarks to showcase their scalability, adaptability and performance.
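As a minimal sketch of the dynamic-interference idea described in the abstract (and not the authors' actual construction), the following Python snippet assumes a learned policy given as an action distribution and a set of actions permitted by a strategy template in the current state; the identifiers shielded_action, allowed, and enforcement are illustrative assumptions, and the real STARs derive the permitted sets from permissive strategy templates of $\omega$-regular games and treat safety and liveness obligations separately.

    import random

    # Illustrative sketch only: `allowed` abstracts the actions a strategy
    # template permits in the current state; `enforcement` in [0, 1] is the
    # tunable interference parameter (0 = defer to the learned policy,
    # 1 = always restrict it to template-permitted actions).
    def shielded_action(policy_dist, allowed, enforcement, rng=random):
        actions, weights = zip(*policy_dist.items())
        proposal = rng.choices(actions, weights=weights, k=1)[0]
        if proposal in allowed or rng.random() > enforcement:
            return proposal  # minimal interference: keep the policy's choice
        # Interfere: renormalise the policy over the permitted actions only.
        safe = {a: w for a, w in policy_dist.items() if a in allowed}
        if not safe:  # the policy assigns no mass to permitted actions
            return rng.choice(sorted(allowed))
        safe_actions, safe_weights = zip(*safe.items())
        return rng.choices(safe_actions, weights=safe_weights, k=1)[0]

    # Example: the policy prefers 'b', but the template currently permits
    # only {'a', 'c'}; with enforcement = 1.0 the shield always overrides.
    policy = {'a': 0.2, 'b': 0.7, 'c': 0.1}
    print(shielded_action(policy, allowed={'a', 'c'}, enforcement=1.0))

Varying the enforcement parameter at runtime then corresponds to the abstract's idea of triggering more aggressive enforcement only when needed, while otherwise deferring to the optimized policy.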
Area: Game Theory and Economic Paradigms (GTEP)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1716