Keywords: Reinforcement learning, formal methods, hyperproperties, temporal logics, specification-based RL, multi-agent RL, reward shaping, HyperLTL, MDP
TL;DR: A framework that synthesizes a tuple of optimal control policies for multi-agent systems, maximizing the probability of satisfying a desired hyperproperty.
Abstract: Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or cannot handle such tasks efficiently. We propose HYPRL, a specification-guided reinforcement learning framework that learns control policies with respect to hyperproperties expressed in HyperLTL. Hyperproperties constitute a powerful formalism for specifying objectives and constraints over sets of execution traces across agents. To learn policies that maximize the satisfaction of a HyperLTL formula $\varphi$, we apply Skolemization to manage quantifier alternations and define quantitative robustness functions to shape rewards over execution traces of a Markov decision process with unknown transitions. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward and, consequently, increase the probability of satisfying $\varphi$. We evaluate HYPRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare HYPRL against specification-driven baselines to demonstrate its effectiveness and efficiency.
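The sketch below illustrates the reward-shaping idea described in the abstract in a toy two-agent grid setting: a quantitative robustness score over joint trace prefixes is converted into per-step rewards whose sum equals the robustness of the full trace. The robustness function, goal cell, and trace encoding here are hypothetical illustrations chosen for this example, not HYPRL's actual definitions or implementation.

```python
# Minimal sketch (assumptions, not the paper's implementation): shaping rewards
# from a quantitative robustness score over joint trace prefixes of two agents.
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical grid position (x, y)

def robustness(trace_a: List[State], trace_b: List[State]) -> float:
    """Toy quantitative semantics for a hyperproperty relating two traces:
    'agent A eventually reaches the goal AND the agents never collide'.
    Positive values indicate a satisfaction margin, negative values violation."""
    goal = (3, 3)  # assumed goal cell
    # Eventually-reach term: best (least negative) distance to the goal so far.
    reach = max(-abs(s[0] - goal[0]) - abs(s[1] - goal[1]) for s in trace_a)
    # Never-collide term: worst-case separation between synchronized steps.
    sep = min(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(trace_a, trace_b))
    return min(reach, sep - 1)  # conjunction = min in quantitative semantics

def shaped_reward(trace_a: List[State], trace_b: List[State]) -> float:
    """Reward the increase in robustness produced by the latest joint step,
    so the accumulated return equals the robustness of the full trace."""
    if len(trace_a) < 2:
        return robustness(trace_a, trace_b)
    prev = robustness(trace_a[:-1], trace_b[:-1])
    return robustness(trace_a, trace_b) - prev

if __name__ == "__main__":
    ta = [(0, 0), (1, 1), (2, 2), (3, 3)]
    tb = [(3, 0), (3, 1), (2, 0), (1, 0)]
    total = sum(shaped_reward(ta[:k], tb[:k]) for k in range(1, len(ta) + 1))
    print(total)  # telescopes to robustness(ta, tb)
```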
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 13476