Do it for HER: First-order Logic Reward Specification in Reinforcement Learning

Published: 21 Jun 2025, Last Modified: 25 Jul 2025RLC 2025 Workshop PRLEveryoneRevisionsBibTeXCC BY 4.0
Keywords: reward specification, first-order temporal logic
Abstract: In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLf$^{MT}$), a more expressive extension of classical temporal logic in which predicates are first-order formulas of arbitrary first-order theories rather than simple Boolean variables. This enhanced expressiveness enables the specification of complex tasks over unstructured and heterogeneous data domains, promoting a unified and reusable framework that eliminates the need for manual predicate encoding. However, the increased expressive power of LTLf$^{MT}$ introduces additional theoretical and computational challenges compared to standard LTLf specifications. We address these challenges from a theoretical standpoint, identifying a fragment of LTLf$^{MT}$ that is tractable but sufficiently expressive for reward specification in an infinite-state-space context. From a practical perspective, we introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity. We evaluate this approach to a continuous-control setting using Non-Linear Arithmetic Theory, showing that it enables natural specification of complex tasks. Experimental results show how a tailored implementation of HER is fundamental in solving tasks with complex goals.
Format: We have read the camera-ready instructions, and our paper is formatted with the provided template.
De-Anonymization: This submission has been de-anonymized.
Presenter: ~Matteo_Papini1
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 3
Loading