We have developed a logic-based adaptive reward shaping approach for RL. Our approach uses reward functions designed to incentivize an agent to make as much progress as possible toward completing a task specified by a co-safe LTL formula, and it dynamically updates these reward functions during learning. This dynamic reward shaping is beneficial in scenarios where environmental uncertainty can cause task failure even after the agent has successfully completed some subtasks.
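As a rough illustration of the mechanism summarized above, the following Python sketch rewards progress through an automaton for a co-safe LTL formula and adapts that reward when an automaton transition is judged infeasible. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper. The automaton `DFA`, the distance-to-acceptance reward, the cap `max_dist`, and the `mark_infeasible` update rule are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical DFA for a co-safe task with two routes to acceptance:
# "a then b" or "c then d". Maps state -> {label: next_state}.
DFA = {
    0: {"a": 1, "c": 3},
    1: {"b": 2},
    3: {"d": 2},
    2: {},
}
ACCEPTING = {2}


def distances(dfa, accepting, blocked=frozenset()):
    """Minimum number of DFA transitions to an accepting state, ignoring blocked edges."""
    # Build the reverse graph, then run BFS backward from the accepting states.
    reverse = {q: [] for q in dfa}
    for q, edges in dfa.items():
        for label, q_next in edges.items():
            if (q, label) not in blocked:
                reverse[q_next].append(q)
    dist = {q: float("inf") for q in dfa}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q_next = queue.popleft()
        for q in reverse[q_next]:
            if dist[q] == float("inf"):
                dist[q] = dist[q_next] + 1
                queue.append(q)
    return dist


class AdaptiveShaper:
    """Hypothetical progress-based shaper whose distances are updated during learning."""

    def __init__(self, dfa, accepting, max_dist):
        self.dfa = dfa
        self.accepting = accepting
        self.max_dist = max_dist  # caps infinite distances so rewards stay finite
        self.blocked = set()
        self.dist = distances(dfa, accepting)

    def reward(self, q, q_next):
        # Reward = decrease in (capped) distance to acceptance along a DFA transition.
        return min(self.dist[q], self.max_dist) - min(self.dist[q_next], self.max_dist)

    def mark_infeasible(self, q, label):
        # Adaptive step (assumed rule): drop an edge judged unattainable
        # and recompute every distance without it.
        self.blocked.add((q, label))
        self.dist = distances(self.dfa, self.accepting, frozenset(self.blocked))


shaper = AdaptiveShaper(DFA, ACCEPTING, max_dist=3)
print(shaper.reward(0, 1))  # progress toward acceptance via "a"
shaper.mark_infeasible(1, "b")
print(shaper.reward(0, 1))  # the same transition now leads to a dead end
```

With this choice, progress toward an automaton state that has become a dead end yields negative reward, while the remaining route (via `c` and `d`) keeps its positive reward, so the shaping steers the agent away from subtasks that no longer lead to task completion.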

Computational experiments demonstrate that our approach is applicable to a variety of discrete and continuous RL domains and is compatible with a wide range of RL algorithms, including DQN, DDQN, DDPG, PPO, and A2C. The experimental results also show that the proposed approach generally outperforms state-of-the-art baselines, converging faster to policies with higher expected return and higher task completion rates.

There are several directions for future work. First, we will evaluate the proposed approach on a broader range of RL domains beyond the benchmarks used in our experiments. Second, we will explore extending the approach to multi-agent RL. Finally, we aim to apply the proposed approach to real-world RL tasks, such as autonomous driving.

