Potential-based reward shaping using state-space segmentation for efficiency in reinforcement learning

Published: 01 Jan 2024, Last Modified: 18 Sept 2024. Future Gener. Comput. Syst. 2024. License: CC BY-SA 4.0
Abstract: Highlights
- An online, proper segmentation of the state space of a given RL problem, using the Extended Segmented Q-Cut approach, yields a decomposition of the task for the learning agent.
- Applying reward shaping based on this segmentation compensates for the environment's sparse rewards by providing shaped rewards as immediate feedback.
- A segment that contains, or lies closer to, the goal state naturally designates the potential of its states in the resulting policy.
- Guiding the learning agent toward such a segment, without violating the policy invariance property, facilitates early convergence to an optimal solution.
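The shaping scheme the highlights describe can be sketched minimally: assign each state a potential derived from its segment's distance to the goal segment, and add the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s) to the environment reward. The segmentation, distances, and state labels below are hypothetical illustrations, not the paper's exact Extended Segmented Q-Cut construction.

```python
GAMMA = 0.99

# Hypothetical segmentation: maps each state to a segment id.
# Segment 2 is assumed to contain the goal state.
segment_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

# Assumed distance (in segments) from each segment to the goal segment.
segment_distance_to_goal = {0: 2, 1: 1, 2: 0}

def phi(state):
    """Potential of a state: higher for segments closer to the goal."""
    return -segment_distance_to_goal[segment_of[state]]

def shaped_reward(r, s, s_next):
    """Add the potential-based shaping term F(s, s') = gamma*Phi(s') - Phi(s).

    Because F is a potential difference, adding it leaves the optimal
    policy unchanged (the policy invariance property noted above).
    """
    return r + GAMMA * phi(s_next) - phi(s)
```

Even with a zero environment reward, a transition into a segment nearer the goal receives positive immediate feedback (e.g. `shaped_reward(0.0, 0, 2)` is positive), while moving away is penalized, which is what lets shaping compensate for sparse rewards.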