Keywords: Reinforcement learning, sparse-reward environments, sample efficiency
Abstract: Improving the sample efficiency of Reinforcement Learning (RL) in sparse-reward environments poses a significant challenge. In scenarios where the reward structure is complex, accurate action evaluation often relies heavily on precise information about previously achieved subtasks and their order. Previous approaches have often failed to construct and leverage such intricate reward structures, or have done so inefficiently. In this work, we propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks. With only this minimal knowledge about the task, we train a high-level policy that selects the optimal subtask in each state, together with a low-level policy that efficiently learns to complete each subtask. We evaluate our algorithm in a variety of sparse-reward environments. The experimental results show that our method significantly outperforms state-of-the-art baselines as task difficulty increases.
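The abstract describes a hierarchical setup with a high-level subtask selector and a low-level policy. A minimal sketch of that structure is given below, assuming a tabular Q-learning selector over subtask labels and a gym-like environment interface; all names (`HighLevelPolicy`, `select_subtask`, the example labels, etc.) are illustrative assumptions and do not reflect the published implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a high-level policy picks a subtask label in each state,
# and a low-level policy acts to complete the chosen subtask. Names and
# structure are illustrative assumptions, not the paper's method.

SUBTASK_LABELS = ["reach_key", "open_door", "reach_goal"]  # example labels


class HighLevelPolicy:
    """Epsilon-greedy tabular Q-learning over subtask labels."""

    def __init__(self, subtasks, epsilon=0.1, alpha=0.1, gamma=0.99):
        self.subtasks = subtasks
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.q = defaultdict(lambda: {s: 0.0 for s in subtasks})

    def select_subtask(self, state):
        # Explore with probability epsilon, otherwise pick the best-valued subtask.
        if random.random() < self.epsilon:
            return random.choice(self.subtasks)
        return max(self.q[state], key=self.q[state].get)

    def update(self, state, subtask, reward, next_state):
        # One-step Q-learning update on the subtask-selection level.
        best_next = max(self.q[next_state].values())
        td_target = reward + self.gamma * best_next
        self.q[state][subtask] += self.alpha * (td_target - self.q[state][subtask])


class LowLevelPolicy:
    """Placeholder for a policy conditioned on the current subtask label."""

    def act(self, state, subtask):
        # In practice this would be a learned, subtask-conditioned policy.
        return random.choice(["left", "right", "forward", "interact"])


def run_episode(env, high, low, max_steps=100):
    # `env` is assumed to expose gym-style reset()/step() returning
    # (next_state, reward, done); this interface is an assumption.
    state = env.reset()
    for _ in range(max_steps):
        subtask = high.select_subtask(state)
        action = low.act(state, subtask)
        next_state, reward, done = env.step(action)
        high.update(state, subtask, reward, next_state)
        state = next_state
        if done:
            break
```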
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Shuai_Han2
Track: Fast Track: published work
Publication Link: https://ebooks.iospress.nl/doi/10.3233/FAIA240751
Submission Number: 38