Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Goal-Conditioning, Deep Reinforcement Learning, State Space Search
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We incorporate bidirectional RL (from start to goal, and from goal to start) with goal conditioning so that a single policy function can solve multiple tasks.
Abstract: State space search problems have a binary (found/not found) reward structure. In practice, these problems often have a vast number of states but only a limited number of goal states, which makes the reward signal for the search task very sparse. Goal-Conditioned Reinforcement Learning (GCRL), on the other hand, can train a single agent to solve multiple related tasks. In our work, we assume the ability to sample goal states and use them to define a forward task (τ*) and a reverse task (τ^inv) derived from the original state space search task, yielding more useful and learnable samples. We adopt the Universal Value Function Approximator (UVFA) setting with a GCRL agent to learn from these samples. We incorporate hindsight relabelling with goal conditioning in the forward task to reach goals sampled from τ*, and analogously define 'Foresight' for the backward task. We also exploit the agent's ability (via the policy function) to reach intermediate states, using these states as goals for new sub-tasks. Further, to handle the reverse transitions arising from backward trajectories, we spawn new instances of the agent from states along these trajectories to collect forward transitions, which are then used to train for the main task τ*. We consolidate these tasks and sample-generation strategies into a three-part system called Scrambler-Resolver-Explorer (SRE). We also propose the 'SRE-DQN' agent, which combines our exploration module with the popular DQN algorithm. Finally, we demonstrate the advantages of bi-directional goal conditioning and knowledge of the goal state by evaluating our framework on classical goal-reaching tasks and comparing with existing methods extended to our bi-directional setting.
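To make the goal-conditioning and hindsight-relabelling ideas in the abstract concrete, the following is a minimal, self-contained sketch. It is an illustrative assumption rather than the authors' SRE-DQN or their exact forward/backward procedure: tabular goal-conditioned Q-learning on a small chain environment, where one value table Q(s, g, a) is trained for all goals and failed episodes are relabelled with the achieved final state as the goal.

```python
# Minimal sketch (assumption, not the authors' SRE-DQN): tabular goal-conditioned
# Q-learning with hindsight relabelling on a 1-D chain, illustrating how a single
# value function Q(s, g, a) can serve many goals and how unsuccessful episodes are
# relabelled with achieved states so the sparse reward still provides learning signal.
import numpy as np

N_STATES = 10          # states 0..9 on a chain
ACTIONS = [-1, +1]     # move left / move right
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2

# Universal value function: one table indexed by (state, goal, action).
Q = np.zeros((N_STATES, N_STATES, len(ACTIONS)))

def step(s, a):
    # Deterministic chain dynamics, clipped at the boundaries.
    return int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))

def q_update(s, g, a, r, s_next, done):
    # Standard one-step Q-learning target, conditioned on the goal g.
    target = r if done else r + GAMMA * Q[s_next, g].max()
    Q[s, g, a] += ALPHA * (target - Q[s, g, a])

rng = np.random.default_rng(0)
for episode in range(2000):
    s, g = rng.integers(N_STATES), rng.integers(N_STATES)   # sample a start and a goal
    trajectory = []
    for t in range(20):
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(Q[s, g].argmax())
        s_next = step(s, a)
        done = (s_next == g)
        trajectory.append((s, a, s_next))
        q_update(s, g, a, float(done), s_next, done)         # sparse reward: 1 only at the goal
        s = s_next
        if done:
            break
    # Hindsight relabelling: treat the final achieved state as if it had been the goal,
    # so even episodes that never reach g yield learnable transitions.
    g_hindsight = trajectory[-1][2]
    for (s_t, a_t, s_tp1) in trajectory:
        done_h = (s_tp1 == g_hindsight)
        q_update(s_t, g_hindsight, a_t, float(done_h), s_tp1, done_h)

# After training, the same Q-table reaches arbitrary goals from arbitrary starts.
print(int(Q[0, 9].argmax()))   # expected: 1 (move right toward goal 9)
```

In the UVFA/DQN setting described in the abstract, the table above would be replaced by a neural network taking the (state, goal) pair as input; the relabelling logic is unchanged.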
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6710