Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: object-centric, representation learning, reinforcement learning, sparse reward
Abstract: First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor virtual home environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward-shaping, ground-truth object-information, and expert demonstrations. In this work, we show that one can learn object-interaction tasks from scratch without such supervision, by learning an attentive object-model as an auxiliary task alongside task learning with an object-centric relational RL agent. Our key insight is that an object-model which incorporates object-relationships into forward prediction provides a dense learning signal for unsupervised representation learning of both objects and their relationships. This, in turn, enables faster policy learning for an object-centric relational RL agent. We demonstrate our agent on a set of challenging object-interaction tasks we introduce in the AI2Thor environment, where learning with our attentive object-model is key to strong performance. Specifically, comparing our agent, and relational RL agents trained with alternative auxiliary tasks, against a relational RL agent equipped with ground-truth object-information, we find that learning with our object-model best closes the performance gap in terms of both learning speed and maximum success rate. Additionally, we find that incorporating object-relationships into an object-model's forward predictions is key to learning representations that capture object-category and object-state.
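To make the key idea concrete, below is a minimal numpy sketch of an attentive object-model of the kind the abstract describes: each object attends over the other objects to form a relational context, and that context (together with the action) is used for one-step forward prediction, whose error serves as a dense auxiliary loss. All names, dimensions, and weight shapes here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_object_model(objects, action, Wq, Wk, Wv, Wp):
    """One-step forward prediction for each object, attending over all objects.

    objects: (n, d) array of object embeddings
    action:  (a,)  action embedding
    Returns predicted next-step object embeddings, shape (n, d).
    (Hypothetical weights Wq, Wk, Wv, Wp stand in for learned parameters.)
    """
    q = objects @ Wq                                 # queries, (n, d)
    k = objects @ Wk                                 # keys, (n, d)
    v = objects @ Wv                                 # values, (n, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n, n) relational weights
    context = attn @ v                               # relational context per object
    inp = np.concatenate(
        [objects, context, np.tile(action, (objects.shape[0], 1))], axis=-1)
    return inp @ Wp                                  # predicted next embeddings

rng = np.random.default_rng(0)
n, d, a = 4, 8, 3
objs = rng.normal(size=(n, d))
act = rng.normal(size=(a,))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wp = rng.normal(size=(2 * d + a, d))

pred = attentive_object_model(objs, act, Wq, Wk, Wv, Wp)
# auxiliary loss: squared error against (here, simulated) next-step embeddings
next_objs = objs + 0.1 * rng.normal(size=(n, d))
aux_loss = float(np.mean((pred - next_objs) ** 2))
```

In a full agent this auxiliary loss would be minimized jointly with the RL objective, so the object representations are shaped by relational forward prediction even when task reward is sparse.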
One-sentence Summary: We develop a reinforcement learning agent to improve sample-efficiency for sparse-reward object-interaction tasks in high-fidelity, 3D simulated environments.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=hI3yLTI6L_