CAREL: Instruction-guided Reinforcement Learning with Cross-modal Auxiliary Objectives

07 Oct 2023 (modified: 21 Oct 2023) · Submitted to LangRob @ CoRL 2023
Keywords: Reinforcement Learning, Instruction-following, Language-informed RL, Contrastive Learning, Grounding
TL;DR: This work addresses the grounding problem in instruction-following RL agents using multi-grained and multi-modal auxiliary loss functions.
Abstract: Grounding instructions in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In reinforcement learning, the primary aim is to maximize cumulative rewards, which are frequently sparse in goal-conditioned settings. In goal-reaching scenarios, however, the agent must ground each part of the instruction in the environmental context in order to complete the overall task successfully. In this work, we propose **CAREL** (***C**ross-modal **A**uxiliary **RE**inforcement **L**earning*) as a new framework that addresses this problem with auxiliary loss functions inspired by the video-text retrieval literature. Our experimental results suggest superior sample efficiency and generalization for this framework across different multi-modal reinforcement learning problems.
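The abstract does not specify the exact form of the auxiliary objectives, but losses "inspired by video-text retrieval" are commonly symmetric InfoNCE-style contrastive objectives that pull matched instruction/trajectory embeddings together and push mismatched pairs apart. The sketch below is a minimal illustration of that general idea, not the paper's actual loss; the function name, the use of NumPy, and the temperature value are all assumptions for the example.

```python
import numpy as np

def cross_modal_contrastive_loss(text_emb, traj_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Illustrative sketch only (not the CAREL objective): text_emb and
    traj_emb are (batch, dim) arrays where row i of each is a matched
    instruction/trajectory pair.
    """
    # L2-normalize so that dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (batch, batch) similarity matrix

    def ce_diagonal(l):
        # Cross-entropy with matched pairs (the diagonal) as targets.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the text->trajectory and trajectory->text directions.
    return 0.5 * (ce_diagonal(logits) + ce_diagonal(logits.T))
```

In a multi-grained variant, such a loss could be applied both at the level of whole instructions vs. whole trajectories and at the level of instruction parts vs. trajectory segments; that extension is also an assumption here, not something the abstract states.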
Submission Number: 27