Leveraging Mutual Information for Asymmetric Learning under Partial Observability

Published: 05 Sept 2024, Last Modified: 15 Oct 2024 · CoRL 2024 · CC BY 4.0
Keywords: Partial Observability, Mutual Information, Reinforcement Learning
TL;DR: This paper proposes to use the mutual information between the history and the state to improve learning under partial observability, assuming state availability during training.
Abstract: Even though partial observability is prevalent in robotics, most reinforcement learning studies avoid it due to the difficulty of learning a policy that can efficiently memorize past events and seek information. Fortunately, in many cases, learning can be done in an asymmetric setting where states are available during training but not during execution. Prior studies often leverage the state to indirectly influence the training of a history-based actor (actor-critic methods) or a history-based critic (value-based methods). Instead, we propose using state-observation and state-history mutual information to improve the agent's architecture and its ability to seek information and memorize efficiently, via intrinsic rewards and an auxiliary task. In extensive experiments, our method outperforms strong baselines and achieves successful sim-to-real transfer to a real robot.
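The abstract does not specify how the state-history mutual information term is estimated or how it feeds into training. Below is a minimal, illustrative sketch assuming an InfoNCE-style contrastive estimator, which is one common way to lower-bound mutual information; the class name `InfoNCEEstimator` and methods `loss` (auxiliary task) and `intrinsic_reward` are hypothetical and are not the paper's actual API or architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InfoNCEEstimator(nn.Module):
    """Contrastive lower bound on I(state; history), used here in two ways:
    as an auxiliary loss during asymmetric training (states available) and
    as a per-sample intrinsic reward. Purely a sketch, not the paper's code."""

    def __init__(self, state_dim: int, history_dim: int, embed_dim: int = 128):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, embed_dim)
        self.history_proj = nn.Linear(history_dim, embed_dim)

    def scores(self, state: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # (B, state_dim), (B, history_dim) -> (B, B) similarity matrix;
        # off-diagonal entries act as in-batch negatives.
        s = F.normalize(self.state_proj(state), dim=-1)
        h = F.normalize(self.history_proj(history), dim=-1)
        return s @ h.t()

    def loss(self, state: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # Auxiliary objective: classify which state matches each history,
        # i.e. maximize the InfoNCE bound by minimizing cross-entropy.
        logits = self.scores(state, history)
        labels = torch.arange(state.size(0), device=state.device)
        return F.cross_entropy(logits, labels)

    @torch.no_grad()
    def intrinsic_reward(self, state: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # Per-sample bonus: how well the history identifies its own state
        # relative to in-batch negatives (diagonal of the log-softmax scores).
        logits = self.scores(state, history)
        return torch.log_softmax(logits, dim=-1).diagonal()
```

At training time one could add `estimator.loss(state_batch, history_batch)` to the RL objective and add `intrinsic_reward(...)` to the environment reward; at deployment only the history-based policy is needed, so the state-dependent estimator is discarded, consistent with the asymmetric setting described above.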
Supplementary Material: zip
Video: https://sites.google.com/view/mi-asym-pomdp
Website: https://sites.google.com/view/mi-asym-pomdp
Code: https://sites.google.com/view/mi-asym-pomdp
Publication Agreement: pdf
Student Paper: yes
Submission Number: 256