Abstract: In many sequential decision-making tasks, the agent cannot model the full complexity of the world, which contains vast amounts of both relevant and irrelevant information. For example, a person walking along a city street who tried to model every aspect of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for the agent to successfully act in the world? We formulate this question concretely and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is demonstrated in practice to discover the minimal control-endogenous latent state: the state containing all information necessary for controlling the agent while discarding all irrelevant information. The algorithm consists of a multi-step inverse model (predicting actions from temporally distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without rewards or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm under distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
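The two ingredients named in the abstract, a multi-step inverse model and an information bottleneck, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the `codebook`, `encode`, and `multistep_inverse_loss` names below are hypothetical stand-ins, and in the actual method the encoder and inverse model are learned neural networks trained jointly.

```python
import numpy as np

def encode(obs, codebook):
    """Information bottleneck (sketch): map a raw observation to the nearest
    of K discrete latent codes, discarding all remaining detail."""
    dists = ((codebook - obs) ** 2).sum(axis=1)
    return int(np.argmin(dists))

def multistep_inverse_loss(action_logits, first_action):
    """Multi-step inverse objective (sketch): cross-entropy for predicting the
    first action a_t from the latent pair (phi(x_t), phi(x_{t+k}))."""
    z = action_logits - action_logits.max()      # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())      # log-softmax over actions
    return float(-log_probs[first_action])

# Toy usage: a 4-code codebook and a 3-action logit vector.
codebook = np.eye(4)
z_t = encode(np.array([0.9, 0.1, 0.0, 0.0]), codebook)
loss = multistep_inverse_loss(np.array([2.0, 0.0, 0.0]), first_action=0)
```

The sketch only shows the shape of the objective: because only the first action of the multi-step sequence must be predicted, the latent state is pushed to retain exactly the information the agent controls, and the bottleneck removes everything else.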
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
(1) Completely new main Figure 1 explaining the method.
(2) New Section 5.1 explaining the motivation behind the theory, and new Figure 5 giving intuition for the theory.
(3) New related-work paragraph explicitly on causal learning.
(4) Additional experimental baselines and ablations in the appendix.
(5) The numbering issue with theorems/definitions has been fixed.
(6) Related work has been moved earlier, as suggested by a reviewer.
---- Updates for camera ready:
(7) Added a new method and experimental results to the main paper, including the algorithm description box. Additionally, hyperparameters, especially the horizon, are explained in more detail.
(8) Figures moved into the sections where the corresponding experiments are discussed (Figures 4, 5, 6, 7).
(9) Added new results in Appendix E.1 showing that AC-State can capture full information about a block that can be pushed or pulled by the agent, and included example plans in the supplementary material (pull_examples.txt and push_examples.txt).
Assigned Action Editor: ~Josh_Merel1
Submission Number: 644