Tackling Visual Control via Multi-View Exploration Maximization

TMLR Paper642 Authors

28 Nov 2022 (modified: 17 Sept 2024) | Rejected by TMLR | CC BY 4.0

Abstract: We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly improve the sample efficiency and generalization ability of the RL agent, facilitating the solution of real-world problems with high-dimensional observations and sparse-reward spaces. We evaluate MEM on various tasks from the DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM achieves superior performance and outperforms the benchmark schemes with a simple architecture and higher efficiency.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
1. We redesigned the experiments and now report results for six configurations based on DrQv2:
   - DrQv2 (single-view obs)
   - DrQv2 (multi-view obs)
   - DrQv2 (single-view obs + RE3)
   - DrQv2 (multi-view obs + multi-view representation learning)
   - DrQv2 (multi-view obs + RE3)
   - DrQv2+MEM

   Moreover, we included more random seeds and redrew the figures, so the performance comparison is now more explicit and reliable. MEM produces a significant performance gain over the benchmarks in various tasks **(See Figure 3, 4, 6, 7, 8)**.
2. Added two more manipulation tasks from the DMC suite, namely Reach Site and Place Brick **(Figure 5, 6)**.
3. Performed experiments on the full Procgen benchmark **(See Table 2 and Table 3)**.
4. Added a computational efficiency comparison **(See Appendix A.3)**.
5. Removed the **old Figure 5, Figure 6 and Table 2**.

Assigned Action Editor: ~Matthieu_Geist1

Submission Number: 642
