DIOMIX: A Dynamic Multi-Agent Reinforcement Learning Mixing Structure for Independent Intra-Option Learning

TMLR Paper2891 Authors

19 Jun 2024 (modified: 17 Sept 2024), Withdrawn by Authors, CC BY 4.0
Abstract: In cooperative multi-agent reinforcement learning (MARL), agents are equipped with a formalism to plan, learn, and reason in diverse ways, enabling continual knowledge accumulation over time. Each agent must learn consistently within its environment and be able to reason at various levels of temporal and spatial abstraction to navigate the intricacies of its surroundings. Current state-of-the-art approaches learn an objective function that harmonizes planning and learning but do not explicitly incorporate reasoning. We propose a framework, Dynamic Intra-Options Mixtures (DIOMIX), that aims to address this deficiency in reasoning capabilities. We introduce an agent-independent, option-based framework that incorporates temporal abstraction into the MARL paradigm by applying an advantage-based learning scheme directly to the option policy. This scheme retains more long-term utility than directly optimizing the action-value functions themselves. However, temporal-difference learning can hinder the optimization of temporally extended actions, so that options end up optimized to execute only as primitive actions; to mitigate this issue, we incorporate a regularization mechanism into the learning process that enables options to execute over extended periods. Quantitative and qualitative empirical results indicate that DIOMIX acquires individually separable and explainable reasoning capabilities that lead to agent specialization and task simplification and that improve training efficiency. DIOMIX achieves this by embedding each agent's learning within an option-based framework without compromising performance.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Adam_M_White1
Submission Number: 2891
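
The abstract does not specify the form of the regularization that keeps options from collapsing into primitive actions. As a rough illustration only, the sketch below uses the termination update familiar from single-agent option-critic methods, where a margin `xi` is added to the termination advantage so that ending an option is slightly penalized. The function names, the sigmoid parameterization, and all numeric values are assumptions made for this example and are not taken from DIOMIX.

```python
import math


def termination_advantage(q_option, v_state, xi=0.01):
    """Advantage of staying in the current option over re-selecting, plus margin xi.

    q_option: value of continuing the current option in the next state.
    v_state:  value of the next state under the policy over options.
    xi:       hypothetical regularization margin (an assumption of this sketch).
    """
    return q_option - v_state + xi


def update_termination_logit(logit, q_option, v_state, xi=0.01, lr=0.1):
    """One gradient step on the logit of a sigmoid termination probability.

    Follows the option-critic-style rule: logit <- logit - lr * dbeta/dlogit * A,
    where beta = sigmoid(logit) and A is the regularized termination advantage.
    """
    beta = 1.0 / (1.0 + math.exp(-logit))
    dbeta_dlogit = beta * (1.0 - beta)
    grad = dbeta_dlogit * termination_advantage(q_option, v_state, xi)
    return logit - lr * grad


if __name__ == "__main__":
    # The current option is exactly as good as the best alternative (q == v).
    no_margin = update_termination_logit(0.0, q_option=1.0, v_state=1.0, xi=0.0, lr=0.5)
    with_margin = update_termination_logit(0.0, q_option=1.0, v_state=1.0, xi=0.05, lr=0.5)
    print("logit without margin:", no_margin)
    print("logit with margin:   ", with_margin)
```

With `xi = 0`, an option that is no better and no worse than the alternatives receives no push either way; with a positive `xi`, its termination logit decreases, so the option tends to run longer. An option that is clearly worse than the alternatives still has its termination probability raised.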