Abstract: Advising is an effective method for enhancing agent learning performance in multi-agent deep reinforcement learning. Existing advising methods typically rely on a teacher-student framework in which a teacher agent provides student agents with action or Q-value advice. However, they share a common limitation: the teacher's advice can only assist a student in making a one-time decision in the current state and cannot be internalized into the student's knowledge to intrinsically change its decision model. Consequently, the advice acts more like a one-time instruction from the teacher than a learning aid. If the student agent encounters the same problem again, it may still be unable to make a sound decision and must request advice once more. This not only fails to fundamentally and rapidly enhance the agent's decision-making ability but also wastes considerable communication resources. Hence, we propose a multi-agent advice distillation framework based on attention, which allows a student agent to request advice from an experienced teacher and distill that advice into its own decision model via the self-attention mechanism. As a result, advice is fully utilized, enabling a rapid and intrinsic improvement in the agent's decision-making capabilities. Our empirical evaluations demonstrate that, compared with existing advising methods, our method significantly improves learning performance while reducing communication cost.
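To make the core idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture, which the abstract does not specify) of how self-attention could blend a teacher's Q-value advice with a student's own Q-estimates: the two vectors are treated as tokens, attention weights decide how much of the teacher's advice to absorb, and the blended output can serve as a distillation target for the student's own network. All names (`attend_advice`, the weight matrices, the vectors) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_advice(student_q, teacher_q, W_q, W_k, W_v):
    """Blend student Q-values with teacher advice via self-attention.

    Stacks the student's Q-vector and the teacher's advised Q-vector
    as two "tokens"; the attention weights determine how much of each
    flows into the student token's output (the distilled estimate).
    """
    tokens = np.stack([student_q, teacher_q])                 # (2, d)
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return (weights @ V)[0]                                   # student token's row

# Illustrative toy setup: 4 actions, small random projections.
rng = np.random.default_rng(0)
d = 4
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
student_q = np.array([0.1, 0.2, 0.0, 0.1])
teacher_q = np.array([0.0, 1.0, 0.0, 0.0])   # teacher strongly prefers action 1

distilled = attend_advice(student_q, teacher_q, W_q, W_k, W_v)
# A distillation loss would then pull the student's network toward
# the blended target, internalizing the advice rather than using it once.
loss = float(np.mean((student_q - distilled) ** 2))
```

In a full method the projection matrices and the student's Q-network would be trained jointly, so the advice permanently shifts the student's decision model instead of affecting only the current step.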