% !TeX root = ..\freeExp.tex
\section{Conclusion}
This paper introduced a multi-agent multi-armed bandit problem in which rewards are heterogeneous across agents. This heterogeneity creates a unique opportunity: a subset of arms can be explored for free, and the resulting observations can be shared through cooperation, which significantly reduces the aggregate regret. We proposed a cooperative learning algorithm, \FreeExp, that exploits this free exploration and whose regret is tight up to a constant factor. As a notable special case, when every arm is the locally optimal arm of at least one agent, the proposed algorithm achieves $O(1)$ regret.

The problem of multi-agent bandits with heterogeneous rewards raises several interesting follow-up questions. One such question is how to extend the \FreeExp algorithm with an efficient communication protocol. In a distributed multi-agent setting, cooperation may incur a communication cost, so a natural goal is to equip cooperative algorithms with communication policies whose number of communications is sublinear in the number of decision rounds \(T\); a direct extension of the current algorithm requires \(O(T)\) communications.