Abstract: Distributional reinforcement learning has gained significant attention in recent years due to its ability to handle uncertainty and variability in the returns an agent will receive for each action it takes. A key challenge in distributional reinforcement learning is finding a measure of the difference between two distributions that is well-suited for use with the distributional Bellman operator, a function that takes in a value distribution and produces a modified distribution based on the agent's current state and action. In this paper, we address this challenge by introducing the multiquadric kernel to moment-matching distributional reinforcement learning. We show that this kernel is both theoretically sound and empirically effective. Our contribution is mainly of a theoretical nature, presenting the first formally sound kernel for moment-matching distributional reinforcement learning with good practical performance. We also provide insights into why the RBF kernel has been shown to provide good practical results despite its theoretical problems. Finally, we evaluate the performance of our kernel on a number of standard benchmarks, obtaining results comparable to the state-of-the-art.
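The abstract's core object, a moment-matching (MMD) loss built from a multiquadric kernel, can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the bandwidth name `h`, the function names, and the sign convention (negating the multiquadric so the kernel is conditionally positive definite and the squared MMD is nonnegative) are assumptions for the sketch.

```python
import numpy as np

def multiquadric(x, y, h=1.0):
    """Negative multiquadric kernel k(x, y) = -sqrt((x - y)^2 + h^2)
    between two 1-D sample vectors (pairwise, via broadcasting)."""
    return -np.sqrt((x[:, None] - y[None, :]) ** 2 + h ** 2)

def mmd_sq(x, y, h=1.0):
    """Biased squared-MMD estimate between two sets of scalar samples
    (e.g. particles representing two return distributions)."""
    kxx = multiquadric(x, x, h).mean()
    kyy = multiquadric(y, y, h).mean()
    kxy = multiquadric(x, y, h).mean()
    return kxx + kyy - 2.0 * kxy

# Identical sample sets give exactly zero; shifted sets give a
# positive discrepancy.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)
print(mmd_sq(a, a, h=1.0))        # 0.0
print(mmd_sq(a, a + 3.0, h=1.0))  # > 0
```

In a distributional-RL setting, `x` would be the particles of the predicted value distribution and `y` those produced by the distributional Bellman operator, with `mmd_sq` serving as the training loss.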
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Updates:
* The introduction of the MQ kernel has been moved to Section 2, so that it appears before any discussion of it.
* Section 3 is moved to after the theoretical work section.
* Added a short explanation of what to expect from the Anderson-Darling test.
* “Concisely formulated” part in Section 4 is moved to earlier in the section so that the reader can get an overview of all the properties before they are discussed.
* The properties are now reordered to follow the order discussed in Section 4 (now Section 3).
* “$\mathrm{mmd}(\cdot, \cdot; k_h)$ is smooth” is changed to “$\mathrm{mmd}_b(\cdot, \cdot; k_h)$ is smooth”, since we are discussing the properties of the estimator. Property 2 (now property 4) is changed to reflect that we are interested in the properties of the loss function. Some qualifying remarks have been added.
* Proofs for properties 1 and 2 (now properties 3 and 4) are added to the appendix.
* In definition 4.1 (now 3.1), a mistake where $\psi$ was written instead of $f$ has been rectified.
* Definition 4.2 (now 3.2) has been changed for clarity.
* Added axis to the graph.
* Added QR-DQN results to the graph.
* Corr 4.1 has been moved to the discussion of the limitations of the RBF kernel.
* More discussion around Corr 4.1 has been added.
* Aggregate scores (IQM) across all 8 games are added.
* Removed some inaccurate language around the introduction of kernels.
* Proof for Corr 4.1 has been made easier to follow. An inaccuracy in the presentation of the proof has also been rectified.
* Certain references were missing information. This has been fixed.
* General changes in language and grammar.
Updates (1 August):
* Added an experiment testing the robustness of MQ vs. RBF with respect to the parameter $h$, as suggested by Reviewer KxkT (Figure 4, page 10).
Updates (18 August):
* Uploaded camera-ready version
Code: https://github.com/ludvigk/MQ-MMDRL
Supplementary Material: pdf
Assigned Action Editor: ~Amir-massoud_Farahmand1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1120