Improved Policy Extraction via Online Q-Value Distillation

Aman Jhunjhunwala, Jaeyoung Lee, Sean Sedwards, Vahdat Abdelzad, Krzysztof Czarnecki

IJCNN 2020 (modified: 02 Nov 2022)

Abstract: Deep neural networks can solve complex control tasks in challenging environments, but their learned policies are hard to interpret. Because such policies cannot easily be explained or verified, their practical applicability is limited. Decision trees, by contrast, lend themselves well to explanation and verification, but they are not easy to train, especially in an online fashion. In this work, we introduce Q-BSP trees and propose an Ordered Sequential Monte Carlo training algorithm that efficiently distills the Q-function of a fully trained deep Q-network into a tree structure. Q-BSP forests are used to generate the partitioning rules that transparently reconstruct an accurate value function. We explain our approach and provide results that outperform earlier online policy distillation methods on those methods' own performance benchmarks.
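To make the idea of distilling a Q-function into an axis-aligned partition of the state space more concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's Q-BSP or Ordered Sequential Monte Carlo algorithm. Instead, it substitutes a simple greedy, batch regression-tree fit over states sampled from a hypothetical teacher `q_teacher`, which stands in for a trained deep Q-network. The helpers `fit`, `predict`, `act`, and `rules` are likewise hypothetical. Each leaf stores the mean teacher Q-values for every action, the extracted policy takes the arg max at a leaf, and each root-to-leaf path prints as one readable partitioning rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

State = Sequence[float]
QVec = List[float]


@dataclass
class Node:
    # Internal nodes hold an axis-aligned split; leaves hold a Q-vector.
    axis: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    q: Optional[QVec] = None  # mean teacher Q-values (one per action) at a leaf


def mean_q(qs: List[QVec]) -> QVec:
    return [sum(col) / len(qs) for col in zip(*qs)]


def sse(qs: List[QVec]) -> float:
    # Squared error of the Q-vectors around their per-action mean.
    m = mean_q(qs)
    return sum((v - mu) ** 2 for q in qs for v, mu in zip(q, m))


def fit(states: List[State], qs: List[QVec], depth: int, min_leaf: int = 2) -> Node:
    """Greedy batch split search (an assumption, not the paper's online method)."""
    if depth == 0 or len(states) < 2 * min_leaf:
        return Node(q=mean_q(qs))
    best = None  # (error, axis, threshold)
    for axis in range(len(states[0])):
        for t in sorted({s[axis] for s in states}):
            left = [q for s, q in zip(states, qs) if s[axis] <= t]
            right = [q for s, q in zip(states, qs) if s[axis] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, axis, t)
    if best is None:
        return Node(q=mean_q(qs))
    _, axis, t = best
    li = [i for i, s in enumerate(states) if s[axis] <= t]
    ri = [i for i, s in enumerate(states) if s[axis] > t]
    return Node(
        axis=axis,
        threshold=t,
        left=fit([states[i] for i in li], [qs[i] for i in li], depth - 1, min_leaf),
        right=fit([states[i] for i in ri], [qs[i] for i in ri], depth - 1, min_leaf),
    )


def predict(node: Node, s: State) -> QVec:
    # Descend the partition until reaching the leaf that contains s.
    while node.q is None:
        node = node.left if s[node.axis] <= node.threshold else node.right
    return node.q


def act(node: Node, s: State) -> int:
    # Greedy policy extracted from the distilled Q-values.
    q = predict(node, s)
    return max(range(len(q)), key=q.__getitem__)


def rules(node: Node, path: str = "") -> List[str]:
    # Each root-to-leaf path is one readable partitioning rule.
    if node.q is not None:
        return [f"{path or 'always'} -> Q = {[round(v, 3) for v in node.q]}"]

    def extend(cond: str) -> str:
        return f"{path} and {cond}" if path else f"if {cond}"

    return (rules(node.left, extend(f"s[{node.axis}] <= {node.threshold:g}"))
            + rules(node.right, extend(f"s[{node.axis}] > {node.threshold:g}")))


# Hypothetical stand-in for a trained deep Q-network over 1-D states in [0, 1]:
# action 0 is better when s > 0.5, action 1 when s < 0.5.
def q_teacher(s: State) -> QVec:
    return [s[0], 1.0 - s[0]]


states = [[i / 20] for i in range(21)]
tree = fit(states, [q_teacher(s) for s in states], depth=3)
for line in rules(tree):
    print(line)
```

The depth limit bounds the number of rules (at most 2^3 = 8 here), which trades value-function accuracy for readability. The abstract identifies online training as the difficult part. This batch sketch does not attempt that, because it needs every sample up front.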
