Abstract: This paper presents two methods to disentangle and interpret the contextual effects that are encoded in a pre-trained deep neural network. Unlike conventional studies that visualize image appearances corresponding to the network output or a neural activation from a global perspective, our research aims to clarify how a certain input unit (dimension) collaborates with other units (dimensions) to constitute inference patterns of the neural network and thus contribute to the network output. The analysis of local contextual effects w.r.t. certain input units is of special value in real applications. In particular, we used our methods to explain the gaming strategy of the alphaGo Zero model in experiments, and our method successfully disentangled the rationale behind each move during the game.
Keywords: Interpretability, Deep learning, alphaGo
TL;DR: This paper presents methods to disentangle and interpret contextual effects that are encoded in a deep neural network.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/explaining-alphago-interpreting-contextual/code)