Explaining AlphaGo: Interpreting Contextual Effects in Neural Networks

Zenan Ling; Haotian Ma; Yu Yang; Robert C. Qiu; Song-Chun Zhu; Quanshi Zhang

27 Sept 2018 (modified: 26 May 2025), ICLR 2019 Conference Withdrawn Submission, Readers: Everyone

Abstract: This paper presents two methods to disentangle and interpret contextual effects that are encoded in a pre-trained deep neural network. Unlike conventional studies that visualize image appearances corresponding to the network output or a neural activation from a global perspective, our research aims to clarify how a certain input unit (dimension) collaborates with other units (dimensions) to constitute inference patterns of the neural network and thus contribute to the network output. The analysis of local contextual effects w.r.t. certain input units is of special value in real applications. In particular, we used our methods to explain the gaming strategy of the AlphaGo Zero model in experiments, and our methods successfully disentangled the rationale of each move during the game.

Keywords: Interpretability, Deep learning, AlphaGo

TL;DR: This paper presents methods to disentangle and interpret contextual effects that are encoded in a deep neural network.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/explaining-alphago-interpreting-contextual/code)
