Keywords: Knowledge Discovery, Interpretability
TL;DR: This paper extracts shape patterns that explain the QiGan encoded by the value network of an AI model for the game of Go.
Abstract: Given a deep neural network (DNN) that has surpassed human performance on a task, disentangling the explicit knowledge it encodes to gain new insights into the task is a promising yet challenging direction in explainable AI. In this paper, we aim to disentangle the "QiGan" encoded by an AI model for Go that has beaten top human players. Specifically, we disentangle the primitive shape patterns of stones memorized by the value network; these shape patterns represent the "QiGan" used to make a fast assessment of the current board state. To this end, we propose to use both AND interactions and OR interactions to obtain a more concise explanation of shape patterns. We further prove the universal matching property of AND-OR interactions to ensure the faithfulness and verifiability of the explained shape patterns. In experiments, our method finds many novel shape patterns beyond the traditional shape patterns in human knowledge.
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 22777