Towards the first principles of explaining DNNs: interactions explain the learning dynamics

Published: 01 Jan 2025, Last Modified: 06 Nov 2025 · Frontiers Inf. Technol. Electron. Eng. 2025 · CC BY-SA 4.0
Abstract: Most explanation methods are designed empirically, so whether a first-principles explanation of a deep neural network (DNN) exists has become the next core scientific problem in explainable artificial intelligence (XAI). Although this remains an open problem, in this paper we discuss whether interaction-based explanations can serve as such a first-principles explanation of a DNN. The explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system that quantifies the decision-making logic of a DNN as a set of symbolic interaction concepts; (2) it simultaneously explains diverse deep learning phenomena, such as generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms behind various empirical attribution methods and empirical methods for boosting adversarial transferability; (4) it explains the highly complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals why and how the generalization power and adversarial sensitivity of a DNN change during training.
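To give a concrete sense of what "symbolic interaction concepts" quantify, below is a minimal sketch of the Harsanyi-style interaction metric commonly used in this line of work, I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) is the model output when only the input variables in T are kept and the rest are masked with a baseline. The toy model, the zero baseline, and all function names here are illustrative assumptions, not the paper's actual setup or code.

```python
# Sketch: exhaustive Harsanyi interaction on a toy model (assumptions noted above).
import itertools
import numpy as np

def masked_output(model, x, baseline, keep):
    """Model output with all variables outside `keep` replaced by the baseline."""
    z = baseline.copy()
    idx = list(keep)
    z[idx] = x[idx]
    return model(z)

def harsanyi_interaction(model, x, baseline, S):
    """I(S) = sum over subsets T of S of (-1)^{|S|-|T|} v(T)."""
    S = tuple(S)
    total = 0.0
    for r in range(len(S) + 1):
        for T in itertools.combinations(S, r):
            total += (-1) ** (len(S) - len(T)) * masked_output(model, x, baseline, T)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    model = lambda z: np.tanh(w @ z)   # toy nonlinear "network" over 4 input variables
    x = rng.normal(size=4)
    baseline = np.zeros(4)             # zero masking baseline (an assumption)
    # Average interaction strength per order; the paper's two-phase dynamics
    # tracks how the distribution over orders (complexities) evolves during training.
    for order in range(1, 5):
        strengths = [abs(harsanyi_interaction(model, x, baseline, S))
                     for S in itertools.combinations(range(4), order)]
        print(f"order {order}: mean |I(S)| = {np.mean(strengths):.4f}")
```

The order |S| of an interaction serves as its complexity; tracking how interaction strength is distributed over orders during training is, roughly, what the two-phase dynamics analysis described in the abstract examines.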