Towards Tractable Formal Explanations for Neural Network Models

06 Feb 2024 (modified: 07 Feb 2024), AAAI 2024 Workshop ASEA Submission, CC BY 4.0
Keywords: Explainable Artificial Intelligence, formal explainable AI, Prime Implicant Explanations, neural network, cybersecurity, intrusion detection
TL;DR: A novel methodology for formal explainable AI for neural network models.
Abstract: Neural networks are fragile: it is widely reported that trivial perturbations of the inputs can make a neural network model flip its decision, and as a black-box model it cannot provide any explanation for its output. Many current neural network explanation methodologies are heuristic and cannot guarantee their accuracy. Formal explanation methodologies for neural network models are therefore imperative if machine learning models are to be trustworthy and widely applied in real-life applications. Prime implicant (PI) explanations (also known as abductive explanations) provide logically sufficient explanations for a neural network that are guaranteed to be correct. However, formally extracting PI explanations from a neural network model is NP-hard. In this paper, by converting the neural network model into a game, we propose an algorithm to extract formal explanations from the neural network model with linear complexity. We apply the algorithm to a real-life zero-day intrusion detection use case and demonstrate the formal explanations extracted. Valuable insights and conclusions are discussed.
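For readers unfamiliar with PI (abductive) explanations, the sketch below is a minimal, hypothetical illustration and not the paper's game-based algorithm: it greedily deletes features and keeps only those whose fixed values are needed to preserve the model's prediction, using a number of sufficiency checks that is linear in the number of features. Here the sufficiency oracle is approximated by random sampling; a formal approach would replace it with an exact reasoning call (e.g. a SAT/MILP encoding of the network), which is where the NP-hardness mentioned in the abstract arises.

```python
import numpy as np

def is_sufficient(predict, x, fixed, feature_domains, n_samples=200, seed=0):
    """Approximately check whether fixing the features in `fixed` to their
    values in `x` preserves the model's prediction, by sampling the free
    features over their domains. A formal method would use an exact check."""
    rng = np.random.default_rng(seed)
    target = predict(x)
    for _ in range(n_samples):
        z = x.copy()
        for i, domain in enumerate(feature_domains):
            if i not in fixed:
                z[i] = rng.choice(domain)  # perturb a free feature
        if predict(z) != target:
            return False
    return True

def deletion_based_explanation(predict, x, feature_domains):
    """Greedy deletion: start from all features and try to drop each one once.
    Returns a subset-minimal sufficient (PI-style) explanation with respect to
    the sufficiency check, using one check per feature (linear in the input size)."""
    fixed = set(range(len(x)))
    for i in range(len(x)):
        candidate = fixed - {i}
        if is_sufficient(predict, x, candidate, feature_domains):
            fixed = candidate
    return sorted(fixed)

# Toy usage: the model predicts class 1 iff feature 0 is positive,
# so feature 0 alone is a sufficient explanation of the decision on x.
predict = lambda z: int(z[0] > 0)
x = np.array([2.0, -1.0, 5.0])
domains = [np.linspace(-3, 3, 13)] * 3
print(deletion_based_explanation(predict, x, domains))  # -> [0]
```

The function names, the sampling-based oracle, and the toy model above are illustrative assumptions; they show only the general structure of computing a sufficient-reason explanation, not the submission's method.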
Submission Number: 6