TL;DR: Making explanations operational with Zero-Knowledge Proofs when models are kept confidential.
Abstract: In principle, explanations are intended to increase trust in machine learning models and are often mandated by regulations. However, many of the circumstances in which explanations are demanded are adversarial in nature, meaning the involved parties have misaligned interests and are incentivized to manipulate explanations for their own purposes. As a result, explainability methods fail to be operational in such settings despite the demand. In this paper, we take a step towards operationalizing explanations in adversarial scenarios using Zero-Knowledge Proofs (ZKPs), a cryptographic primitive. Specifically, we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on neural networks and random forests. Our code is publicly available at \url{https://github.com/emlaufer/ExpProof}.
Lay Summary: **ExpProof – Building Trust in AI Explanations While Keeping the Model Confidential**
In many real-world scenarios, such as loan applications or hiring decisions, organizations use machine learning (ML) models to make predictions. Regulations such as GDPR's Right to Explanation mandate that people affected by these decisions receive an explanation for them. However, this setting involves misaligned incentives: model developers are incentivized to give *incontestable* explanations rather than reveal the true workings of their model. Moreover, if the model is confidential and the explanation generation is opaque, how can we be sure the explanation is correct?
Our research introduces *ExpProof*, a system that lets organizations prove that their explanations were computed correctly, without revealing the model weights. We use a cryptographic technique called Zero-Knowledge Proofs (ZKPs), which allows one party to prove that it correctly computed a value without revealing the private information used in the computation.
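To make the roles concrete, the sketch below walks through a commit-then-prove flow in Python. It is a hypothetical illustration, not the ExpProof implementation: the names `commit_to_model`, `prove_explanation`, and `verify` are placeholders, a hash-based commitment stands in for whatever commitment scheme a real system would use, and proof generation/verification are left as stubs because they require an actual ZK proving backend.

```python
import os
import hashlib
import numpy as np

# Hypothetical sketch of the commit-then-prove flow (not the ExpProof code).
# H(nonce || weights) stands in for the model commitment; the ZK proof steps
# are stubs, since they need a real proving backend.

def commit_to_model(weights: np.ndarray) -> tuple[str, bytes]:
    """The bank commits to its weights up front, before any explanation is requested."""
    nonce = os.urandom(32)  # blinding randomness keeps the commitment hiding
    digest = hashlib.sha256(nonce + weights.tobytes()).hexdigest()
    return digest, nonce    # digest is published; nonce and weights stay private

def prove_explanation(weights, nonce, x, explanation):
    """Placeholder ZK prover: would attest that the explanation was produced by
    running the agreed algorithm (e.g. LIME) on the committed weights at input x."""
    raise NotImplementedError("requires a ZK proving backend")

def verify(commitment, x, explanation, proof) -> bool:
    """Placeholder ZK verifier: accepts or rejects using only public data
    (commitment, input, explanation, proof), never the weights themselves."""
    raise NotImplementedError("requires a ZK proving backend")

# Prover (bank) side: weights stay private; only the commitment is published.
weights = np.random.default_rng(0).normal(size=16)
commitment, nonce = commit_to_model(weights)
print("published commitment:", commitment)
```

The point of the flow is that the commitment binds the bank to a single model before explanations are requested, so a valid proof later guarantees the published explanation came from that same model.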
We adapted a popular explanation method called LIME to work with ZKPs, ensuring that explanations are verifiable. We tested *ExpProof* on neural networks and random forests and found that it generates proofs in under two minutes, with verification taking just fractions of a second.
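For readers unfamiliar with LIME, the minimal NumPy sketch below (written from scratch, not taken from the ExpProof repository) shows the kind of computation whose correctness the proof attests to: sample perturbations around the input, weight them by proximity, and fit a weighted linear surrogate whose coefficients serve as the explanation. Real LIME additionally selects a sparse subset of features; this sketch keeps all of them.

```python
import numpy as np

def lime_explain(model_predict, x, num_samples=1000, kernel_width=0.75, seed=0):
    """Minimal LIME-style local surrogate: returns per-feature importance weights."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # 1. Sample perturbations of the input point.
    Z = x + rng.normal(scale=1.0, size=(num_samples, d))
    # 2. Query the black-box model at the perturbed points.
    y = model_predict(Z)
    # 3. Weight samples by proximity to x with an exponential kernel.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear surrogate; its coefficients are the explanation.
    Zb = np.hstack([Z, np.ones((num_samples, 1))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * Zb, np.sqrt(w) * y, rcond=None)
    return coef[:-1]                                 # drop the intercept term

# Toy usage: explain a simple logistic model's score at the origin.
weights_true = np.array([2.0, -1.0, 0.0, 0.5])
model = lambda Z: 1.0 / (1.0 + np.exp(-Z @ weights_true))
print(lime_explain(model, np.zeros(4)))
```

Fixing the sampling randomness and the surrogate-fitting step is what makes a LIME-style explanation amenable to being checked inside a ZKP circuit.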
*ExpProof* helps build trust in AI systems by ensuring that explanations are truthful while keeping the model confidential—an important step toward more transparent and accountable machine learning.
Link To Code: https://github.com/emlaufer/ExpProof
Primary Area: Social Aspects->Security
Keywords: Explanations, Zero-Knowledge Proofs
Submission Number: 2649