TL;DR: We propose RECAST, a model reconstruction framework through counterfactual-aware wasserstein optimiztation.
Abstract: Counterfactual explanations (CFs) help understand machine learning models by identifying minimal input changes that would lead to alternative model outcomes. Recent work demonstrates their utility for reconstructing black-box models, enabling third-party auditing of opaque decision systems for fairness and accountability. Still, CF-based reconstruction may suffer from decision boundary shifts, overfitting, and restrictive assumptions requiring online query access to target platforms.
We propose **REconstruction via Counterfactual-Aware waSserstein opTimization (RECAST)** under limited data and restricted access, a behavioral surrogate model based on Wasserstein barycenteric prototypes. Our approach addresses decision boundary shifts by incorporating CFs as informative, though less representative, samples for both classes, maintaining high surrogate fidelity in low-sample regimes without requiring online access during reconstruction. To enhance fairness auditing, our method enables systematic group fairness diagnostics.
Experiments on real-world datasets and various setups show that **RECAST** effectively achieves high fidelity and query efficiency, as well as stable results even when the access is limited and noisy.
Lay Summary: AI systems often make decisions, like approving loans or flagging content, without explaining how they made or what influenced their decision. This can make it diffidult to check whether they are fair or biased and understand the decision behavior.
One way to look inside these systems is to ask *What would need to be different about this input for the AI to make a different decision?*. The answers to this question can be computed and is called a Counterfactual Explanation, they provide "what-if" alternatives to a provided input.
Researches have been using these what-if questions to make a reliable copy of an AI model. However, this approach has problems: the copy can be inaccurate, it tends to work poorly with limited data, and it usually requires repeated live access to the original system.
We propose a new method called **RECAST**, that builds a more reliable copy of the model even when you have limited data and restricted access to the original system. It uses a mathematical technique to handle the counterfactual explanations, treating them as useful but imperfect clues about the behavior of the original model.
This will help more accurately mimic a black-box AI system and detect potential fairness issues, such as whether the model treats different demographic groups unequally.
Link To Code: https://github.com/zhaoxuan00707/ce_reconstruction
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Model reconstruction, Accountability, Counterfactual Explanations, Fairness
Originally Submitted PDF: pdf
Submission Number: 11893
Loading