Keywords: Knowledge extraction, PAC learning, Explainable AI
TL;DR: In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models
Abstract: Decision trees are a popular machine learning method, valued for their inherent explainability. In Explainable AI, decision trees serve as surrogate models for complex black-box AI models or as approximations of parts of such models. A key challenge of this approach is assessing how accurately the extracted decision tree represents the original model and determining the extent to which it can be trusted as an approximation of its behavior. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Leveraging the theoretical foundations of the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under specific conditions. We focus on binary classification and conduct experiments in which we extract decision trees from BERT-based language models with PAC guarantees. Our results indicate occupational gender bias in these models, confirming previous findings in the literature. Additionally, the decision tree format makes it easier to visualize which occupations are most affected by social bias.
Track: Neurosymbolic Methods for Trustworthy and Interpretable AI
Paper Type: Extended Abstract
Resubmission: No
Publication Agreement: pdf
Submission Number: 31