Processing, Priming, Probing: Human Interventions for Explainability Alignment

Published: 06 Mar 2025, Last Modified: 05 May 2025
Venue: ICLR 2025 Bi-Align Workshop (Poster)
License: CC BY 4.0
Keywords: Alignment; Explainability; Human-centric; Human-centered Explainability; Intervention
TL;DR: We rigorously define the concept of explainability alignment and propose the Processing, Priming, and Probing (PPP) framework, a classification of human interventions for aligning model-centric explanations with human needs.
Abstract: As artificial intelligence (AI) systems play a central role in decision-making, the need for explainability becomes more critical. Effective explanations must satisfy two key objectives: faithfully representing the model’s behavior and remaining reasonable and useful to humans. This dual requirement makes alignment a fundamental challenge in explainable AI (XAI). Research in human-centered XAI (HCXAI) has introduced guidelines and evaluation methods to enhance the accessibility and usability of explanations. These efforts have led to concrete strategies for incorporating human prior knowledge into the explainability pipeline. However, prioritizing human-centricity often comes at the cost of accurately reflecting the model’s reasoning, behavior, and internal functioning. In this paper, we rigorously define *explainability alignment*: the requirement that explanations remain both model- and human-centric without sacrificing one for the other. To maintain this balance, we propose targeted human interventions that enhance interpretability while preserving the core objective of XAI: making black-box models more transparent. To structure these interventions, we present the *Processing, Priming, and Probing (PPP) framework*, which categorizes different intervention strategies for achieving explainability alignment. These strategies encompass (1) modifications to final explanations, (2) prior adjustments within a fixed XAI pipeline, and (3) novel approaches to designing and refining explanations with human supervision. Equipping researchers with such a framework will facilitate the development of more aligned explainability methods.
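Illustration (not from the paper): the following is a minimal, hypothetical Python sketch of a processing-style intervention, i.e. category (1) above, that blends a model-centric feature attribution with a human relevance prior. The function name `process_explanation`, the linear blending rule, and the `alpha` trade-off parameter are illustrative assumptions, not the authors' method.

```python
import numpy as np

def process_explanation(attributions, human_prior, alpha=0.3):
    """Blend a model-centric attribution vector with a human relevance prior.

    alpha = 0 keeps the raw (model-centric) explanation;
    alpha = 1 defers entirely to the human prior.
    Both signals are normalized so the blend is scale-invariant.
    """
    a = np.abs(attributions)
    a = a / (a.sum() + 1e-12)          # normalized attribution magnitudes
    p = human_prior / (human_prior.sum() + 1e-12)  # normalized prior weights
    return (1 - alpha) * a + alpha * p

# Toy usage: five feature attributions and a prior flagging feature 3.
raw = np.array([0.10, 0.40, -0.20, 0.05, 0.25])
prior = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
print(process_explanation(raw, prior))
```

By contrast, a priming-style intervention (category 2) would inject the human prior earlier in a fixed XAI pipeline, e.g. into the attribution method's baseline or masking distribution, rather than post-processing its output.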
Submission Type: Long Paper (9 Pages)
Archival Option: This is an archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 55