Keywords: Explainable AI, Vision-based Driver Distraction Detection (vDDD), SAGE, Saliency Embeddings, Behavioral Divergence, Domain Shift, Generalization, Shortcut Learning, Vision-Language Models (VLMs)
Abstract: Vision-based activity recognition tasks are sensitive to environmental context and lighting, making generalization across domains difficult. Models trained in controlled settings often report high accuracy yet fail under domain shift, and it remains unclear whether their predictions depend on causal foreground cues, spurious background signals, or shortcut learning tied to context rather than behavior. Saliency methods offer a view of model focus but have largely been confined to qualitative visualization. We hypothesize that behavioral divergence between models is proportional to the divergence of their saliency embeddings. To examine this, we introduce Saliency Attribution for Goal-grounded Evaluation (SAGE), a modular framework that unifies heterogeneous datasets through category mapping and balancing, generates controlled foreground and background variants, computes saliency maps, and encodes them into tokenized representations suitable for embedding and comparison. By disentangling foreground and background saliency, the framework provides a diagnostic signal of how models attend to causal versus spurious regions, complementing accuracy as a measure of generalization. We demonstrate feasibility on vision-based driver distraction detection, an activity recognition task where distraction is inferred from driver activities rather than objects, by creating a unified 10-class variant of the StateFarm and 100-Driver datasets that highlights the challenges of category mapping and background control. While full embedding-based evaluations are ongoing, the framework already separates foreground and background saliency, discretizes it into tokens, and encodes it in a manner aligned with tokenized vision architectures such as ViTs and VLMs. This design makes the framework scalable across vision-based classification tasks where foreground-background disentanglement is critical, and positions it as a diagnostic tool for analyzing behavioral divergence and robustness under domain shift.
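The abstract describes a pipeline that separates foreground and background saliency, discretizes it into tokens, embeds the tokens, and compares embeddings across models against their behavioral divergence. The sketch below illustrates one possible instantiation of that idea under stated assumptions; the patch size, quantization scheme, histogram embedding, cosine-distance metric, and all function names (tokenize_saliency, embed_tokens, saliency_divergence, behavioral_divergence) are illustrative placeholders, not the paper's actual SAGE implementation.

```python
# Minimal sketch, assuming saliency maps and foreground masks are precomputed
# numpy arrays in [0, 1]. Patch size, quantization levels, the histogram
# embedding, and cosine distance are illustrative choices only.
import numpy as np

PATCH = 16    # patch size for pooling saliency, mirroring ViT-style tokenization
LEVELS = 8    # number of discrete saliency bins per token

def tokenize_saliency(saliency: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pool a saliency map into patch tokens, split by a foreground mask,
    and quantize each patch's mean saliency into discrete levels."""
    h, w = saliency.shape
    tokens = []
    for y in range(0, h - h % PATCH, PATCH):
        for x in range(0, w - w % PATCH, PATCH):
            patch_sal = saliency[y:y + PATCH, x:x + PATCH]
            is_fg = mask[y:y + PATCH, x:x + PATCH].mean() > 0.5  # foreground vs background patch
            level = int(np.clip(patch_sal.mean() * LEVELS, 0, LEVELS - 1))
            # Offset background tokens so foreground and background use disjoint vocabularies.
            tokens.append(level if is_fg else level + LEVELS)
    return np.array(tokens)

def embed_tokens(tokens: np.ndarray) -> np.ndarray:
    """Toy embedding: normalized histogram over the token vocabulary
    (a stand-in for a learned tokenized-saliency encoder)."""
    hist = np.bincount(tokens, minlength=2 * LEVELS).astype(float)
    return hist / (hist.sum() + 1e-8)

def saliency_divergence(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine distance between two models' saliency embeddings."""
    den = float(np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-8)
    return 1.0 - float(emb_a @ emb_b) / den

def behavioral_divergence(preds_a: np.ndarray, preds_b: np.ndarray) -> float:
    """Fraction of inputs on which two models disagree."""
    return float(np.mean(preds_a != preds_b))

# Example with random stand-in data for two models on one image.
rng = np.random.default_rng(0)
sal_a, sal_b = rng.random((224, 224)), rng.random((224, 224))
fg_mask = np.zeros((224, 224))
fg_mask[64:160, 64:160] = 1.0  # hypothetical driver (foreground) region
emb_a = embed_tokens(tokenize_saliency(sal_a, fg_mask))
emb_b = embed_tokens(tokenize_saliency(sal_b, fg_mask))
print("saliency divergence:", saliency_divergence(emb_a, emb_b))
print("behavioral divergence:", behavioral_divergence(rng.integers(0, 10, 100),
                                                      rng.integers(0, 10, 100)))
```

Under the stated hypothesis, one would expect the two divergence scores to correlate across model pairs; the actual framework presumably replaces the histogram embedding with a learned tokenized encoder aligned with ViT/VLM token vocabularies.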
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 25385