Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Logic-based explanation, sequential models, Computational Complexity, RNN, Automata, Transformers
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This work contributes to formal explainability in AI (FXAI) for sequential models, including Recurrent Neural Networks (RNNs), Transformers, and automata models from formal language theory (e.g., finite-state automata). We study two common notions of explainability in FXAI: (1) abductive explanations (a.k.a. minimum sufficient reasons) and (2) counterfactual (a.k.a. contrastive) explanations. To account for various forms of sequential data (e.g., texts, time series, and videos), our models take a sequence of rational numbers as input. We first observe that computing both types of explanations is NP-hard (and sometimes undecidable) for simple RNNs and Transformers. Work on extracting automata from RNNs hinges on the assumption that automata are more interpretable than RNNs. Interestingly, it turns out that generating abductive explanations for deterministic finite automata (DFA) is computationally intractable (PSPACE-complete) when features are represented by regular languages. On the positive side, we show that DFA admit polynomial-time algorithms for counterfactual explanations. However, DFA are a highly inexpressive model for classifying sequences of numbers. To address this limitation, we provide two expressive extensions of finite automata that preserve PTIME explainability and admit automata learning algorithms: (1) deterministic interval automata and (2) deterministic register automata with a fixed number of registers.
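To illustrate why counterfactual explanations for DFA are tractable, here is a minimal sketch (not taken from the paper): under Hamming distance, the fewest symbol substitutions needed to turn a rejected word into an accepted one can be found by dynamic programming over (position, state) pairs, which runs in polynomial time. The DFA, alphabet, and choice of distance below are illustrative assumptions.

```python
def min_flips_to_accept(delta, start, accepting, alphabet, word):
    """Fewest substitutions turning `word` into a word the DFA accepts.

    delta: dict mapping (state, symbol) -> state (total transition function)
    Runs in O(|word| * |states| * |alphabet|) time.
    """
    INF = float("inf")
    # dp[q] = fewest substitutions so far that leave the DFA in state q
    dp = {start: 0}
    for a in word:
        nxt = {}
        for q, cost in dp.items():
            for b in alphabet:
                c = cost + (a != b)  # pay 1 to substitute a -> b
                r = delta[(q, b)]
                if c < nxt.get(r, INF):
                    nxt[r] = c
        dp = nxt
    return min((dp.get(q, INF) for q in accepting), default=INF)

# Example DFA over {0,1} accepting words with an even number of 1s
delta = {("e", "0"): "e", ("e", "1"): "o",
         ("o", "0"): "o", ("o", "1"): "e"}
print(min_flips_to_accept(delta, "e", {"e"}, "01", "100"))  # -> 1
```

Recovering a witness string (the actual counterfactual) only requires storing backpointers alongside the costs; the same table-filling idea extends to edit distance with a slightly larger state space.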
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8310