T-FIX: Text-Based Explanations with Features Interpretable to eXperts

Published: 23 Sept 2025 · Last Modified: 07 Dec 2025 · FoRLM 2025 · CC BY 4.0
Keywords: interpretability, explainability, evaluation, domain-grounding, domain applications
Abstract: As LLMs are deployed in knowledge-intensive settings (e.g., surgery, astronomy, therapy), users expect not just answers, but also meaningful explanations for those answers. In these settings, users are often domain experts (e.g., doctors, astrophysicists, psychologists) who require confidence that a model's explanation reflects expert-level reasoning. However, current evaluation schemes primarily emphasize plausibility or internal faithfulness of the explanation, often neglecting whether the content of the explanation truly aligns with expert intuition. We formalize expert alignment as a criterion for evaluating explanations with T-FIX, a benchmark spanning seven knowledge-intensive domains. T-FIX includes datasets and novel alignment metrics developed in collaboration with domain experts, so an LLM's explanations can be scored directly against expert judgment.
Submission Number: 104