Keywords: XAI, post-hoc explanation, cost reduction
Abstract: With large language models (LLMs) becoming increasingly prevalent across applications, interpreting their predictions has become a critical challenge.
As LLMs vary in architecture and some are closed-source, model-agnostic techniques show great promise because they do not require access to a model's internal parameters.
However, existing model-agnostic techniques require obtaining an LLM's outputs on a large number of perturbed samples, which leads to high economic costs.
To address this limitation, we propose leveraging explanations from budget-friendly models as proxies for explaining expensive LLMs, together with a simple yet effective screen-and-apply framework that ensures the faithfulness of the applied proxy explanations.
Through a series of empirical studies, we demonstrate that proxy explanations achieve over 90\% fidelity relative to oracle explanations while requiring only 11\% of their cost.
Moreover, we show that such proxy explanations also perform well on downstream tasks, such as optimizing an LLM's in-context learning performance.
Additionally, we open-source our code and datasets to facilitate future research in this area.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 2094