See the Big in the Small: Budget-Friendly Explanations for Large Language Models

04 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: XAI, post-hoc explanation, cost reduction
Abstract: As large language models (LLMs) become increasingly prevalent across applications, interpreting their predictions has become a critical challenge. Because LLMs vary in architecture and some are closed-source, model-agnostic techniques show great promise, as they require no access to a model's internal parameters. However, existing model-agnostic techniques must obtain an LLM's outputs on a large number of perturbed samples, which incurs high economic costs. To address this limitation, we propose leveraging explanations from budget-friendly models as proxies for explaining expensive LLMs, together with a simple yet effective screen-and-apply framework that ensures the faithfulness of the applied proxy explanations. A series of empirical studies demonstrates that proxy explanations can achieve over 90\% fidelity relative to oracle explanations while requiring only 11\% of their cost. Moreover, we show that such proxy explanations also perform well on downstream tasks, such as optimizing an LLM's performance in in-context learning. Additionally, we open-source our code and datasets to facilitate future research in this area.
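The screen-and-apply idea from the abstract can be illustrated with a minimal sketch: attribute feature importance with a cheap proxy model, spend only a handful of expensive-model queries to screen whether the proxy agrees with the expensive model locally, and fall back to the oracle only when screening fails. All function names, the leave-one-out attribution choice, and the agreement threshold below are illustrative assumptions, not the paper's actual method.

```python
import random

def loo_attribution(model, tokens):
    """Leave-one-out attribution (an illustrative perturbation method):
    each token's importance is the score drop when it is removed."""
    base = model(tokens)
    return [base - model(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def screen(cheap, expensive, tokens, n_probe=3, tol=0.1, seed=0):
    """Screening step: query the expensive model on only a few random
    perturbations and check that the cheap proxy agrees within `tol`.
    This is where the cost savings come from: n_probe expensive calls
    instead of one per perturbation."""
    rng = random.Random(seed)
    for _ in range(n_probe):
        i = rng.randrange(len(tokens))
        perturbed = tokens[:i] + tokens[i + 1:]
        if abs(cheap(perturbed) - expensive(perturbed)) > tol:
            return False
    return True

def explain(cheap, expensive, tokens, **screen_kwargs):
    """Apply step: use the proxy explanation when screening passes;
    otherwise attribute the expensive model directly (oracle fallback)."""
    if screen(cheap, expensive, tokens, **screen_kwargs):
        return loo_attribution(cheap, tokens)       # budget-friendly path
    return loo_attribution(expensive, tokens)       # oracle fallback

# Toy stand-ins for real models: score = 0.5 per occurrence of "good".
cheap = lambda toks: sum(0.5 for t in toks if t == "good")
expensive = lambda toks: sum(0.5 for t in toks if t == "good")

attribution = explain(cheap, expensive, ["a", "good", "movie"])
```

In this toy run the proxy and the "expensive" model agree everywhere, so screening passes and the attribution comes entirely from the cheap model, highlighting "good" as the influential token.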
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 2094