Local List-Wise Explanations of LambdaMART

Published: 01 Jan 2024 · Last Modified: 19 May 2025 · xAI (2) 2024 · CC BY-SA 4.0
Abstract: LambdaMART, a potent black-box Learning-to-Rank (LTR) model, has been shown to outperform neural network models on tabular ranking benchmark datasets. However, its lack of transparency hinders its application in many real-world domains. Local list-wise explanation techniques provide scores that quantify the importance of the features in a list of documents associated with a query to the prediction of a black-box LTR model. This study investigates which list-wise explanation techniques provide the most faithful explanations for LambdaMART models. Several local explanation techniques are evaluated for this purpose: Greedy Score, RankLIME, EXS, LIRME, LIME, and SHAP. In addition, a non-LTR explanation technique, Permutation Importance (PMI), is applied to obtain list-wise explanations of LambdaMART. The techniques are compared on eight evaluation metrics: Consistency, Completeness, Validity, Fidelity, ExplainNCDG@10, (In)fidelity, Ground Truth, and Feature Frequency similarity. The evaluation is performed on three benchmark datasets, Yahoo, Microsoft Bing Search (MSLR-WEB10K), and LETOR 4 (MQ2008), along with a synthetic dataset. The experimental results show that no single explanation technique is faithful across all datasets and evaluation metrics. Moreover, the techniques tend to be faithful for different subsets of the metrics; for example, RankLIME outperforms the other techniques with respect to Fidelity and ExplainNCDG@10, while PMI provides the most faithful explanations with respect to Validity and Completeness. Finally, we show that the explanation sample size and the normalization of feature importance scores in explanations can substantially affect the faithfulness of the explanation techniques across all datasets.
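The abstract mentions applying Permutation Importance (PMI) to obtain list-wise explanations of a ranker. As a rough illustration of the general idea (not the paper's implementation), the sketch below shuffles one feature column at a time within a query's document list and measures the resulting drop in NDCG@10. The `score_fn` stand-in, the function names, and all parameters are hypothetical; a real setup would pass a trained LambdaMART scorer.

```python
# Hedged sketch of list-wise Permutation Importance for a ranker.
# `score_fn` is a hypothetical stand-in for a trained LambdaMART model:
# it maps a (n_docs, n_features) matrix to one relevance score per document.
import numpy as np

def ndcg_at_k(relevance, scores, k=10):
    """NDCG@k of the ranking induced by `scores` against graded `relevance`."""
    order = np.argsort(-scores)[:k]
    gains = (2.0 ** relevance[order] - 1) / np.log2(np.arange(2, len(order) + 2))
    ideal = np.sort(relevance)[::-1][:k]
    ideal_gains = (2.0 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))
    idcg = ideal_gains.sum()
    return gains.sum() / idcg if idcg > 0 else 0.0

def permutation_importance(score_fn, X, relevance, n_repeats=20, seed=0):
    """Importance of feature j = mean drop in NDCG@10 when column j
    is permuted across the documents of the query list."""
    rng = np.random.default_rng(seed)
    baseline = ndcg_at_k(relevance, score_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
            drops.append(baseline - ndcg_at_k(relevance, score_fn(Xp)))
        importances[j] = np.mean(drops)
    return importances

# Toy usage: only feature 0 drives both the scorer and the relevance labels,
# so its permutation importance should dominate the two noise features.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
rel = (X[:, 0] > 0).astype(float) * 2
imp = permutation_importance(lambda M: M @ np.array([1.0, 0.0, 0.0]), X, rel)
```

A feature the model ignores leaves the scores, and hence the drop, exactly zero, so PMI importances are naturally list-wise: they reflect how much a feature matters for ordering this particular query's documents.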