Abstract: As information retrieval (IR) systems, such as search engines and conversational agents, become ubiquitous across domains, the need for transparent and explainable systems grows to ensure accountability, fairness, and unbiased results. Despite many recent advances in explainable AI and IR techniques, there is no consensus on what it means for a system to be explainable. Although a growing body of literature suggests that explainability comprises multiple subfactors [2, 5, 6], virtually all existing approaches treat it as a singular notion. Additionally, while neural retrieval models (NRMs) have become popular for their ability to achieve high performance [3, 4, 7, 8], their explainability remained largely unexplored until recent years. Numerous questions remain about how best to understand how these intricate models arrive at their decisions, and about how well such methods serve both developers and end users. This research aims to develop effective methods to evaluate and advance explainable retrieval systems, contributing to the field's broader goal of making potential biases easier to identify. Specifically, I aim to investigate the following:
RQ1: How do we quantitatively measure explainability?
RQ2: How can we develop a set of inherently explainable NRMs using feature attributions that are robust across different retrieval domain contexts?
RQ3: How can we leverage knowledge about influential training instances to better understand NRMs and promote more efficient search practices?
To address RQ1, we leverage psychometrics and crowdsourcing to introduce a multidimensional model of explainability for Web search systems [1]. Our approach builds upon prior research on multidimensional relevance modeling [9] and supports the multidimensionality of explainability posited by recent literature. In doing so, we provide empirical evidence that these factors group into positive and negative facets that describe, respectively, the utility of and roadblocks to explainability in search systems. Additionally, we introduce a continuous-scale evaluation metric for explainable search systems that enables researchers to directly compare and evaluate the efficacy of their explanations.
In future work, I plan to address RQ2 and RQ3 by investigating two avenues of attribution methods, feature-based and instance-based, to develop a suite of explainable NRMs (illustrative sketches of both avenues appear below). While much work has investigated the interpretability of deep neural network architectures in the broader ML field, particularly in the vision and language domains, creating inherently explainable neural architectures remains largely unexplored in IR. Thus, I intend to draw on previous work in NLP and ML to develop methods that offer deeper insights into the inner workings of NRMs and how ranking decisions are made.
By developing explainable IR systems, we can help users comprehend the intricate, non-linear mechanisms that link their search queries to highly ranked content. If applied correctly, this research has the potential to benefit society in a broad range of applications, such as disinformation detection and clinical decision support. Given their critical importance in modern society, these areas demand robust solutions to combat the escalating dissemination of false information.
By enhancing transparency and accountability, explainable retrieval systems can play a crucial role in curbing this trend.
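As a concrete illustration of the feature-based attribution avenue named in RQ2, the sketch below applies a generic gradient-based technique (integrated gradients over document embeddings) to a toy relevance scorer. The ranker architecture, embedding dimensions, and zero baseline are hypothetical placeholders for exposition, not the specific models or methods proposed in this work.

```python
# Minimal sketch: gradient-based feature attribution for a neural ranker.
# Generic illustration (integrated gradients over input embeddings); the model
# and inputs below are toy placeholders, not the methods proposed in the paper.
import torch
import torch.nn as nn


class TinyRanker(nn.Module):
    """Toy relevance model: scores a (query, document) embedding pair."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
        return self.scorer(torch.cat([q_emb, d_emb], dim=-1)).squeeze(-1)


def integrated_gradients(model, q_emb, d_emb, steps: int = 50) -> torch.Tensor:
    """Attribute the relevance score to document-embedding dimensions by
    integrating gradients along a straight path from a zero baseline."""
    baseline = torch.zeros_like(d_emb)
    total_grad = torch.zeros_like(d_emb)
    for alpha in torch.linspace(0.0, 1.0, steps):
        d_interp = (baseline + alpha * (d_emb - baseline)).requires_grad_(True)
        score = model(q_emb, d_interp)
        total_grad += torch.autograd.grad(score.sum(), d_interp)[0]
    # Average gradient times the input difference approximates the path integral.
    return (d_emb - baseline) * total_grad / steps


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRanker()
    q, d = torch.randn(1, 32), torch.randn(1, 32)
    attributions = integrated_gradients(model, q, d)
    print("relevance score:", model(q, d).item())
    print("top attributed dims:", attributions.squeeze(0).abs().topk(5).indices.tolist())
```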
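Similarly, for the instance-based avenue in RQ3, the following sketch estimates how much a single training pair influenced a test ranking decision via the dot product of their loss gradients, a TracIn-style heuristic used here purely as an example. The toy ranker, loss, and data are again hypothetical stand-ins rather than the approach this research will ultimately develop.

```python
# Minimal sketch: instance-based attribution for a neural ranker via gradient
# similarity between a training example and a test example (TracIn-style).
# All components below are toy placeholders for illustration only.
import torch
import torch.nn as nn


class TinyRanker(nn.Module):
    """Toy relevance model: scores a (query, document) embedding pair."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, q_emb, d_emb):
        return self.scorer(torch.cat([q_emb, d_emb], dim=-1)).squeeze(-1)


def loss_gradient(model, q_emb, d_emb, label):
    """Flattened gradient of a pointwise ranking loss w.r.t. the model parameters."""
    score = model(q_emb, d_emb)
    loss = nn.functional.binary_cross_entropy_with_logits(score, label)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])


def influence(model, train_example, test_example) -> float:
    """Gradient-similarity score: positive values suggest the training instance
    pushed the model toward its current decision on the test instance."""
    return torch.dot(loss_gradient(model, *train_example),
                     loss_gradient(model, *test_example)).item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRanker()
    train = (torch.randn(1, 32), torch.randn(1, 32), torch.ones(1))  # labeled relevant
    test = (torch.randn(1, 32), torch.randn(1, 32), torch.ones(1))
    print("influence of training pair on test pair:", influence(model, train, test))
```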