Abstract: As fairness-aware information retrieval systems have attracted increasing attention, fairness evaluation metrics have become an emerging need, especially for complex fairness definitions involving more than one fairness category (e.g., economic status and geographic location). How to aggregate query-level evaluations into system-level evaluations and how to merge multiple fairness categories remain open questions. Existing metrics treat different queries and fairness categories equally and ignore the relationship between fairness categories and queries. This can be problematic when aggregating query-level evaluations to the system level, because a fairness category (e.g., geographic location) may not contextually apply to some queries (e.g., mathematics) but applies to others (e.g., rivers). Moreover, when evaluating fairness regarding multiple fairness categories for the same query, it is also unclear whether one category is more important than the others. Therefore, we introduced the concept of applicability, which quantifies the contextual relationship between fairness categories and queries. We also proposed a context-aware applicability-weighted fairness evaluation (CAFE) that leverages applicability weights to aggregate over queries and fairness categories. Our validation tests confirm the effectiveness and superiority of CAFE in capturing the contextual relationships between queries and fairness categories. Finally, we explored using applicability weights in a recently proposed fair ranking algorithm to further showcase the usefulness of applicability, and we proposed two ways of estimating applicability to address data sparsity when deploying CAFE: using the percentage of missing annotations as a proxy and using generative large language models as annotators.
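The following is a minimal illustrative sketch of applicability-weighted aggregation, assuming per-query, per-category fairness scores and applicability weights in [0, 1]; the abstract does not give the exact CAFE formulation, so the function name, data layout, and weighted-mean form below are assumptions, not the authors' definition.

    # Hypothetical sketch: aggregate query-level fairness scores into a
    # system-level score, weighting each (query, category) pair by how much
    # the fairness category contextually applies to that query.
    def applicability_weighted_aggregate(fairness_scores, applicability):
        """fairness_scores[q][c]: query-level fairness score for category c on query q.
        applicability[q][c]: weight in [0, 1] for how much c applies to q."""
        total, weight_sum = 0.0, 0.0
        for q, per_category in fairness_scores.items():
            for c, score in per_category.items():
                w = applicability.get(q, {}).get(c, 0.0)  # inapplicable pairs contribute nothing
                total += w * score
                weight_sum += w
        return total / weight_sum if weight_sum > 0 else 0.0

    # Toy usage: geographic location barely applies to a mathematics query,
    # but strongly applies to a query about rivers.
    scores = {"mathematics": {"geo": 0.2}, "rivers": {"geo": 0.9}}
    weights = {"mathematics": {"geo": 0.05}, "rivers": {"geo": 0.95}}
    print(applicability_weighted_aggregate(scores, weights))

In this toy example the rivers query dominates the system-level score because geographic location is far more applicable to it, which is the contextual behavior the abstract describes.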
External IDs: dblp:conf/nldb/ChenF25