Abstract: Relevance and utility are two frequently used measures to evaluate the effectiveness of an information retrieval (IR) system.
Relevance emphasizes the aboutness of a result to a query, while utility refers to the result's usefulness or value to an information seeker.
In Retrieval-Augmented Generation (RAG), high-utility results should be prioritized as input to LLMs because of their limited input bandwidth. When re-examined, RAG's three core components (relevance ranking derived from retrieval models, utility judgments, and answer generation) align with Schutz’s philosophical system of relevances, which encompasses three types of relevance representing different levels of human cognition that enhance each other. These three RAG components also reflect three cognitive levels for LLMs in question-answering. Therefore, we propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step in RAG. We conducted extensive experiments on retrieval (TREC DL, WebAP), utility judgment (GTI-NQ), and factoid question-answering (NQ) datasets. Experimental results demonstrate significant improvements of ITEM in utility judgments, ranking, and answer generation over representative baselines.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: re-ranking, retrieval-augmented generation, passage retrieval
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 226