Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Jianqun Zhou; Yuanlei Zheng; Wei Chen; Qianqian Zheng; Shang Zeyuan; Wei Zhang; Rui Meng; Xiaoyu Shen

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Jianqun Zhou, Yuanlei Zheng, Wei Chen, Qianqian Zheng, Shang Zeyuan, Wei Zhang, Rui Meng, Xiaoyu Shen

Published: 22 Jan 2025, Last Modified: 02 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: LLM, Instruction-Following, Retrieval Model, Benchmark

TL;DR: We propose InFoSearch, a novel benchmark, along with an evaluation protocol to assess the depth of instruction-following capabilities in information retrieval (IR) models.

Abstract: Instruction-following capabilities in large language models (LLMs) have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances, most of them still relies on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) to accurately assess the models' responsiveness to instructions. Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets and increasing model size enhance performance, most models still fall short of instruction compliance. We release our dataset and code on https://github.com/EIT-NLP/InfoSearch.

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11080

Loading