DSEBench: A Test Collection for Explainable Dataset Search with Examples

Published: 2025, Last Modified: 14 Jan 2026, CoRR 2025, CC BY-SA 4.0
Abstract: Dataset search is a well-established task in the Semantic Web and information retrieval research. Current approaches retrieve datasets either based on keyword queries or by identifying datasets similar to a given target dataset. These paradigms fail when the information need involves both keywords and target datasets. To address this gap, we investigate a generalized task, Dataset Search with Examples (DSE), and extend it to Explainable DSE (ExDSE), which further requires identifying relevant fields of the retrieved datasets. We construct DSEBench, the first test collection that provides high-quality dataset-level and field-level annotations to support the evaluation of DSE and ExDSE, respectively. In addition, we employ a large language model to generate extensive annotations for training purposes. We establish comprehensive baselines on DSEBench by adapting and evaluating a variety of lexical, dense, and LLM-based retrieval, reranking, and explanation methods.