μDS: Multi-Objective Data Snippet Extraction for Dataset Search

Published: 01 Jan 2025, Last Modified: 16 Sept 2025, SIGIR 2025, CC BY-SA 4.0
Abstract: With the continuous growth of open data on the Web, dataset search, the task of finding datasets relevant to a query, has become a prominent specialized retrieval problem. Recent solutions rank datasets based not only on their metadata but also on data snippets extracted from their actual data. While the quality of a data snippet has been studied from various aspects, in this paper we propose, for the first time, to jointly optimize compactness, relevance, representativeness, and cohesiveness in snippet extraction. To extract such multi-objective data snippets, we formulate a new combinatorial optimization problem and design an efficient algorithm with a proven worst-case approximation ratio. We evaluate the data snippets extracted by our algorithm intrinsically through a set of quality metrics and extrinsically by applying them to dataset search.