“Find Me a Dataset”: Scientific Dataset Recommendation from Method Descriptions


17 Dec 2021 (modified: 05 May 2023)
ACL ARR 2021 December Blind Submission
Readers: Everyone
Abstract: Much of modern science relies on public datasets to develop research ideas. Finding a suitable dataset for a given task can be difficult, particularly for new researchers. We aim to improve dataset discovery by introducing DatasetFinder, a system that recommends relevant datasets given a short natural language description of a research idea. For this new task of dataset recommendation, we construct an English-language dataset that leverages existing annotations, and we compare several ranking models on it. We also compare our proposed models against existing commercial search engines and find evidence that leveraging natural language descriptions improves search relevance. To encourage development on this new task, we release our constructed dataset and models to the public.
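As a rough illustration of the task formulation (a free-text description of a research idea is ranked against a catalog of dataset descriptions), the following sketch scores datasets with a plain TF-IDF cosine-similarity baseline. This is an assumed, generic lexical baseline chosen for illustration, not DatasetFinder or any of the ranking models evaluated in the paper. The function names and the three-entry toy catalog with its one-line descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; no stemming (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so every catalog term gets a positive weight.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: freq * idf[term] for term, freq in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors stored as term -> weight dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_datasets(query: str, catalog: dict[str, str]) -> list[tuple[str, float]]:
    # Hypothetical helper: rank catalog entries by similarity to a research-idea description.
    names = list(catalog)
    vectors, idf = tfidf_vectors([tokenize(catalog[name]) for name in names])
    # Query terms that never occur in the catalog carry no IDF weight and are dropped.
    query_vec = {t: f * idf[t] for t, f in Counter(tokenize(query)).items() if t in idf}
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy catalog (hypothetical descriptions written for this example).
catalog = {
    "ImageNet": "large scale image classification with labeled photographs of object categories",
    "SQuAD": "reading comprehension questions answered by spans from wikipedia articles",
    "CoNLL-2003": "named entity recognition on english news text",
}
ranking = rank_datasets(
    "I want to train a question answering model that extracts answer spans from wikipedia",
    catalog,
)
print(ranking[0][0])  # SQuAD
```

A purely lexical scorer like this misses paraphrases ("question" vs. "questions", "answer" vs. "answered" do not match here), which is one reason learned ranking models are worth comparing on a task where queries are natural language descriptions rather than keywords.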
Paper Type: long
Consent To Share Data: yes
