sciDataQA: Scientific Dataset Recommendation for Question Answering

Anonymous

16 Dec 2022 (modified: 05 May 2023)
ACL ARR 2022 December Blind Submission
Abstract: Answering scientific questions about a particular field of study is essential to advancing scientific discovery. However, such questions often cannot be answered in just a few words, and short answers may mislead scientists and delay discovery. In this paper, we propose recommending scientific datasets instead of answering each question directly. We introduce sciDataQA, a novel dataset for scientific dataset recommendation comprising 43,466 scientific datasets and 244,128 questions; each dataset entry includes its title, citation information, summary, and abstract. We construct the dataset with large pre-trained language models and use a contrastive-learning-based approach to filter out low-quality questions. Based on this dataset, we develop a novel recursive retrieval approach for scientific dataset recommendation. We further illustrate how the dataset can be used to study citation prediction and to improve existing scientific QA systems. Extensive experiments demonstrate the effectiveness of our recursive retrieval approach and show that our dataset improves two existing scientific QA systems in the low-resource setting.
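For a concrete picture of the data, each dataset entry and its associated questions could be represented roughly as follows. This is a minimal sketch, assuming a simple in-memory layout; the names `DatasetRecord`, `question_dataset_pairs`, and the `questions` field are hypothetical and are not taken from the released dataset's actual format. The example values are toy placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One scientific dataset entry (hypothetical layout, not the official schema)."""
    title: str
    citation: str   # citation information, stored here as a plain string
    summary: str
    abstract: str
    # Questions for which this dataset would be the recommendation (hypothetical field).
    questions: list[str] = field(default_factory=list)


def question_dataset_pairs(records: list[DatasetRecord]) -> list[tuple[str, str]]:
    """Flatten records into (question, dataset title) pairs, e.g. as retrieval examples."""
    return [(q, r.title) for r in records for q in r.questions]


if __name__ == "__main__":
    # Toy placeholder values, for illustration only.
    rec = DatasetRecord(
        title="Example Dataset",
        citation="Placeholder citation",
        summary="A toy summary.",
        abstract="A toy abstract.",
        questions=["Which dataset covers X?", "Where can I find Y measurements?"],
    )
    for question, title in question_dataset_pairs([rec]):
        print(f"{question} -> {title}")
```

At the reported scale, 244,128 questions over 43,466 datasets works out to roughly 5.6 questions per dataset on average, though the abstract does not say how questions are distributed across datasets.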
Paper Type: long
Research Area: Question Answering