Dense Retrieval for Efficient Paper Retrieval in Academic Question Answering

28 Jun 2024 (modified: 05 Aug 2024) · KDD 2024 Workshop OAGChallenge Cup Submission · CC BY 4.0
Keywords: Information Retrieval, Academic Question Answering, Open Academic Graph Challenge
Abstract: The overarching goal of academic data mining is to deepen our understanding of the development, nature, and trends of science, which could unlock enormous scientific, technological, and educational value. To facilitate related research, Tsinghua University and Zhipu AI launched the Open Academic Graph Challenge (OAG-Challenge) and released several realistic and challenging datasets. In this paper, we present our solution for the KDD Cup 2024 Academic Question Answering (AQA) task, in which participants must retrieve, from a pool of candidate papers, the papers most relevant to answering given professional questions. To address this challenge, we built a bi-encoder model for academic paper retrieval. We conducted extensive experiments with various language models (LMs) and ensembled them to boost performance. We also explored incorporating hard negative examples and a reranking model. Our team achieved competitive results in the competition, with mean average precision (MAP) scores of 0.20900 on the validation set and 0.18466 on the test set, placing in the top 6 and top 7, respectively. We have released our source code.
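The abstract does not spell out the retrieval pipeline, so the following is only a minimal sketch of the general bi-encoder idea and of MAP scoring, not the authors' implementation. It assumes that questions and papers have already been encoded independently into embedding vectors (the toy 3-dimensional vectors below stand in for language-model outputs). Candidates are ranked by cosine similarity, and average precision is normalized by the number of relevant papers. The official competition scorer may differ, for example by applying a rank cutoff. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between a question embedding and a paper embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_papers(query_vec: Sequence[float],
                paper_vecs: Dict[str, Sequence[float]],
                k: int) -> List[str]:
    """Bi-encoder retrieval: score each precomputed paper vector against the
    independently encoded query and return the top-k paper IDs."""
    scores = {pid: cosine(query_vec, vec) for pid, vec in paper_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP = mean of precision@i over the ranks i at which a relevant paper
    appears, normalized by the number of relevant papers (an assumption)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(rankings: List[List[str]],
                           relevants: List[Set[str]]) -> float:
    """MAP = average of per-question AP scores."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy embeddings standing in for LM-encoded candidate papers (hypothetical).
    papers = {
        "p1": [1.0, 0.0, 0.0],
        "p2": [0.0, 1.0, 0.0],
        "p3": [0.7, 0.7, 0.0],
        "p4": [0.0, 0.0, 1.0],
    }
    q1, q2 = [1.0, 0.2, 0.0], [0.0, 0.1, 1.0]
    rankings = [rank_papers(q1, papers, 3), rank_papers(q2, papers, 3)]
    print(rankings)  # [['p1', 'p3', 'p2'], ['p4', 'p2', 'p3']]
    print(mean_average_precision(rankings, [{"p3"}, {"p4", "p2"}]))  # 0.75
```

Because papers are encoded independently of the question, their embeddings can be computed once and reused for every query, which is what makes a bi-encoder efficient over a large candidate pool. A reranking model, as mentioned in the abstract, would typically rescore only the short top-k list returned by such a first stage.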
Submission Number: 1