Keywords: BGE, NV-Embed, Rerank
Abstract: Academic data mining supports many entity-centric applications, such as paper retrieval, expert discovery, and journal recommendation. However, the lack of benchmarks for academic knowledge graph mining has severely limited progress in the field. The dataset used here is derived from OAG-QA, which collects question posts from StackExchange and Zhihu, extracts the URLs of papers mentioned in the answers, and matches them to papers in OAG. Participants are given a set of questions and must find the papers that best match each question. We first fine-tune a BGE embedding model to retrieve candidate papers, and then rerank the candidates with an LLM to obtain the final result.
Submission Number: 3
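
The two-stage approach described in the abstract (embedding retrieval followed by reranking) can be sketched roughly as follows. This is only an illustration of the control flow, not the submission's implementation: `embed` is a bag-of-words stand-in for a fine-tuned BGE encoder, `rerank_score` is a token-overlap stand-in for an LLM relevance judgment, and the paper texts, IDs, and the `k` and `top_n` parameters are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # ASSUMPTION: placeholder for a dense encoder such as a fine-tuned BGE model.
    # Here a "vector" is just a bag-of-words count.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, papers, k):
    # Stage 1: rank all papers by embedding similarity and keep the top k.
    q_vec = embed(question)
    ranked = sorted(papers, key=lambda pid: cosine(q_vec, embed(papers[pid])), reverse=True)
    return ranked[:k]


def rerank_score(question, paper_text):
    # ASSUMPTION: placeholder for an LLM-based relevance score.
    # Here: fraction of question tokens that also appear in the paper text.
    q_tokens = set(question.lower().split())
    p_tokens = set(paper_text.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def retrieve_then_rerank(question, papers, k=3, top_n=2):
    # Stage 2: rescore only the retrieved candidates and keep the top_n.
    candidates = retrieve(question, papers, k)
    reranked = sorted(candidates, key=lambda pid: rerank_score(question, papers[pid]), reverse=True)
    return reranked[:top_n]


# Hypothetical toy corpus: paper ID -> text (e.g. title plus abstract).
PAPERS = {
    "p1": "attention is all you need transformer architecture",
    "p2": "graph neural networks for molecule property prediction",
    "p3": "dense passage retrieval for open domain question answering",
    "p4": "image classification with deep convolutional networks",
}

result = retrieve_then_rerank("dense retrieval for question answering", PAPERS, k=3, top_n=2)
print(result)
```

The point of the split is cost: the cheap embedding stage narrows the corpus to a few candidates, so the more expensive reranker only has to score those.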