Keywords: Natural Language Processing, Question Answering Retrieval, Large Language Model, KDDCup 2024
TL;DR: Top-1 winning solution for the KDD Cup 2024 OAG-Challenge by Black-Pearl Lab, focusing on 'LLM for Vector' and 'Iterative Hard Example Mining with Boosting' for superior academic resource retrieval.
Abstract: This paper describes the winning solutions of the KDD Cup 2024 Open Academic Graph Challenge (OAG-Challenge) from the Black-Pearl Lab team. The challenge was to explore retrieval methods for academic resources, allowing us to answer specialized questions by retrieving relevant papers. This can provide researchers and the general public with high-quality, cutting-edge academic knowledge across various fields.
Our solution includes both recall and ranking processes, built around two core ideas: "LLM for Vector" and "Iterative Hard Example Mining with Boosting". Early approaches to text representation typically measured similarity with autoencoder-based models, which proved suboptimal for this task. In contrast, vector representations derived from large language models (LLMs) have demonstrated superior performance in recent years, and they excel at this specific task as well.
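As a rough illustration of the "LLM for Vector" idea (this is not the team's released pipeline; the backbone name, pooling choice, and variable names below are illustrative assumptions, with a small decoder-only model standing in for the LLM), a language model can serve as a retrieval encoder by mean-pooling its hidden states and comparing unit-normalized vectors:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # "gpt2" is only a small stand-in for the LLM backbone actually used.
    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # decoder-only models often lack a pad token
    model = AutoModel.from_pretrained(model_name)

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state            # [batch, tokens, dim]
        mask = batch["attention_mask"].unsqueeze(-1)             # ignore padding positions
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling
        return torch.nn.functional.normalize(pooled, dim=-1)     # unit vectors

    query_vec = embed(["Which papers study long-document retrieval?"])
    paper_vecs = embed(["Paper A abstract ...", "Paper B abstract ..."])
    scores = query_vec @ paper_vecs.T  # cosine similarity used for recall/ranking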
Furthermore, we identified the critical role of negative sample mining, particularly in contexts where "similarity does not necessarily imply correctness". The process of mining hard examples is essential for effective model learning, prompting us to introduce the "Iterative Hard Example Mining with Boosting" strategy. This approach incrementally recalls more challenging negative samples, ultimately integrating them to enhance overall performance.
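A schematic sketch of this mining-and-blending loop is given below; the helpers encode and train_retriever are placeholders for the team's actual fine-tuning and embedding code, and the toy data is illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)

    def encode(model, texts):
        # Placeholder: real code would embed texts with the round's fine-tuned LLM retriever.
        return rng.normal(size=(len(texts), 16))

    def train_retriever(positives, hard_negatives, round_idx):
        # Placeholder for fine-tuning on (query, positive, hard-negative) triples.
        return {"round": round_idx}

    queries = ["query 1", "query 2"]
    papers = ["paper 1", "paper 2", "paper 3", "paper 4", "paper 5"]
    positives = {0: {1}, 1: {3}}                       # gold paper index per query

    hard_negs = {q: set() for q in positives}          # no hard negatives in round 0
    blended = np.zeros((len(queries), len(papers)))    # boosting-style score accumulator

    for r in range(3):
        model = train_retriever(positives, hard_negs, r)
        sims = encode(model, queries) @ encode(model, papers).T
        blended += sims                                # integrate this round's scores
        for qi in positives:                           # mine harder negatives for the next round
            ranked = np.argsort(-sims[qi])
            hard_negs[qi] |= {int(p) for p in ranked[:3] if p not in positives[qi]}

    final_ranking = np.argsort(-blended, axis=1)       # final order from the blended scores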
Our method ranks 1st on the final leaderboard. Code is publicly available at https://github.com/BlackPearl-Lab/KddCup-2024-OAG-Challenge-1st-Solution.
Submission Number: 8