LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach

20 Jul 2024 (modified: 15 Aug 2024) · KDD 2024 Workshop OAG-Challenge Cup Submission · CC BY 4.0
Keywords: Paper source tracing, Large Language Model
TL;DR: We developed a novel approach for identifying the source references of academic papers that combines large language models (LLMs) with traditional machine learning techniques, specifically LightGBM and CatBoost classifiers.
Abstract: We participated in the KDD CUP 2024 paper source tracing competition and achieved 3rd place. This competition tasked participants with identifying the reference sources (i.e., ref-sources, as referred to by the organizers of the competition) of given academic papers. Unlike most teams that addressed this challenge by fine-tuning pre-trained neural language models such as BERT or ChatGLM, our primary approach utilized closed-source large language models (LLMs). With recent advancements in LLM technology, closed-source LLMs have demonstrated the capability to tackle complex reasoning tasks in zero-shot or few-shot scenarios. Consequently, in the absence of GPUs, we employed closed-source LLMs to directly generate predicted reference sources from the provided papers. We further refined these predictions through ensemble learning. Notably, our method was the only one among the award-winning approaches that did not require the use of GPUs for model training.
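The abstract describes combining zero-shot LLM predictions with LightGBM and CatBoost classifiers via ensemble learning. The sketch below illustrates one plausible form of such an ensemble, simple weighted averaging of per-reference scores; the actual feature set, models, weights, and combination rule used by the authors are not specified here, so all names and numbers are illustrative assumptions.

```python
# Hypothetical sketch of an ensemble over per-reference scores.
# llm_score would come from a closed-source LLM's zero-/few-shot judgment;
# lgbm_score and cb_score stand in for LightGBM / CatBoost probabilities.
# The weights and threshold are assumptions, not the authors' values.

def ensemble_score(llm_score, lgbm_score, cb_score, weights=(0.5, 0.25, 0.25)):
    """Weighted average of per-reference source probabilities from three models."""
    return sum(w * s for w, s in zip(weights, (llm_score, lgbm_score, cb_score)))

def predict_ref_sources(candidates, threshold=0.5):
    """candidates: list of (ref_id, llm_score, lgbm_score, cb_score) tuples.

    Returns the reference IDs whose ensemble score clears the threshold,
    i.e. the references predicted to be the paper's sources.
    """
    return [ref_id for ref_id, a, b, c in candidates
            if ensemble_score(a, b, c) >= threshold]

# Example: two candidate references for one paper
candidates = [("ref-1", 0.9, 0.8, 0.7), ("ref-2", 0.2, 0.4, 0.3)]
print(predict_ref_sources(candidates))  # → ['ref-1']
```

A weighted average is only one option; rank averaging or a stacked meta-classifier over the same scores would fit the same "refine LLM predictions through ensemble learning" description.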
Submission Number: 26