Keywords: Paper source tracing, Large Language Model
TL;DR: We developed a novel approach for identifying the source references of academic papers that combines large language models (LLMs) with traditional machine learning techniques, specifically LightGBM and CatBoost classifiers.
Abstract: We participated in the KDD CUP 2024 paper source tracing competition and achieved 3rd place. The competition tasked participants with identifying the reference sources (i.e., ref-sources, as the organizers call them) of given academic papers. Unlike most teams, which addressed this challenge by fine-tuning pre-trained neural language models such as BERT or ChatGLM, our primary approach used closed-source large language models (LLMs). With recent advancements in LLM technology, closed-source LLMs have demonstrated the capability to tackle complex reasoning tasks in zero-shot or few-shot scenarios. Consequently, in the absence of GPUs, we employed closed-source LLMs to directly generate predicted reference sources from the provided papers. We further refined these predictions through ensemble learning. Notably, our method was the only one among the award-winning approaches that did not require GPUs for model training.
Submission Number: 26