Keywords: KDD Cup, Cross-Encoder, Ensemble
Abstract: Academic graph mining, and paper citation analysis in particular, is crucial for identifying promising technologies and for efficient citation-based paper retrieval. Cited papers differ in how much they influence the citing work, so their impact must be quantified using large citation datasets and ground-truth data. Although traditional methods applied hand-crafted features to limited datasets, advances in large language models (LLMs) suggest potential improvements. To this end, the organizers of KDD Cup 2024 launched a competition focused on academic graph mining, called OAG-Challenge, accompanied by a large-scale dataset referred to as OAG-Bench. In this paper, we, DOCOMOLABZ, present our solution, which achieved 8th place on the public leaderboard for the Paper Source Tracing (PST) task within the OAG-Challenge. Our solution is based on two hypotheses: (1) highly influential cited papers show high similarity between their titles and the context in which they are cited, and (2) hand-crafted features, such as citation frequency, are effective indicators of influence.
The source code of our solution is available at https://github.com/NTT-DOCOMO-RD/kddcup2024-oag-challenge-pst-9th-solution-nttdocomolabz
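To make the two hypotheses concrete, the following is a minimal illustrative sketch, not the code from the repository above. It assumes citations appear as bracketed numeric markers such as [1], and it uses token-overlap (Jaccard) similarity as a simple stand-in for the cross-encoder named in the keywords. All function names, the weight parameter, and the toy data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a learned
    title/context similarity model (assumption, not the paper's method)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def citation_counts(body):
    """Hypothesis (2): count how often each reference is cited.
    Assumes bracketed numeric markers such as [1]."""
    return Counter(int(m) for m in re.findall(r"\[(\d+)\]", body))


def score_references(body, titles, contexts, weight=0.5):
    """Blend hypothesis (1), title/context similarity, with hypothesis (2),
    normalized citation frequency, using a simple weighted average."""
    counts = citation_counts(body)
    max_count = max(counts.values(), default=0)
    scores = {}
    for ref_id, title in titles.items():
        sim = jaccard(title, contexts.get(ref_id, ""))
        freq = counts[ref_id] / max_count if max_count else 0.0
        scores[ref_id] = weight * sim + (1 - weight) * freq
    return scores


# Toy example (fabricated for illustration only).
body = (
    "We extend the graph attention network [1]. Unlike [2], our model "
    "builds on [1] and reuses the attention layers of [1]."
)
titles = {1: "Graph Attention Networks", 2: "Random Forests"}
contexts = {1: "We extend the graph attention network", 2: "Unlike prior work"}

scores = score_references(body, titles, contexts)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, reference 1 is cited more often and its title overlaps with its citation context, so it ranks above reference 2. A real system would replace the overlap score with a trained model and combine many more features.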
Submission Number: 4