A Hybrid Approach for Paper Source Tracing using Cross Encoder and Hand-Crafted Features

09 Jul 2024 (modified: 07 Aug 2024), KDD 2024 Workshop OAGChallenge Cup Submission, CC BY 4.0
Keywords: KDD Cup, Cross-Encoder, Ensemble
Abstract: Academic graph mining, specifically paper citation analysis, is crucial for identifying promising technologies and for efficient citation-based paper retrieval. The influence of cited papers varies, so their impact must be quantified using large citation datasets and ground-truth data. Although traditional methods applied hand-crafted features to limited datasets, advances in large language models (LLMs) suggest potential improvements. To this end, the organizers of KDD Cup 2024 launched a competition focused on academic graph mining, called OAG-Challenge, accompanied by a large-scale dataset referred to as the OAG-Bench dataset. In this paper, we, DOCOMOLABZ, present our solution, which achieved 8th place on the public leaderboard for the Paper Source Tracing (PST) task within the OAG-Challenge. Our solution is based on two hypotheses: (1) highly influential cited papers show high similarity between their titles and the context in which they are cited, and (2) hand-crafted features, such as citation frequency, are effective indicators of influence. The source code of our solution is available at https://github.com/NTT-DOCOMO-RD/kddcup2024-oag-challenge-pst-9th-solution-nttdocomolabz
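As a rough illustration of how the two hypotheses could be combined, the sketch below blends a title-to-context similarity score with a citation-frequency feature. It is not the authors' implementation: the Jaccard token overlap is only a lightweight stand-in for a cross-encoder score, and the function names, the `weight` parameter, and the weighted-average blend are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_stand_in(title, context):
    """Jaccard token overlap between a cited paper's title and its citation context.

    Assumption: this is a placeholder for a cross-encoder relevance score,
    used here only so the example runs without a model.
    """
    a, b = tokenize(title), tokenize(context)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def citation_frequency(ref_id, citation_markers):
    """Fraction of in-text citation markers that point to ref_id (hypothetical feature)."""
    if not citation_markers:
        return 0.0
    return Counter(citation_markers)[ref_id] / len(citation_markers)


def influence_score(title, context, ref_id, citation_markers, weight=0.5):
    """Weighted average of the similarity score and the frequency feature.

    Assumption: a simple linear blend; the weight is illustrative, not tuned.
    """
    sim = similarity_stand_in(title, context)
    freq = citation_frequency(ref_id, citation_markers)
    return weight * sim + (1.0 - weight) * freq


if __name__ == "__main__":
    # Hypothetical reference "b1" cited twice among four in-text markers.
    markers = ["b1", "b2", "b1", "b3"]
    score = influence_score(
        title="Graph neural networks for citation analysis",
        context="We build on graph neural networks for citation analysis [b1].",
        ref_id="b1",
        citation_markers=markers,
    )
    print(f"influence score for b1: {score:.3f}")
```

In practice the stand-in similarity would be replaced by the output of a trained cross-encoder, and the blend by a learned ensemble over many hand-crafted features.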
Submission Number: 4