Enhancing Author Name Disambiguation: Combine Large Language Models and Machine Learning

GuiBin Lai; Lingze Xie

19 Jul 2024 (modified: 15 Aug 2024), KDD 2024 Workshop OAGChallenge Cup Submission, CC BY 4.0

Keywords: Author Identification, Multi-model Fusion, Large Language Model, Graph Convolutional Network, Machine Learning, Natural Language Processing

TL;DR: 4th Place Solution WhoIsWho-IND-KDD-2024

Abstract: The rapid growth of online publications has made it increasingly difficult to disambiguate authors who share the same name. Existing disambiguation systems suffer from low accuracy, which leads to errors in author rankings and to instances of award fraud. This paper proposes a machine learning-based approach for detecting papers that have been incorrectly assigned to a given author's collection. The dataset includes personal profiles and detailed attributes of papers, such as title, abstract, authors, keywords, location, and year of publication. We developed four models (LightGBM, ChatGLM3-32k, Llama3, and GCN) and fused them to exploit their differences. After multiple rounds of experiments and validation on the test sets, our solution placed fourth. Unlike existing academic search systems, our approach does not rely on pre-existing name disambiguation results. It therefore identifies paper authorship more accurately, improving author disambiguation and helping to prevent errors in author rankings and award fraud. The approach has significant application value and research implications for author name disambiguation.

Submission Number: 21
