Keywords: Author Identification, Multi-model Fusion, Large Language Model, Graph Convolutional Network, Machine Learning, Natural Language Processing
TL;DR: 4th Place Solution for WhoIsWho-IND-KDD-2024
Abstract: The rapid growth of online publications has made it increasingly difficult to disambiguate authors who share the same name. Existing disambiguation systems suffer from low accuracy, leading to errors in author rankings and instances of award fraud. This paper proposes a machine learning-based approach for detecting incorrectly assigned papers within a given author's collection. The dataset includes personal profiles and detailed paper attributes, such as title, abstract, authors, keywords, location, and year of publication. We developed four models (LightGBM, ChatGLM3-32k, Llama3, and GCN) and fused them to exploit the differences among their predictions. After multiple rounds of experiments and validation on the test sets, our solution placed fourth. Unlike existing academic search systems, our approach does not rely on pre-existing name disambiguation results. It therefore identifies paper authorship more accurately, which helps prevent errors in author rankings and award fraud. The approach has practical application value and research implications for author name disambiguation.
Submission Number: 21
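The abstract does not say how the four models are fused, so the following is only a minimal sketch of one common option: a weighted average of rank-normalized per-paper scores. The function names (`rank_normalize`, `fuse`), the model keys, the weights, and the example scores are all hypothetical and are not taken from the paper. Rank normalization is assumed here because scores from a gradient-boosted model, two LLMs, and a GCN are unlikely to share a common scale.

```python
from typing import Dict, List


def rank_normalize(scores: List[float]) -> List[float]:
    """Map scores to [0, 1] by rank, giving tied scores their average rank."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


def fuse(model_scores: Dict[str, List[float]],
         weights: Dict[str, float]) -> List[float]:
    """Weighted average of rank-normalized scores (assumed fusion rule)."""
    n = len(next(iter(model_scores.values())))
    total = sum(weights[name] for name in model_scores)
    fused = [0.0] * n
    for name, scores in model_scores.items():
        if len(scores) != n:
            raise ValueError(f"{name} scored {len(scores)} papers, expected {n}")
        w = weights[name] / total  # weights are rescaled to sum to 1
        for idx, s in enumerate(rank_normalize(scores)):
            fused[idx] += w * s
    return fused


# Hypothetical scores for three papers in one author's collection.
example_scores = {
    "lightgbm": [0.1, 0.9, 0.5],
    "gcn": [0.2, 0.8, 0.4],
}
example_weights = {"lightgbm": 1.0, "gcn": 1.0}
fused_scores = fuse(example_scores, example_weights)
```

In practice the weights would be tuned on a held-out validation split. Other fusion schemes, such as stacking with a meta-learner, are equally plausible readings of the abstract.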