Author Name Disambiguation using Markov Chain Monte Carlo

20 Jul 2024 (modified: 21 Jul 2024), KDD 2024 Workshop OAGChallenge Cup Submission, CC BY 4.0
Keywords: Incorrect Name Detection using Author Name Disambiguation (IND-AND), Author Name Disambiguation, Graph Clustering, AND-MCMC Algorithm, Anomaly Detection, WhoIsWho Dataset, Weighted AUC, Graphlet Analysis, Scholarly Databases, Iterative Graph Refinement, OAG-Challenge, KDD Cup 2024
TL;DR: The paper proposes an algorithm called AND-MCMC for author name disambiguation, using graph structures and iterative refinement to identify incorrectly assigned papers in academic databases.
Abstract: This paper presents a novel approach to the Incorrect Name Detection (IND) task of the KDD Cup 2024 Open Academic Graph Challenge (OAG-Challenge). We propose the Author Name Disambiguation using Markov Chain Monte Carlo (AND-MCMC) algorithm to identify incorrectly assigned papers within author profiles in the WhoIsWho dataset. Our method constructs graph structures, or "graphlets", for each author and applies an iterative refinement process that prioritizes split actions over merge actions. The approach aims to separate anomalous papers from those correctly attributed to the predominant author. Leveraging the dataset's structure, which includes both correctly and incorrectly assigned publications, the algorithm processes one author's file at a time. We evaluate our method using a weighted Area Under the Receiver Operating Characteristic Curve (AUC) metric, which accounts for varying error distributions across authors. This work contributes to academic graph mining by addressing the challenge of detecting incorrect paper attributions in large-scale scholarly databases.
Submission Number: 28
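
The abstract mentions a weighted AUC that accounts for varying error distributions across authors but does not spell out the weighting. A minimal sketch follows, assuming (this is not stated in the abstract) that each author's AUC is weighted by that author's share of all incorrectly assigned papers, and that label 1 marks an incorrectly assigned paper. The function names `auc` and `weighted_auc` and the input layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_auc(per_author: Dict[str, Tuple[List[int], List[float]]]) -> float:
    """Combine per-author AUCs into one score.

    Assumption: an author's weight is their number of incorrectly assigned
    papers (label 1) divided by the total number across all authors.
    """
    total_errors = sum(sum(labels) for labels, _ in per_author.values())
    return sum(
        auc(labels, scores) * sum(labels) / total_errors
        for labels, scores in per_author.values()
    )
```

Under this assumed weighting, an author whose profile contains more incorrectly assigned papers contributes proportionally more to the final score; each author is still scored separately, in line with processing one author's file at a time.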