DIEq: Dynamic Identity Equilibrium for Author Disambiguation in KDD Cup 2024 WhoIsWho-IND Challenge

20 Jul 2024 (modified: 15 Aug 2024) | KDD 2024 Workshop OAGChallenge Cup Submission | CC BY 4.0
Keywords: Author Disambiguation, Data Imbalance Mitigation, Feature Engineering, Adversarial Learning, Text Representation
TL;DR: This study introduces DIEq, a novel preprocessing technique for author disambiguation that addresses dataset imbalance, enhances model accuracy, and demonstrates improved performance in the WhoIsWho-IND task of KDD Cup 2024.
Abstract: We propose Dynamic Identity Equilibrium (DIEq), a novel data preprocessing technique for author disambiguation. DIEq addresses dataset imbalance by treating a subset of negative samples as both negative and positive at the same time, creating 'academic identity phantoms' that enrich the feature space. Rather than merely compensating for data imbalance, this approach exploits it, and it also accelerates convergence and improves prediction accuracy. Our approach, which ranked in the top 10 of the WhoIsWho-IND task at KDD Cup 2024, combines over 3,000 hand-crafted features with a 5-fold LGBM model and achieves a 0.5-1.2% increase in wAUC on test data, contributing to more precise academic impact assessment and knowledge discovery.
Submission Number: 27
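
The abstract describes DIEq only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that negatives are encoded as label 0 and positives as label 1, that a "phantom" is a copy of a negative row re-labelled as positive while the original negative row is kept, and that the subset is chosen uniformly at random. The function name `dieq_augment` and the parameter `phantom_fraction` are hypothetical.

```python
import random


def dieq_augment(rows, labels, phantom_fraction=0.1, seed=0):
    """Duplicate a random subset of negative rows with a positive label.

    Assumptions (not stated in the abstract): label 0 = negative,
    label 1 = positive, and the subset is sampled uniformly at random.
    The original rows are kept unchanged, so each chosen sample appears
    once as a negative and once as a positive.
    """
    rng = random.Random(seed)
    negative_idx = [i for i, y in enumerate(labels) if y == 0]
    n_phantoms = int(len(negative_idx) * phantom_fraction)
    chosen = rng.sample(negative_idx, n_phantoms)

    aug_rows = list(rows) + [rows[i] for i in chosen]
    aug_labels = list(labels) + [1] * n_phantoms
    return aug_rows, aug_labels


# Toy data: 10 negatives and 2 positives, one feature per row.
rows = [[i / 10] for i in range(12)]
labels = [0] * 10 + [1] * 2

aug_rows, aug_labels = dieq_augment(rows, labels, phantom_fraction=0.3)
print(len(aug_rows), sum(aug_labels))
```

In this reading, the augmented set would then be passed to the downstream classifier (the 5-fold LGBM model in the abstract) in place of the original training set. How the subset is selected and how large it should be are left open by the abstract.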