Efficient Training and Stacking Methods with BERTs-LightGBM for Paper Source Tracing

18 Jul 2024 (modified: 15 Aug 2024) · KDD 2024 Workshop OAGChallenge Cup Submission · CC BY 4.0
Keywords: Pretraining Model, Pooling Methods, Adversarial Training, Weighted Ensemble Models
TL;DR: The core of our approach is applying AWP and ConcatPool to several BERT-based pretrained models and integrating these models with LightGBM to enhance overall performance.
Abstract: Despite the rapid development of large language models in recent years, tracing the source of scientific papers remains a challenging task due to the large scale of citation relations between papers. To address this challenge, the PST-KDD-2024 competition was launched by the Knowledge Engineering Group (KEG) of Tsinghua University in collaboration with ZhipuAI. In the competition, the Heart algorithm team proposed an innovative method built on three key strategies: adversarial weight perturbation (AWP), concatenate mean pooling (ConcatPool), and a weighted ensemble. The core of our approach is the application of AWP and ConcatPool to several BERT-based pretrained models, which significantly improves their performance and robustness. In addition, the weighted ensemble strategy integrates the results of multiple models, including the BERT-based models and LightGBM, leveraging their complementary strengths to produce more robust and accurate predictions. On the test benchmark, we achieved a significant improvement in the performance metric, with the test score increasing from 0.32042 to 0.41778. With this solution, our team Heart placed 7th on the final leaderboard of the PST-KDD-2024 competition. The implementation details and code are publicly available at: https://github.com/Hearttttt1/PST-KDD-2024-Heart-Rank7
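For readers unfamiliar with the pooling strategy, the following is a minimal sketch of one common ConcatPool variant, assuming it concatenates the masked mean-pooled hidden states of the last few transformer layers before a classification head; the exact layers, model names, and head dimensions here are illustrative assumptions, not the team's exact configuration.

```python
import torch
from transformers import AutoModel

def masked_mean(hidden_states, attention_mask):
    # Mean over token positions, ignoring padding tokens.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

class ConcatPoolClassifier(torch.nn.Module):
    # Hypothetical module: concatenate mean pooling over the last `num_layers` layers.
    def __init__(self, model_name="bert-base-uncased", num_layers=4):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        self.num_layers = num_layers
        hidden_size = self.backbone.config.hidden_size
        self.head = torch.nn.Linear(hidden_size * num_layers, 1)  # binary relevance score

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states is a tuple: embeddings plus one tensor per transformer layer.
        pooled = [masked_mean(h, attention_mask)
                  for h in outputs.hidden_states[-self.num_layers:]]
        return self.head(torch.cat(pooled, dim=-1))
```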
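The abstract also names adversarial weight perturbation (AWP) as a key model-level strategy. Below is a hedged, minimal sketch of a generic AWP helper of the kind commonly paired with transformer fine-tuning; the learning rate, epsilon, which parameters are perturbed, and the two-pass training loop in the trailing comment are assumptions, not the authors' exact recipe.

```python
import torch

class AWP:
    # Minimal Adversarial Weight Perturbation sketch; hyperparameters are illustrative.
    def __init__(self, model, adv_lr=1.0, adv_eps=1e-2, adv_param="weight"):
        self.model = model
        self.adv_lr = adv_lr
        self.adv_eps = adv_eps
        self.adv_param = adv_param
        self.backup = {}

    def perturb(self):
        eps = 1e-6
        for name, param in self.model.named_parameters():
            if param.requires_grad and param.grad is not None and self.adv_param in name:
                self.backup[name] = param.data.clone()
                grad_norm = torch.norm(param.grad)
                if grad_norm != 0 and not torch.isnan(grad_norm):
                    # Step along the gradient, scaled to the parameter's own norm,
                    # then keep the perturbation inside an eps-ball around the weights.
                    r = self.adv_lr * param.grad / (grad_norm + eps) * (param.data.norm() + eps)
                    bound = self.adv_eps * self.backup[name].abs()
                    param.data.add_(r)
                    param.data = torch.max(torch.min(param.data, self.backup[name] + bound),
                                           self.backup[name] - bound)

    def restore(self):
        # Put the original weights back after the adversarial backward pass.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Sketch of a training step using AWP:
#   loss = criterion(model(x), y); loss.backward()
#   awp.perturb()
#   adv_loss = criterion(model(x), y); adv_loss.backward()
#   awp.restore()
#   optimizer.step(); optimizer.zero_grad()
```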
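Finally, the weighted ensemble combines per-model prediction scores from the BERT-based models and LightGBM. A minimal sketch follows, assuming a simple normalized weighted average of probability scores; in practice the weights would typically be tuned on validation data, and the function and variable names here are hypothetical.

```python
import numpy as np

def weighted_ensemble(score_lists, weights):
    # score_lists: list of 1-D arrays of per-sample scores, one array per model.
    # weights: one non-negative weight per model; normalized here for convenience.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(score_lists)                 # shape: (num_models, num_samples)
    return np.tensordot(weights, stacked, axes=1)   # shape: (num_samples,)

# Example: three BERT-based models plus one LightGBM model (weights are illustrative).
# final_scores = weighted_ensemble([p_bert1, p_bert2, p_bert3, p_lgbm], [0.3, 0.3, 0.2, 0.2])
```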
Submission Number: 17