Knowledge Distillation for Job Title Prediction and Project Recommendation in Open Source Communities

Published: 01 Jan 2025, Last Modified: 23 Oct 2025, ECML/PKDD (10) 2025, CC BY-SA 4.0
Abstract: In the era of rapid digitalization, the demand for digital talent is surging, and talent management in open source communities has become a crucial research area. This paper explores the application of large language models (LLMs) to two key talent management tasks within open source communities: project recommendation and job title prediction. First, we construct an evaluation dataset, TM-Eval, to assess the performance of LLMs on the two tasks. Second, we construct a QA dataset, JA-QA, from LinkedIn, which maps each job title and its description to the APIs the role requires. This dataset is used to distill larger LLMs' knowledge of job-API correspondence into smaller ones, reducing the computational overhead of the two tasks. We propose a hierarchical knowledge transfer method that combines logit-based distillation, feature-based distillation, and task-specific fine-tuning with Low-Rank Adaptation (LoRA). Experimental results show that larger LLMs outperform smaller ones on the two tasks. Moreover, the proposed distillation method effectively enhances the performance of smaller LLMs, in some cases enabling them to surpass the original larger LLMs. This study provides a new approach to talent management in open source communities that leverages the knowledge of LLMs to improve prediction and recommendation accuracy while reducing computational overhead. A replication package is available at https://github.com/DaSESmartEdu/KDJPPR.
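The abstract does not detail the loss formulation, so the following PyTorch sketch only illustrates one plausible combination of the two distillation components it names. The function name `distillation_loss`, the projection layer, the temperature, and the weights `alpha`/`beta` are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      projection, temperature=2.0, alpha=0.5, beta=0.5):
    """Combine a logit-based KL term with a feature-based MSE term.

    `temperature`, `alpha`, and `beta` are illustrative hyperparameters;
    the paper's actual values and layer choices are not given in the abstract.
    """
    # Logit-based distillation: KL divergence between temperature-softened
    # teacher and student output distributions (scaled by T^2, as is standard).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    logit_term = F.kl_div(log_soft_student, soft_teacher,
                          reduction="batchmean") * temperature ** 2

    # Feature-based distillation: project the (narrower) student hidden
    # states into the teacher's width, then match with mean squared error.
    feature_term = F.mse_loss(projection(student_hidden), teacher_hidden)

    return alpha * logit_term + beta * feature_term

# Toy usage: batch of 4, vocabulary of 100; student width 256, teacher width 512.
student_logits = torch.randn(4, 100)
teacher_logits = torch.randn(4, 100)
student_hidden = torch.randn(4, 256)
teacher_hidden = torch.randn(4, 512)
proj = torch.nn.Linear(256, 512)
loss = distillation_loss(student_logits, teacher_logits,
                         student_hidden, teacher_hidden, proj)
```

The third stage named in the abstract, task-specific fine-tuning with LoRA, would presumably be applied on top of the distilled student, for instance via Hugging Face's `peft` library (`LoraConfig` and `get_peft_model`), updating only low-rank adapter matrices rather than the full weights.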