Abstract: With the rapid development of Large Language Models (LLMs), a growing number of researchers are turning their attention to Generative Recommender Systems (GRSs), which are not constrained by a fixed candidate set and are therefore better suited to exploring user interests. Existing LLM-based GRSs mainly rely on Supervised Fine-Tuning (SFT) to endow LLMs with the ability to generate candidate items, and then employ similarity-based grounding methods to map the generated text to real-world items. However, SFT alone is insufficient for LLMs to fully capture the knowledge embedded in complex interaction behaviors, and similarity-based grounding struggles with long-text matching. In this paper, we therefore propose GIRL, a generative job recommendation framework based on large language models. Specifically, we train a reward model that evaluates the matching degree between a curriculum vitae (CV) and a job description (JD), and we use proximal policy optimization (PPO)-based reinforcement learning (RL) to fine-tune the LLM-based recommender. Moreover, we propose a model-based grounding method for JD grounding. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed model over seven baseline methods.
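The abstract does not give implementation details, but the reward-model idea can be illustrated with a minimal sketch: a text encoder scores how well a (generated) job description matches a candidate's CV, and that scalar score would serve as the reward signal for PPO-based fine-tuning of the LLM recommender. The encoder name "bert-base-uncased", the [CLS] pooling, and the linear scoring head below are assumptions for illustration, not the authors' actual design.

```python
# Minimal sketch (assumed design, not the paper's implementation) of a CV--JD
# matching reward model: encode the CV/JD pair and map it to a scalar score.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class CvJdRewardModel(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):  # assumed encoder
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Scalar head: a higher output means a better CV--JD match.
        self.score_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]         # [CLS] pooled representation
        return self.score_head(cls).squeeze(-1)   # shape: (batch,)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    reward_model = CvJdRewardModel()

    cv = "Five years of backend development with Python and distributed systems."
    jd = "Senior backend engineer: design scalable services and mentor juniors."

    # Encode the CV--JD pair as one sequence; the tokenizer inserts the separator.
    batch = tokenizer(cv, jd, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        reward = reward_model(batch["input_ids"], batch["attention_mask"])
    print(f"match reward: {reward.item():.4f}")  # untrained head -> arbitrary value
```

In an RLHF-style pipeline such a score, computed on the JD text generated by the fine-tuned LLM for a given CV, would be passed to the PPO objective as the per-sample reward.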