Abstract: In this paper, we propose a novel personalized response generation model via domain adaptation (PRG-DM). First, we learn the human responding style from large-scale general data (without user-specific information). Second, we fine-tune the model on a small amount of personalized data, using a dual learning mechanism, to generate personalized responses. Moreover, we propose three new rewards that characterize good conversations: personalization, informativeness, and grammaticality. We employ the policy gradient method to generate highly rewarded responses. Experimental results show that our model generates better personalized responses for different users.
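For intuition, a minimal sketch of the reward-weighted policy-gradient (REINFORCE-style) objective the abstract describes is shown below. The reward weights, tensor shapes, and function names here are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn.functional as F

# Hypothetical weights for the three rewards named in the abstract
# (personalization, informativeness, grammaticality); the paper's
# actual weighting scheme is not specified here.
W_P, W_I, W_G = 0.4, 0.3, 0.3

def combined_reward(r_personal: torch.Tensor,
                    r_informative: torch.Tensor,
                    r_grammatical: torch.Tensor) -> torch.Tensor:
    """Per-response scalar reward as a weighted sum of the three signals."""
    return W_P * r_personal + W_I * r_informative + W_G * r_grammatical

def reinforce_loss(token_logits: torch.Tensor,
                   sampled_ids: torch.Tensor,
                   reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE objective: maximize the reward-weighted log-likelihood
    of responses sampled from the generation policy.

    token_logits: (batch, seq_len, vocab) decoder outputs
    sampled_ids:  (batch, seq_len) tokens sampled from the policy
    reward:       (batch,) combined reward per sampled response
    """
    log_probs = F.log_softmax(token_logits, dim=-1)
    # Log-probability of each sampled token under the current policy.
    token_logp = log_probs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    seq_logp = token_logp.sum(dim=-1)   # log p(response | context)
    # Negative sign turns gradient ascent on reward into a loss to minimize.
    return -(reward * seq_logp).mean()
```

In this sketch, high-reward samples increase the likelihood of the tokens that produced them, which is the standard way a policy-gradient method steers a generator toward "highly rewarded responses."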