Confident Action Decision via Hierarchical Policy Learning for Conversational Recommendation

Published: 01 Jan 2023, Last Modified: 27 Jun 2023. WWW 2023.
Abstract: Conversational recommender systems (CRS) aim to capture a user's dynamic interests in order to make successful recommendations. By asking about a user's preferences, a CRS explores their current needs and recommends items of interest. However, previous works may fail to determine a proper action in a timely manner, which leads to insufficient information gathering and wasted conversation turns. Since they learn a single decision policy, it is difficult for them to address the general decision problems in CRS. Moreover, existing methods do not distinguish whether past behaviors inferred from historical interactions are closely related to the user's current preference. To address these issues, we propose a novel Hierarchical policy learning based Conversational Recommendation framework (HiCR). HiCR formulates the multi-round decision-making process as a hierarchical policy learning scheme consisting of a high-level policy and a low-level policy. In detail, the high-level policy determines what type of action to take, such as a recommendation or a query, by observing the comprehensive conversation information. According to the decided action type, the low-level policy selects a specific action, such as which attribute to ask about or which item to recommend. The hierarchical conversation policy enables the CRS to decide on an optimal action, reducing both the unnecessary consumption of conversation turns and continuous recommendation failures. Furthermore, to filter out unnecessary historical information when enriching the current user preference, we extract and utilize the informative past behaviors that are attentive to the current needs. Empirical experiments on four real-world datasets show the superiority of our approach over current state-of-the-art methods.
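To make the two-level decision scheme concrete, here is a minimal PyTorch sketch of a hierarchical policy in the spirit of the abstract: a high-level network chooses between asking and recommending, and a low-level network scores candidate attributes or items. The class names, network sizes, and the `decide_action` helper are illustrative assumptions, not the authors' actual HiCR implementation.

```python
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Decides the action type (ask an attribute vs. recommend an item)
    from a summary vector of the conversation state."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # logits for {ask, recommend}
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)


class LowLevelPolicy(nn.Module):
    """Given the chosen action type, scores the candidate actions
    (attributes to ask about, or items to recommend)."""

    def __init__(self, state_dim: int, candidate_dim: int):
        super().__init__()
        self.proj = nn.Linear(state_dim, candidate_dim)

    def forward(self, state: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # Score each candidate by its similarity to the projected state.
        query = self.proj(state)                   # (batch, candidate_dim)
        scores = candidates @ query.unsqueeze(-1)  # (batch, n_candidates, 1)
        return torch.softmax(scores.squeeze(-1), dim=-1)


def decide_action(state, attr_emb, item_emb, high, low_ask, low_rec):
    """One decision step: the high-level policy picks the action type,
    then the matching low-level policy picks the concrete action."""
    type_probs = high(state)  # assumes batch size 1
    if type_probs[..., 0].item() > type_probs[..., 1].item():
        return "ask", low_ask(state, attr_emb).argmax(dim=-1)
    return "recommend", low_rec(state, item_emb).argmax(dim=-1)
```

In such a setup, both policies would typically be trained with reinforcement learning on conversation rewards, with the high-level policy credited for overall conversation efficiency and the low-level policy for the outcome of the specific query or recommendation.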