{
  "title": "Summary of LLM-Based Recommender Systems",
  "turns": [
    {
      "turn": 1,
      "user": "> 2412.12464v1 - PDF\n\nPlease read this paper.",
      "assistant": "Below is a comprehensive summary of the paper \"LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation\".\n\n### **Key Ideas**\n\nThis paper proposes a new model, LIKR (LLM's Intuition-aware Knowledge graph Reasoning model), that complements the weaknesses of both knowledge graph (KG)-based models and large language model (LLM)-based models in recommendation systems.\n\nThe core idea of LIKR is to use an LLM as a \"reasoner that outputs intuitive exploration strategies\" over a knowledge graph (KG). This aims to effectively predict user preferences and improve recommendation performance even in data-sparse 'cold start (cold-start)' scenarios.\n\n### **Problem Statement**\n\nThe paper points out the following key issues in existing recommendation approaches:\n\n*   **Limitations of knowledge graph (KG)-based recommendation:**\n    *   It struggles to account for the temporal dynamics of user preferences.\n    *   Performance drops significantly in cold-start settings with scarce user interaction data.\n*   **Limitations of large language model (LLM)-based recommendation:**\n    *   Although they possess rich commonsense and reasoning ability, input token limits make it difficult to process large volumes of recommendation data, leading to scalability issues.\n\n### **How LIKR Works**\n\nLIKR operates in two stages to address these issues.\n\n**Stage 1: Obtaining the LLM's Intuition (Obtaining LLM's Intuition)**\n\n*   Provide the LLM with a prompt containing the user's limited interaction history (e.g., a list of recently watched movies) in chronological order.\n*   The prompt includes time-aware questions such as \"What elements should the next movie this user watches have?\" to induce temporal awareness.\n*   Leveraging its pre-trained knowledge, the LLM infers the user's next preferences and outputs metadata elements it deems important (e.g., specific actors, genres). 
This result is used as the 'LLM's intuition'.\n\n**Stage 2: Recommendation via KG Reasoning**\n\n*   Build a knowledge graph (KG) from the recommendation dataset (users, items, metadata, etc.).\n*   Train an agent via reinforcement learning to explore optimal recommendation paths on the KG.\n*   The agent's reward function integrates two key signals:\n    1.  **LLM intuition-based reward ( $R_t^{LLM}$ ):** The agent receives a reward when it reaches metadata nodes predicted by the LLM in Stage 1.\n    2.  **KG embedding-based reward ( $R_t^{KG}$ ):** Rewards are computed based on semantic relations among entities within the KG.\n*   The trained agent starts from the user node, searches for paths that maximize reward within the KG, and recommends the items it ultimately reaches.\n\n### **Key Experimental Results**\n\nThe paper evaluates LIKR on two real-world datasets, MovieLens-1M and Lastfm-1M.\n\n*   **Overall performance (RQ1):** LIKR outperforms conventional sequential recommenders (GRU4Rec, SASRec) and KG-based recommenders (PGPR, CAFE, UCPR) in cold-start settings, demonstrating the effectiveness of integrating LLM intuition into reinforcement learning.\n*   **Effect of LLM choice and prompts (RQ2):**\n    *   Testing multiple LLMs (Gemini 1.5-flash, Claude 3.5-sonnet, GPT series), the GPT-4o-preview model achieved the best performance.\n    *   Explicitly encoding temporal order in the prompt (i.e., adding temporal awareness) consistently improved recommendation performance.\n*   **Effect of reward design (RQ3):** When varying the weights of the LLM intuition-based and KG embedding-based rewards, using both together was far more effective than using either alone, suggesting the two are complementary.\n\n### **Conclusion and Contributions**\n\nThis paper proposes LIKR, which achieves strong performance in cold-start sequential recommendation by using an LLM as a knowledge-graph reasoner. 
LIKR shows potential to overcome scalability limits of LLM-based recommendation and to flexibly integrate multiple recommendation strategies via reinforcement learning.\n\nThe main contributions are:\n\n*   Introduces the first approach that uses an LLM as a KG path reasoner for recommendation.\n*   Incorporates reinforcement learning with custom reward functions to combine the strengths of LLMs and KGs.\n*   Demonstrates superior performance over prior state-of-the-art methods in cold-start settings on real-world datasets."
    },
    {
      "turn": 2,
      "user": "Please explain the method in detail",
      "assistant": "Sure, I'll explain the methodology of LIKR (LLM's Intuition-aware Knowledge graph Reasoning model) in detail. LIKR is designed to tackle the 'cold-start sequential recommendation' problem where user interaction history is limited, and the process is divided into two main stages.\n\n### **Step 1: Obtaining LLM's Intuition (Obtaining LLM's Intuition)**\n\nThe goal of this stage is to leverage an LLM to obtain 'intuition' about the user's future preferences.\n\n*   **Input data**: The LLM receives two kinds of information as a prompt.\n    *   The user's limited interaction history (\n         $mathcalH_u$ ): a time-ordered list of items the user has interacted with.\n    *   A set of metadata types (\n         $mathcalY^{d}$ ): the list of metadata types available in the recommendation domain (e.g., genre, director, actor for movies).\n*   **Temporally-aware Prompt (Temporally-aware Prompt)**: We use a prompt that explicitly encodes temporal order so the LLM can understand changes in user preferences. For example, for a movie recommendation dataset, we input a prompt like the following.\n    > \"We want to recommend the next movie to a user. This user watched the following movies in order, with the most recent listed last:  $i_1^{u},i_2^{u},...,i_N_i^{u}$ . Based on metadata types such as y\\<sub\\>movie\\</sub\\>, y\\<sub\\>movie\\</sub\\>, ..., select the particularly important metadata and provide the elements the next movie for this user should have. Please format the output as: y\\<sub\\>movie\\</sub\\>: element\"\n*   **Output**: From this prompt, the LLM outputs a set of metadata elements ( $mathcalE_u$ ) it predicts the user will prefer next. 
This result is regarded as the LLM's 'intuition': a contextual interpretation of the user's behavior grounded in its pre-trained knowledge and trends.\n\n### **Step 2: Recommendation via KG Reasoning**\n\nIn this stage, the LLM's intuition obtained in Step 1 guides knowledge graph (KG) reasoning, which determines the final recommended items.\n\n1.  **Building and embedding the knowledge graph (KG)**:\n    *   First, a knowledge graph  $\\mathcal{G}$  is built from users, items, and each item's metadata.\n    *   To leverage the semantic information of entities and relations in the KG, the KG is embedded using the TransE method. TransE models a relation as a vector translation from the head entity to the tail entity ( $e_1 + r \\approx e_2$ ), making it computationally efficient and suitable for large-scale graph processing.\n2.  **MDP modeling for reinforcement learning**: The recommendation problem is modeled as a Markov Decision Process (MDP) so that multiple signals can be used jointly.\n    *   **State ( $s_t$ )**: At each exploration step  $t$ , the state is defined as (user  $u$ , current entity  $e_t$ , visitation history  $h_t$ ).\n    *   **Action ( $a_t$ )**: Select one of the valid outgoing edges from the current entity ( $e_t$ ) to move to the next state.\n    *   **Reward ( $R_t$ )**: The reward function is designed by integrating two strategies: LLM intuition and KG-embedding-based signals.\n        *   **LLM intuition-based reward ( $R_t^{LLM}$ )**: The agent receives a reward of 1 when it reaches a node included in the metadata set ( $\\mathcal{E}_u$ ) predicted by the LLM in Step 1.\n        *   **KG embedding-based reward ( $R_t^{KG}$ )**: Computed based on the inner product between entity embedding vectors along the path.\n        *   **Overall reward ( $R_t$ )**: The final reward is a weighted sum,  $R_t = R_t^{LLM} + \\alpha R_t^{KG}$ . 
Here,  $\\alpha$  is a hyperparameter that controls the relative contribution of the two strategies.\n3.  **Generating the final recommendations**:\n    *   In the MDP environment designed above, the agent's policy network is optimized using the REINFORCE algorithm.\n    *   The trained agent starts from the user node and explores the KG according to the learned policy. Beam search is used for efficient path reasoning.\n    *   Based on all the paths generated in this way, the final Top-N recommendation list is constructed."
    }
  ]
}