%\documentclass{uai2025} % for initial submission
\documentclass[accepted]{uai2025} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2025} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2025} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{makecell}
\usepackage[backend=biber]{biblatex}




\addbibresource{custom.bib}

% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Learning Multi-interest Embedding with Dynamic Graph Cluster for Sequential Recommendation}

% The standard author block has changed for UAI 2025 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{{Chunjing Xiao}}
\author[1]{\href{mailto:<2023052042@cauc.edu.cn>?Subject=Learning Multi-interest Embedding with Dynamic Graph Cluster for Sequential Recommendation}{Ranhao Guo}}
\author[2]{Yongwang Zhang}
\author[3]{Xiaoming Wu}

% Add affiliations after the authors
\affil[1]{%
    School of Computer Science and Technology\\
    Civil Aviation University of China\\
    Tianjin, China
}

\affil[2]{%
    Research and Development Center\\
    TravelSky Technology Limited\\
    Beijing, China
  }
\affil[3]{%
    Research and Development Center\\
    TravelSky Technology Limited\\
    Beijing, China
}
  
\begin{document}
\maketitle
%\vspace{-10cm}
\begin{abstract}
  Multi-interest recommendation predicts the next item by representing the diversity of a user's preferences with multiple interest embeddings. Although existing methods have achieved convincing
results in recommendation tasks, they ignore the continuously changing relations between non-adjacent items in a sequence.
In this paper, we focus on fully capturing these changing relations when learning multi-interest user representations. Specifically, we propose a novel dynamic graph cluster-based multi-interest model named MDGR, which not only explores the real changing relations between non-adjacent items by iteratively constructing
and continuously optimizing interest sub-graphs to update the multiple interest embeddings, but also combines temporal information and interest weights to model the interactions between users and items. %Recent research primarily progresses in two directions: clustering and graph neural networks. However, we find that the former does not fully consider the transmission of information between items in the sequence, and the latter lacks flexible and diverse graph construction methods, thereby affecting the performance of recommendation candidate retrieval. 
Our model iteratively constructs and continuously optimizes the interest sub-graphs by adopting dynamic graph clustering to explore the item relations in sequences, which helps to dynamically model a user's multiple interests and accelerates model convergence. Furthermore, we employ an attention module to extract the different influences of the various interest embeddings.
Finally, we use the refined item embeddings and the final multi-interest embeddings to retrieve the next item that a user is most likely to interact with. To the best of our knowledge, this is the first attempt to explore multi-interest embeddings by iteratively constructing and continuously optimizing the interest sub-graph. Extensive experiments on three popular benchmark datasets demonstrate that MDGR outperforms several state-of-the-art methods and converges faster.
\end{abstract}
%\vspace{-0.96cm}
\section{Introduction}\label{sec:intro}
With the development of the Internet, recommender systems have become crucial tools for addressing information overload and enhancing competitiveness in various online services such as news recommendation, e-commerce, advertising, and social media. Sequential recommender systems, which predict the next item a user might be interested in by analyzing the user's historical behaviors, have gained increasing attention \cite{seq1}, \cite{seq2}. The core challenge is to accurately capture user interests from complex behavioral sequences.


% Users’ browsing records 
In recent years, researchers have proposed many sequential recommendation models (GRU4Rec \cite{Gru4Rec}, CL4SRec \cite{seq1}, DCRec \cite{seq2}, and MAERec \cite{MAERec}) that model user interests to improve performance. Despite their success, all of them represent a user's interests with a single embedding. However, users may engage with different types of items over their interaction history, so a single embedding is insufficient to accurately capture the diversity of user interests.

Multi-interest solutions have been designed to solve this problem and fall into two categories: split by cluster and split by Graph Neural Networks (GNNs).
The former divides the items a user has interacted with into different clusters according to the item embeddings
 \cite{mip}, \cite{RimiRec}, \cite{comirec} or labels \cite{pin}, and then obtains a single interest embedding for each cluster. MIP \cite{mip}, REMI \cite{RimiRec} and ComiRec \cite{comirec} cluster the item embeddings of a user's behavior history and apply an attention mechanism or CapsNet to generate multiple interest representations, while PinText \cite{pin} first clusters the items by category labels and computes a representation embedding per cluster. The latter constructs user-interaction item graphs and uses GNNs to aggregate neighbor information to generate multiple interest representations \cite{surge}, \cite{graph2}, \cite{graph1}. SURGE \cite{surge} and MI-GNN \cite{taiwan} build interest sub-graphs according to user historical behavior sequences and learn an interest embedding for each sub-graph through GNNs, while BIGCF \cite{big} and MGNM \cite{MGNM} construct item graphs and apply GNNs to capture high-order relationships to obtain multiple user interest embeddings. Although these methods effectively model a user's multiple interests, both
approaches have clear limitations. Split by cluster methods depend heavily on the initial distribution of item features and ignore the real relations between items in interaction sequences. %are highly capable of modeling the rich relations between nodes and providing a rich representation of each node. Therefore,in some studies,researchers use GNNs to obtain representations of user interests.
Split by GNN models %only assume that adjacent items in the sequence are similar and add an edge between them on interest graphs. Therefore,  these graphs 
only capture adjacent relations between items in a sequence, but ignore the similarities among non-adjacent items. %according to the position of the nodes on the graph, and each cluster represents an interest of the user,which achieving favorable results.
In fact, the relations between items keep changing as interactions continue. Therefore, it is important to find the real relations between non-adjacent items when modeling multiple interests. %it should be noted that non-adjacent items may be correlations in interaction sequence. Therefore, it is necessary to address this issue to capture the accurate relation between items. 
For example, Figure \ref{fig:intro} shows a user's interaction behavior sequence. If the graph is constructed according to adjacency in the sequence, an edge is added between the mobile phone and the basketball shoes, which may introduce noise and degrade recommendation performance when information is propagated between uncorrelated nodes. However, no edge is constructed between the iPad and the mobile phone, which are more strongly correlated because both are electronic products. Therefore, we should explore the correlations between non-adjacent items such as the iPad and the mobile phone, or the basketball shoes and the jerseys, which is critical for improving recommendation performance. The challenge is to accurately capture the correlations between items so as to iteratively learn user interests as interactions occur. 


%It is insufficient to capture the actual relationships during the training.  
%within the sequence on the features. Consequently, the interest representations derived from such methods may not accurately capture the user's true preferences.
%In English, use the phrase BASED ON as sparingly as possible
%To address the issue of ignoring the relationships between items when using clustering methods, researchers have begun using Graph Neural Networks (GNNs) to obtain representations of users that capture the multiple interests of individuals. This is accomplished through the updating of node features via message passing between nodes.
\begin{figure}
    \centering
    \includegraphics[width=0.9\linewidth]{intro.png}
    \caption{Example of multi-interest}
    \label{fig:intro}
    %\vspace{-0.6cm}
\end{figure}
%For example,   such as mistakenly predicting a computer. In reality, by establishing connections between headphones, smartphones, and tablets, and between basketball shoes and jerseys, we can accurately identify the user's genuine interests in electronics and basketball. Additionally, by incorporating temporal information, we can more accurately predict that the user's next likely interaction will be with a basketball. Therefore, the graph should be constructed based on categories, and taking into account the impact of each node update on category division.   
% In the field of multi-interest modeling, both graph-based and clustering-based methods have unique advantages and disadvantages, but they also exhibit some major shortcomings:(1)Dependence on initial feature distribution(2)Ignoring relationships between items(3)Static nature (clustering methods)Dependence on sequence order (GNN methods)

To address this issue, we propose a novel method to learn \underline{M}ulti-interest Embedding with \underline{D}ynamic \underline{G}raph Cluster for Sequential
\underline{R}ecommendation (MDGR), which not only captures the actual changing correlations between items by iteratively
constructing and continuously optimizing interest sub-graphs to
update the multiple interest embeddings, but also combines
temporal information and interest weights to model the interactions
between users and items. Specifically, we first construct multiple
interest sub-graphs by clustering the item embeddings obtained from the processed item IDs, positions and timestamps, which reduces the impact of noisy edges between unrelated items and significantly decreases the complexity of graph construction. Second, we update item representations with a multi-head clustering attention mechanism to extract the real correlations between items in each sub-graph, yielding more comprehensive item representations that incorporate related information. Third, we iteratively reconstruct the interest sub-graphs according to the updated item embeddings and continuously learn the user's multiple interest representations from them, which speeds up convergence. Finally, we introduce a weight for each interest embedding and combine the weighted embeddings with temporal information to predict the next item a user may be interested in, which better accounts for the impact of time intervals on next-item recommendation.
%co-occurrence relation and item transition relation betweenitems, and obtain a generated item relation graph by utilizingthe cosine similarity of items and item transition relation. These graphs are produced using our unique dynamic clustering method, which ensures that nodes are of the same type. This design guarantees that information is propagated solely among similar nodes.It is reasonable to repeatedly execute the dynamic graph clustering process, as better division results can be achieved after each node update.In addition, we adopt an interest weight module to assign a corresponding interest weight to each interest representation,which consider the user's preference for different interests 
In summary, the main contributions of this paper are as follows: 

(1) To the best of our knowledge, this is the first attempt to iteratively reconstruct and continuously optimize the interest sub-graphs by considering the real, changing relations between items in sequences in order to dynamically model a user's multiple interests.

(2) We propose MDGR, which not only explores the real changing relations between non-adjacent items by iteratively constructing
and continuously optimizing interest sub-graphs to update the multiple interest embeddings, but also combines temporal information and interest weights to model the interactions between users and items.

(3) We conduct empirical studies on three public datasets. The experimental results show significant performance improvements over state-of-the-art methods, and our method converges faster.
%\vspace{-0.3cm}
\section{RELATED WORK}
%\vspace{-0.2cm}
%sequential modelling in recommendation systems and multi-interest learning are the two principal areas of relevance to our work, we provide a concise overview of the existing methods in these two fields. 
%\vspace{-0.1cm}
\subsection{Sequential Recommendation}
%\vspace{-0.3cm}
%Sequential recommendation is a method to predict user preferences by analyzing user behaviour sequences. 
%Markov chains are a typical inference method \cite{MM}, \cite{MF}, which performs well for short-term behaviour patterns. Neural network-based sequential recommendation methods \cite{neural1}, \cite{neural2}, \cite{neural3} have become increasingly prevalent. For example, RNN-based methods \cite{rnn1}, \cite{rnn2}, \cite{rnn3}, attention mechanism-based methods \cite{sas}, \cite{self1}, \cite{self2}, transformer-based methods 
 %\cite{transfomer1}, \cite{transfomer2}, \cite{transfomer3}. Graph neural networks all can effectively capture the complex relations and structural information \cite{GNN1}, \cite{GNN2}, \cite{GNN3}.

Sequential recommendation predicts the next item by exploiting a user's behavior sequence. %Markov chains are a typical inference method that assumes the current state is independent of future and past states. For instance, Steffen Rendle \cite{MM}\cite{MF}and colleagues combined matrix factorisation (MF) and Markov chains (MC)to predict future behaviour. 
Traditional sequential recommendation models \cite{MM}\cite{MF} adopt Markov chains to model the first- and high-order dependencies in user historical sequences. Although these methods perform well on short-term behavior patterns, they are unable to capture global dependencies.
%Consequently, neural network-based sequential recommendation methods have become increasingly prevalent\cite{neural1} \cite{neural2} \cite{neural3}. For example, AGRE\cite{rnn1} and \cite{rnn2}\cite{rnn3} employ  RNN-based methods to model entire sessions, thereby offering more precise suggestions.
 With the great success of deep learning in recommendation, deep learning-based models (\emph{e.g.}, RNNs \cite{rnn1}\cite{rnn2}\cite{rnn3} and CNNs \cite{cnn1}\cite{cnn2}\cite{cnn3}) have been proposed to model long-term dependencies in users' entire historical sequences.
However, they fail to explicitly distinguish the different impacts of items on user preferences. Attention mechanisms (\emph{e.g.}, SASRec \cite{sas}, FSASA \cite{self1}, BSA-ST-Rec \cite{self2}, ARD \cite{neural1}, CARCA \cite{neural2}) and transformers (\emph{e.g.}, UGT \cite{transfomer1}, STRec \cite{transfomer2}, TRON \cite{transfomer3}) have brought new insights into this issue, but most of them ignore the item transition relationships in session sequences. 
%In order to differentiate the varying impacts of interactions at different times on the next prediction, the attention mechanism was introduced. The self-attention mechanism was applied to sequential recommendation by SASRec\cite{sas} and FSASA\cite{self1},BSA-ST-Rec\cite{self2}.The introduction of Transformer also provided new insights into sequential recommendation,enhancing recommendation accuracy and effectiveness e.g.\cite{transfomer1},\cite{transfomer2},\cite{transfomer3}.Graph neural networks all can effectively capture the complex relations and structural information \cite{GNN1}, \cite{GNN2}, \cite{GNN3}.
More recently, Graph Neural Networks (GNNs) have been widely used in sequential recommendation to capture complex relations and structural information \cite{GNN1}, \cite{GNN2}, \cite{GNN3}. However, most of these methods ignore the relations between non-adjacent items within user interaction sequences, which can help improve recommendation performance.


\begin{figure*}[!ht]
    \centering
    % Crop the left, bottom, right, and top edges of the image; here each is assumed to be cropped by 10 mm
    \includegraphics[width=0.85\linewidth,height=0.55\linewidth,trim=7cm 2.65cm 7.9cm 2.6cm, clip]{uai2025-template/frameworknew.pdf}
    \caption{Overview of the MDGR framework, which comprises item embedding encoding and preprocessing, dynamic graph clustering, and interest weighting and prediction.}
    \label{fig:model framework}
   % \vspace{-0.5cm}
\end{figure*}

%-------------------------------------------------------------------------
%\vspace{-0.3cm}
\subsection{Multi-Interest Recommendation Models}
%\vspace{-0.3cm}
 Single-embedding methods capture a user's overall interests but fail to capture diverse preferences in different contexts. Multi-interest recommendation methods have therefore shown their ability to model the diversity of a user's preferences and improve the performance of recommender systems. They fall into two categories: split by cluster and split by Graph Neural Networks (GNNs). The former clusters a user's interaction sequence according to the item embeddings
 \cite{pin}, \cite{RimiRec}, \cite{mcprn}, \cite{int} or labels \cite{mip}, \cite{comirec}, \cite{mind}, \cite{FEMIRecr} and then obtains a single interest embedding for each cluster. These methods effectively model a user's multiple interests, but they depend heavily on the initial distribution of item features, ignore the real relations between items in interaction sequences, and produce clustering results that remain fixed. 
  The latter uses user historical behavior sequences to construct multiple interest graphs \cite{surge}, \cite{taiwan} and learns an interest embedding for each sub-graph through GNNs \cite{big}, \cite{graph1}, \cite{graph2}. However, these methods only add an edge between adjacent items in the sequence when constructing user-interest graphs. Therefore, these graphs reflect only adjacent relations between items in a sequence and ignore the similarities among non-adjacent items. 
 %MCPRN \cite{mcprn}, IntNet \cite{int}, MIND \cite{mind} and FEMIRec \cite{FEMIRecr} built a new multi-layer framework as a multi-interest module. CMGAN \cite{graph1}, SGCMF \cite{graph2} use GNN to obtain users' multi-interest representations. MIP \cite{mip} and ComiRec \cite{comirec} design multi-interest module by using general clustering method.
 
%MCPRN\cite{mcprn}, IntNet\cite{int}, MIND\cite{mind}, and FEMIRec\cite{FEMIRecr} construct multi-layer interest extraction modules to non-linearly map multiple interest vectors from users' historical interactions, thereby decomposing users' preferences in different domains or contexts. For example, MIND utilizes the dynamic routing mechanism in capsule networks, treating users' historical behaviors as low-level capsules and adaptively clustering them into multiple high-level interest capsules, each capturing a specific interest. FEMIRec builds upon this by constructing a multi-layer neural network structure to further capture the hierarchical relationships and fine-grained differences between interests, thereby enhancing matching accuracy during the recall phase.

%On the other hand, CMGAN\cite{graph1} and SGCMF\cite{graph2} employ Graph Neural Networks (GNNs) to model the interaction graph between users and items, effectively extracting users' multi-interest representations through multi-layer message passing and neighborhood aggregation. These methods not only capture local collaborative information but also mine long-distance associations through the global graph structure, comprehensively reflecting users' interest distributions across different semantics and categories. MIP \cite{mip} and ComiRec\cite{comirec} utilize general clustering methods to partition users' historical behaviors into several groups, where similar behaviors within each group collectively form an interest cluster. They extract representative vectors from these clusters and assign personalized weights to each interest through learning methods, enabling precise matching during candidate recall based on users' actual preferences for each interest.
Although the aforementioned methods have achieved promising performance, we argue that they fall short of comprehensively exploring the real relations between items in a user's interaction sequence when modeling multiple user interests. MDGR overcomes this shortcoming by iteratively constructing and continuously optimizing the interest sub-graphs, using dynamic graph clustering to explore the real item relations in sequences.
%In contrast to these clustering-based methods, our approach initially clusters user behaviors using item embeddings to capture diverse interests across different contexts. We then iteratively update the embeddings based on the current cluster topology, resulting in adaptive and personalized interest representations. This ensures precise candidate retrieval during the matching phase and effectively addresses the challenge of dynamically changing user interests.
%\vspace{-0.5cm}
\section{METHODOLOGY}
%\vspace{-0.3cm}
\subsection{Overall Architecture}
%\vspace{-0.3cm}
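%What problem is being solved, what method we propose, the components of the method, what goal is achieved (not only)
%What this part can do: simple, direct sentences; no heading needed; treat it as part of the main text; about two sentences per part
%Temporal evolution, temporal dependency
%Merge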
%To build a multi-interest extraction module 
%Item embedding encoding,Multi-Interest UserRepresentation,Interest Weight Module and Prediction.The overall structure of the proposed model is illustrated in Fig-ure 2.The item embedding encoding learns dense project features by combining embeddings of the project’s time and location information. The multi-interest user representation combines multi-head attention and dynamic graph clustering to construct an interest identification network. This con- structs multiple interest graphs, propagates information on these graphs to update node information, and recursively en- riches node features to improve identification accuracy. The interest weight module considers user preferences to assign different weights to different interests. After obtaining the user’s multi-interest representation and interest weights, the model predicts the next project most likely to interact with the user based on the correlation between an item and a interest for each interest


%In order to build a multi-interest model to solve the recommendation problem, 
We propose a Multi-interest embedding model with Dynamic Graph cluster for sequential Recommendation (MDGR), shown in Figure \ref{fig:model framework}, which consists of three main components: 1) Item embedding encoding and preprocessing, which generates item embeddings from the item
IDs, positions and timestamps and preprocesses them using multi-head attention with a mask \(M\). 2) Dynamic graph clustering, which first constructs multiple interest sub-graphs by clustering the item embeddings, updates the item representations with a multi-head clustering attention mechanism
to extract the real correlations between items in each sub-graph, and then
iteratively reconstructs the interest sub-graphs from the updated item embeddings to continuously learn the user's multiple interest representations. 3) Interest weight module and item prediction, which introduces a weight for each interest
embedding and combines the weighted embeddings with temporal information into a unified
representation to
predict the next item a user may be interested in.


%\vspace{-0.5cm}
\subsection{Problem Definition}
%\vspace{-0.3cm}
Let $\mathcal{U}=\{u_1,u_2,\dots,u_N\}$ and $\mathcal{I}=\{i_1,i_2,\dots,i_M\}$ denote the sets of $N$ users and $M$ items, respectively. For each user $u\in\mathcal{U}$, the interaction sequence is denoted as \( V^u=(V_1^u,V_2^u,\dots,V_{|V^u|}^u) \), where $V^u_i\in\mathcal{I}$, and the corresponding timestamp sequence is \( T^u=(t_1^u,t_2^u,\dots,t_{|V^u|}^u) \). The goal of our model is to predict the next item a user may interact with at time $t$ by modeling the user's interaction sequence. As is common in sequential recommendation, we limit the maximum length of $V^u$ to $l$: when $|V^u|$ exceeds \(l\), we use only the most recent \(l\) items for prediction. 
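As a small illustration of the truncation rule, write \(\tilde{V}^u\) for the sequence actually fed to the model (this symbol is introduced here only for illustration and is not part of the notation used elsewhere):
\begin{equation*}
\tilde{V}^u=
\begin{cases}
\left(V_{|V^u|-l+1}^u,\dots,V_{|V^u|}^u\right), & |V^u|>l,\\
V^u, & \text{otherwise}.
\end{cases}
\end{equation*}
For example, with the hypothetical values \(|V^u|=7\) and \(l=5\), the model would use \((V_3^u,\dots,V_7^u)\), with the timestamps \(T^u\) presumably truncated in the same way.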
%\fontsize{12}



%Let \( I \) and \( U \) represent the item set and user set in the dataset, respectively. For each user \( u \), his/her interaction sequence is denoted as \( V^u=(V_1^u,V_2^u,\dots,V_{|V^u|}^u) \).The corresponding timestamps are denoted as \( T^u=(t_1^u,t_2^u,\dots,t_{|S^u|}^u) \). The goal is to learn a set of embeddings for each user \( u \), denoted as \( Z_i^u \in \mathbb{R}^d \, (i=1\dots k) \), and their corresponding weights \( \omega_i \).
%\vspace{-0.4cm}
\subsection{Item Embedding Encoding and Preprocessing}
%\vspace{-0.3cm}
In recommender systems, when only the unique ID of an item is known (one-hot encoded as \(v_i\)), we use an embedding layer to transform the item ID into a low-dimensional feature vector and learn the dense features \(p_i\) of the item,
\begingroup
\setlength{\abovedisplayskip}{5pt} % adjust the spacing above the equation
\setlength{\belowdisplayskip}{5pt} % adjust the spacing below the equation
 \begin{equation}
        p_i= \sum_{r=1}^{d} W_{emb}^{(r)} v_i
    \end{equation}
\endgroup
%\vspace{-0.65cm}

Here, \(W_{emb}^{(r)}\) are trainable matrices. Furthermore, the order and timestamps of the interactions within a sequence also reflect a user's interests, so we add positional and time information to the item embeddings, %The positional encodings and time encodings  have the same dimension \(d_{model}\) as the embeddings. 
%\vspace{-0.1cm}
\begingroup
\setlength{\abovedisplayskip}{5pt} % adjust the spacing above the equation
\setlength{\belowdisplayskip}{5pt} % adjust the spacing below the equation
    \begin{equation}
        e_i=[p_i ||\tau(t_i )||\rho(i)]
         \end{equation}
\endgroup
%\vspace{-0.08cm}
where \(||\) denotes concatenation. Many encodings are possible. Sine and cosine functions let the model easily learn relative positions through their periodicity and handle longer sequences efficiently. Therefore, we use them to compute the position (\(\rho\)) and time (\(\tau\)) encodings \cite{position}, which are defined as follows:
\begin{equation}
    %\scriptsize
	\begin{aligned}
	\tau_{2i}(t_j) &\!= \sin\left(\frac{t_j}{10000^{\frac{2i}{d_m}}} \right)\!\!&
	\tau_{2i+1}(t_j) &\!= \cos\left(\frac{t_j}{10000^{\frac{2i}{d_m}}} \right)\!\!\\
	\rho_{2i}(j) &\!= \sin\left(\frac{j}{10000^{\frac{2i}{d_m}}} \right)\!\! &
	\rho_{2i+1}(j) &\!= \cos\left(\frac{j}{10000^{\frac{2i}{d_m}}} \right)
	\end{aligned}
\end{equation}
where \(d_{m}=1\) refers to the dimensionality of the model, and timestamps are measured in days. To effectively represent all the items with which a user interacts, we apply an attention mechanism to refine the item representations.
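
As a side remark on the sinusoidal encodings defined above, their suitability for relative positions can be made explicit. Writing \(\omega_i=10000^{-2i/d_m}\) (a shorthand introduced here only for illustration), the angle-addition identities give, for any offset \(\Delta\),
\begin{equation*}
\begin{aligned}
\tau_{2i}(t_j+\Delta) &= \tau_{2i}(t_j)\cos(\omega_i\Delta)+\tau_{2i+1}(t_j)\sin(\omega_i\Delta),\\
\tau_{2i+1}(t_j+\Delta) &= \tau_{2i+1}(t_j)\cos(\omega_i\Delta)-\tau_{2i}(t_j)\sin(\omega_i\Delta),
\end{aligned}
\end{equation*}
so the encoding at a shifted time is a linear function of the encoding at \(t_j\) whose coefficients depend only on \(\Delta\). The same holds for \(\rho\) with positions in place of timestamps.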

%\subsection{User Representation Preprocessing}
%Coherence between sentences
%State the purpose, then simply give the division


%\vspace{-0.4cm}
\begin{equation}
    \begin{aligned}
        S_{i,j}&=\frac{(W_q e_i+b_q )^{\top} \cdot (W_k e_j+b_k)}{\sqrt{d_{m}}} \\
        \alpha_{i,j} &= \text{softmax}_j(S_{i,j})
    \end{aligned}
\end{equation}
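Written out, the row-wise softmax normalizes the scores of item \(i\) over all positions \(j'\) of the sequence (this is the standard definition, stated here for completeness):
\begin{equation*}
\alpha_{i,j}=\frac{\exp(S_{i,j})}{\sum_{j'}\exp(S_{i,j'})},
\qquad \alpha_{i,j}>0,\qquad \sum_{j}\alpha_{i,j}=1 ,
\end{equation*}
so each item distributes a total attention weight of one over the items in the sequence.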
The attention scores \(\alpha_{i,j}\) are constrained by a mask matrix \(M\). For any \(i\) and \(j\), the element \(m_{i,j}\) of \(M\) equals 1, meaning that the model does not ignore any relation between positions when computing the attention scores, so all positions can attend to each other. To further enrich the multi-dimensional information of items, we use a multi-head attention mechanism that processes multiple subspaces in parallel to obtain a more comprehensive item representation. Each attention head \(\varphi_i^h\) is computed as
%\vspace{-0.1cm}
\begingroup
\setlength{\abovedisplayskip}{2pt} % adjust the spacing above the equation
\setlength{\belowdisplayskip}{1pt} % adjust the spacing below the equation
\begin{equation}
 \varphi_i^h=\sum_j \alpha_{i,j}^h m_{i,j}^h e_j^h 
\end{equation}
\endgroup
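Assuming, as stated above, that every mask entry \(m_{i,j}^h\) equals 1, the mask factor drops out and each head reduces to a plain attention-weighted sum of the item embeddings in its subspace, where \(\alpha_{i,j}^h\) denotes the attention weights of head \(h\):
\begin{equation*}
\varphi_i^h=\sum_j \alpha_{i,j}^h\, e_j^h .
\end{equation*}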
A dropout layer is applied to the concatenated outputs of all attention heads to prevent overfitting. The result is then processed by a feed-forward neural network (FFN) as follows:
 \begin{equation}
\varphi_i=\mathrm{FFN}\left(\mathrm{Dropout}\left([\varphi_i^1;\dots;\varphi_i^H]\right)\right)
\end{equation}
where \(i=1,\dots,l\), \(\varphi_i \in \mathbb{R}^d\) is the embedding vector, and \(d\) is its dimension. The \(\mathrm{FFN}\) consists of two fully connected layers, %After the first layer, we adopted a hyperbolic tangent activation function (tanh) to perform nonlinear transformation.The second layer is then a linear transformation layer.This design allows the network to capture more complex features, 
which enhances the expressiveness and flexibility of the item representations. 
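One common way to write such a two-layer network is sketched below. The weights \(W_1, W_2\), biases \(b_1, b_2\), and activation \(\sigma\) are notational assumptions introduced here for illustration; the description above does not fix them.
\begin{equation*}
\mathrm{FFN}(x)=W_2\,\sigma(W_1 x+b_1)+b_2 .
\end{equation*}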
%These user interaction items \(\varphi_i;....\varphi_l\) constitute the initial representation of the user.
%\vspace{-0.5cm}
\subsection{Dynamic Graph Clustering} 
%\vspace{-0.2cm}
Clustering is an unsupervised learning method that divides data points into groups according to their similarities and is widely applied in user representation learning. Previous methods analyze raw interaction sequences with static item embeddings, which makes the clustering result fixed and ignores the sequential information in user behavior sequences. Our proposed dynamic graph clustering method extracts the real correlations between both adjacent and non-adjacent items by iteratively reconstructing and continuously optimizing interest
sub-graphs to learn the user's multiple interest representations.

%Clustering is a unsupervised learning method, which is used to divide data points into groups according to similarities between them and is widely applied in user representation learning. Previous methods analyze raw interaction sequences with static item embeddings which lead to the clustering result is constant and ignore sequential information in user behavior sequences. Our proposed dynamic graph clustering method firstly constructs the multiple interest sub-graph by comprehensively clustering the item embeddings and updateS item representations by multi-head clustering attention mechanism to extract the real correlations between items in sub-graph. Then it iteratively reconstructs interest sub-graph according to the updated item embeddings and continuously learns user multiple interest representations from sub-graph. Finally, we select a representative item from each sub-graph as its interest representation. 
%It is composed of three components: Clustering, Interest graph propagation and Multi-Interest Representation.

%\vspace{-0.6cm}
\subsubsection{Constructing Sub-graph by Clustering.} 
%\vspace{-0.3cm}
To obtain and distinguish multiple interest representations of users from item sequences, we convert loose item sequences into interest sub-graphs by clustering, each of which represents one interest of the user.

\textbf{Clustering.} Propagating unnecessary information between items that belong to different interests degrades recommendation performance. To avoid this, we use a clustering method to process the user interaction sequence so that each item belongs to exactly one cluster \(\text{group}_i\).
%Describe clustering: We use clustering methods to process the sequences
The clustering step may employ a variety of algorithms, including Ward, K-Means, and Birch, and we set the number of clusters to \(k\).

%\subsubsection{Interest graph propagation.} We set the number of clusters to be k
%Construct an interest graph for each clustering group, thereby forming \( k \) subgraphs. For each subgraph, the multi-head clustering attention mechanism updates the node representations in a sparse structure.

\textbf{Interest Sub-Graph Construction.} We construct an undirected interest sub-graph \(G= \left\{\varphi,\mathcal{E} ,A \right\}\) after clustering the user interaction items, where \(\varphi\) is the set of nodes consisting of the items that one user has interacted with, \(\mathcal{E}\) is the set of edges representing the correlations between items, and \(A\) is the adjacency matrix of the graph. We learn the edge weights in the adjacency matrix \(A\) through metric learning. Specifically, the edge weights are calculated using the following weighted cosine similarity, 
\begingroup
\setlength{\abovedisplayskip}{4pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{4pt} % adjust the space below the equation
\begin{equation}
L_{i,j} = \cos(W_i \odot \varphi_i, W_j \odot \varphi_j)
\end{equation}
\endgroup
where \( \odot \) denotes the Hadamard product, and \( W_i \) and \( W_j \) are trainable weight vectors that re-weight the dimensions of the embedding vectors. To enhance expressiveness, we repeat this computation \( \delta \) times to obtain \( \delta \) similarity matrices, each of which captures the relations between items from a different perspective. The final similarity score is then obtained by averaging these matrices,
\begingroup
\setlength{\abovedisplayskip}{2pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{2pt} % adjust the space below the equation
 \begin{equation}
    L_{i,j} =\frac{1}{\delta} \sum_{k=1}^{ \delta } L_{i,j}^k 
\end{equation}
\endgroup
where \( L_{i,j}^k \) represents the similarity between items \( i \) and \( j \) on the \( k \)-th head.
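For concreteness, the weighted cosine similarity of a single head can be expanded as follows. This is only the standard definition of cosine similarity applied to the re-weighted vectors; the explicit Euclidean norm \(\lVert\cdot\rVert_2\) is a notational assumption introduced here,
\[
L_{i,j} = \frac{(W_i \odot \varphi_i)^{\top}(W_j \odot \varphi_j)}{\lVert W_i \odot \varphi_i \rVert_2 \, \lVert W_j \odot \varphi_j \rVert_2}.
\]
As a purely illustrative example with hypothetical values, if \(\delta = 2\) heads give \(L_{i,j}^1 = 0.6\) and \(L_{i,j}^2 = 0.2\), the averaged score is \(L_{i,j} = (0.6 + 0.2)/2 = 0.4\).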


\textbf{Graph Sparsification.}
Typically, the elements of an adjacency matrix are non-negative, while the cosine values \( L_{i,j} \) range from \(-1\) to \(1\). Direct normalization may fail to ensure graph sparsity and can yield a fully connected adjacency matrix, which increases computational complexity, introduces noise, and prevents the model from focusing on the most relevant parts of the graph. %To mitigate this and emphasize important edges, a relative ranking strategy is applied.  
To emphasize the most important edges and preserve the sparsity distribution of the graph, a relative ranking strategy is applied. The element \( A_{i,j} \) of the adjacency matrix \( A \) is set to 1 if \( L_{i,j} \) is no smaller than a threshold, and \( A_{i,j} = 0 \) otherwise,
 \begingroup
\setlength{\abovedisplayskip}{2pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{2pt} % adjust the space below the equation
 \begin{equation}
  A_{i,j} =
\begin{cases} 
1,  & \text{if } L_{i,j}\ge \text{TopValue}_{\gamma n^2}(L) \\
0,  & \text{otherwise}
\end{cases}
\end{equation}
\endgroup
%Explanation of the notation
where \( \text{TopValue}_{\gamma n^2}(L) \) denotes the \(\gamma n^2\)-th largest value in the matrix \(L\) after sorting, \(\gamma\) controls the overall sparsity of the graph, and 
\( n \) is the number of nodes. 
Compared with an absolute threshold strategy and a relative ranking strategy over node neighborhoods, this strategy keeps the graph sparse even when hyperparameters are improperly set, and it also allows the nodes of the generated graph to have different degrees, so that the downstream GCN can fully utilize the dense or sparse structure information of the graph.
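
As a small illustration of the relative ranking strategy (the values of \(L\) and \(\gamma\) below are hypothetical, and we assume here that the diagonal entries of \(L\) take part in the ranking), consider \(n = 3\) nodes and \(\gamma = 5/9\), so that \(\gamma n^2 = 5\),
\[
L = \begin{pmatrix} 1 & 0.8 & -0.2 \\ 0.8 & 1 & 0.1 \\ -0.2 & 0.1 & 1 \end{pmatrix}
\quad\Longrightarrow\quad
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Sorted in descending order, the entries of \(L\) are \(1, 1, 1, 0.8, 0.8, 0.1, 0.1, -0.2, -0.2\), so the fifth largest value is \(0.8\) and only the entries with \(L_{i,j} \ge 0.8\) are kept. In this toy case, the first two nodes remain connected to each other while the third node keeps no edge to the others, so the node degrees differ.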
%It is different from the absolute threshold strategy of the entire graph [5] and the relative ranking strategy of the node neighborhood [4, 19]. The former sets an absolute threshold to remove smaller elements in the adjacency matrix. When the hyperparameters are set improperly, as the embedding is continuously updated,the metric value distribution will also change, and it may not be possible to generate a graph or generate a complete graph. The latter returns the indices of a fixed number of maximum values of each row in the adjacency matrix, which will make each node of the generated graph have the same degree. Forcing a uniform sparse distribution will make the downstream GCN unable to fully utilize the graph’s dense or sparse structure information.

%This method differs from using an absolute threshold for the entire graph or a relative ranking strategy for node neighborhoods. The absolute threshold approach may fail to generate a graph due to changes in embeddings, while the relative ranking strategy forces each node to have the same degree
%\vspace{-0.6cm}
\subsubsection{Information Propagating and Aggregating in Sub-graph and Node Updating.}
%\vspace{-0.3cm}
For each node in the sub-graph, we apply a cluster-aware attention mechanism to extract the correlations between items in the same cluster. Compared with general attention mechanisms, it not only captures the complex relations between nodes in the graph, but also flexibly captures the information of the direct neighbors and the \( k \)-hop neighbors of node \( i \). We obtain the cluster vector \( \varphi_{i_c} \) by computing the average value of the normalized adjacent order matrix between node \( i \) and its \( k \)-hop neighbors. We calculate the attention score \(\alpha_i\) using the node \( \varphi_i \) and the cluster information \( \varphi_{i_c} \),
 \begingroup
\setlength{\abovedisplayskip}{5pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{5pt} % adjust the space below the equation
\begin{equation}
 \alpha_i=\text{Attention}(W_c \varphi_i \,\|\, \varphi_{i_c} \,\|\, W_c \varphi_i\odot \varphi_{i_c})
\end{equation}
\endgroup
where \( W_c \) is the transformation matrix, \( \| \) is the concatenation operator, and \( \odot \) is the Hadamard product. To capture how user interest changes across different target items, it is necessary to consider the relevance between the source node \( \varphi_j \) and the target item embedding \( \varphi_t \). We adjust the weight to preserve relevant information,
\begingroup
\setlength{\abovedisplayskip}{4pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{4pt} % adjust the space below the equation
\begin{equation}
\beta_j=\text{Attention}(W_q \varphi_j \,\|\, \varphi_t \,\|\, W_q \varphi_j\odot \varphi_t )
\end{equation}
\endgroup
where \( W_q \) is the transformation matrix. Unlike traditional dot-product attention, here \( \text{Attention} \) is computed by a multi-layer perceptron (MLP), which uses multi-layer nonlinear transformations to capture the complex relations between nodes and is therefore more flexible.

We follow the additive attention mechanism to combine the cluster and query scores. We calculate the updated weight of source node \( j \) with respect to target node \( i \) and use softmax to normalize these weights. Thus, the attention coefficient \( E_{i,j} \) is derived as follows,
\begingroup
\setlength{\abovedisplayskip}{4pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{4pt} % adjust the space below the equation
\begin{equation}
 E_{i,j} = \text{softmax}_j(\alpha_i+\beta_j)
\end{equation}
\endgroup
%In most models, people choose to use GCN to complete the update of nodes on the graph. But here, through experiments, we find that when we use GCN, one epoch consumes about twice as much attention as when we use multiple heads, and it takes about 16 epochs to get results. In our case, we use multiple attention to update the node representation, and only 4 epochs are needed to achieve the best results.
We then introduce multiple independent attention heads to update the node representations. We compute the updated representation \(\varphi_i^{\prime}\) of node \( i \) using the attention coefficient \( E_{i,j} \),
\begingroup
\setlength{\abovedisplayskip}{2pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{2pt} % adjust the space below the equation
\begin{equation}
 \varphi_i^{\prime} = \frac{1}{H} \sum_{h=1}^{H} \sum_{j} E_{i,j}^{h} \, \varphi_j^{h}
\end{equation}
\endgroup

 Then we obtain the updated node representations \( \varphi_i^{\prime} \) (\(i=1,\ldots,l\)). We iteratively reconstruct the interest sub-graph according to the latest item representations, propagate information, and update the node embeddings \( m \) times. Finally, we perform clustering after the \( m \)-th iteration to obtain the result of dynamic graph clustering. This iterative process allows the model to progressively refine the representation of user interests so that evolving preferences are captured. Additionally, by updating the sub-graph at each iteration, the model can better adapt to new information, leading to more accurate recommendations.


%\vspace{-0.5cm}
\subsubsection{Multi-Interest Representations of Users} 
%\vspace{-0.2cm}
We obtain \( l \) vectors \( \varphi_1, \varphi_2, \ldots, \varphi_l \), each of which aggregates the item features \( e_i \), the target item features, and the cluster information. To obtain rich multi-interest representations, we select the last item \( \varphi_i^j \) in \( \text{group}_j \) as the query to obtain \( \varphi_{u_j} \), which represents the user interest in \( \text{group}_j \). Therefore, the embedding of user interest \( j \) is set as \( z_j = \varphi_{u_j} \), and the user's multi-interest representation \( Z \in \mathbb{R}^{k \times d} \) is
\begingroup
\setlength{\abovedisplayskip}{5pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{2pt} % adjust the space below the equation
\begin{equation} 
Z=[z_1^T; z_2^T; \ldots; z_k^T]
\end{equation}
\endgroup
%\vspace{-0.8cm}
\subsection{Interest Weight and Prediction}
%\vspace{-0.3cm}
In recommendation systems, under the assumption of multiple interests, a user favors each interest unequally and each interest varies over time. We therefore assign a weight to each interest in order to prioritize them, which makes it possible to generate recommendation candidates more effectively and improve the overall performance of the recommendation system.
%\vspace{-0.5cm}
\subsubsection{Interest Weight Model}
%\vspace{-0.3cm}
In general, a user is more interested in a cluster if the user has interacted with many items belonging to it and the interaction times of these items are closer to the current time. Therefore, we should assign a higher weight to this interest cluster. To utilize both the number of items and their interaction times in that cluster, we calculate the weight for each interest component \( z_j \) using the cluster labels \( C_{\text{labels}} \) obtained from dynamic graph clustering and the item time embeddings \( \tau \). When the cluster label \( C_i \) of an item matches \(\text{group}_j\), we retain \( z_j \) and the item's \( \tau_i \). For items belonging to other clusters, we mask them to 0 to keep the input dimension consistent. The interaction timestamps of all items in the same cluster are concatenated with \( z_j \), and we use a two-layer feed-forward network \( \text{FFN}\) to compute the weight \(\omega_j\),
\begingroup
\setlength{\abovedisplayskip}{6pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{6pt} % adjust the space below the equation
\begin{equation}
\scalebox{0.95}{$\omega_j=\text{FFN}([z_j;1_{[C_1\in \text{group}_j ]} \cdot \tau_1;\ldots;1_{[C_l\in \text{group}_j ]} \cdot \tau_l ])$}
\end{equation}
\endgroup
where \( \text{FFN}(\cdot)\) consists of two fully connected layers with the sigmoid function as the activation between them. The interests learned in Section 3.4 are all topics that the user is interested in, so all of their weights should be positive. We therefore use the Softplus function (a smooth version of ReLU) to map the weights to the range \((0, +\infty)\). 
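
For reference, the Softplus function has the standard form
\[
\text{Softplus}(x) = \log\left(1 + e^{x}\right) > 0 \quad \text{for all } x \in \mathbb{R},
\]
so the resulting weights are strictly positive. How it is composed with the FFN output is not spelled out above; one plausible reading, stated here only as an assumption, is \(\omega_j = \text{Softplus}(\tilde{\omega}_j)\), where \(\tilde{\omega}_j\) denotes the raw FFN output (a symbol introduced purely for illustration).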









%\vspace{-0.5cm}
\subsubsection{Item Prediction and Optimization}
%\vspace{-0.3cm}
Intuitively, a user may like an item if the item matches one of the user's interests, not necessarily all of them. In other words, the item embedding only needs to be close to one user interest embedding.
Therefore, whether a user likes an item depends on the maximum similarity score between the user interest embeddings and the item embedding. Furthermore, the weight of each interest should be taken into account to reflect the user's preference for different interests. Thus, the user's preference score \( y \) for an item \(p\) is calculated as follows,
\begingroup
\setlength{\abovedisplayskip}{3pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{3pt} % adjust the space below the equation
\begin{equation}
y = \max \left\{ \omega_j \text{Linear}(z_j \cdot p) \right\}_{j=1}^k
\end{equation}
\endgroup
where \( Z = [z_1^T; z_2^T; \ldots; z_k^T] \) represents the user's multi-interest representation obtained in Section 3.4, \( \omega = [\omega_1; \omega_2; \ldots; \omega_k] \) represents the multi-interest weights obtained in Section 3.5.1, and \( p \) is the embedding vector of the unseen item.
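To illustrate how the weights interact with the maximum (all numbers are hypothetical and serve only as an example), suppose \(k = 2\), \(\omega_1 = 0.5\), \(\omega_2 = 2\), \(\text{Linear}(z_1 \cdot p) = 0.9\), and \(\text{Linear}(z_2 \cdot p) = 0.3\). Then
\[
y = \max\{0.5 \times 0.9,\; 2 \times 0.3\} = \max\{0.45,\; 0.6\} = 0.6,
\]
so the second interest determines the score although the first interest matches the item more closely: a heavily weighted interest can outweigh a better-matching but lightly weighted one.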
We use the cross-entropy loss function to calculate the loss \(\mathcal{L} \) over the positive examples \( I_+ \) and the negative examples \( I_- \), and adjust the model parameters to obtain a high probability for the true target items, 
\begingroup
\setlength{\abovedisplayskip}{2pt} % adjust the space above the equation
\setlength{\belowdisplayskip}{2pt} % adjust the space below the equation
\begin{equation}
\scalebox{1.2}{$
\mathcal{L}=-\frac{\sum_{u \in U}\left(\sum_{p_{i} \in I_{+}^{u}} \log \left(y_{i}^{u}\right)+\sum_{p_{i} \in I_{-}^{u}} \log \left(1-y_{i}^{u}\right)\right)}{\sum_{u \in U}\left(\left|I_{+}^{u}\right|+\left|I_{-}^{u}\right|\right)}$}
\end{equation}
\endgroup
After obtaining the loss for each batch of training samples, the model is trained using the backpropagation through time (BPTT) algorithm \cite{bptt}.




\begin{table}[ht]
\setlength{\abovecaptionskip}{0.1cm}
\centering
\tabcolsep=0.03\linewidth % adjust the column spacing

\renewcommand{\arraystretch}{1} % adjust the row spacing
\caption{Statistics of the three datasets}
\label{dateset}
\scalebox{1}{
\begin{tabular}{p{1.5cm}|r r r} % right-align the numeric columns
\hline\hline
      \textbf{Datasets}          & \textbf{Amazon} & \textbf{MovieLens} & \textbf{Taobao} \\ \hline
\#Items        & 425,582         & 15,243             & 823,971        \\ 
\#Users        & 67,165          & 137,212            & 363,171       \\
\#Interactions      & 6,716,500       & 13,721,200         & 36,317,100         \\ 
\#Training     & 57,165          & 127,212            & 343,171        \\ 
\#Test         & 5,000           & 5,000              & 10,000         \\ 
\#Validation    & 5,000           & 5,000              & 10,000         \\ \hline

\end{tabular}
}
%\vspace{-0.3cm}
\end{table}

\section{EXPERIMENTS}
%\vspace{-0.3cm}
This section first introduces three real-world datasets which are widely used to conduct experiments in recommender systems.
Next, it presents the evaluation metrics for measuring prediction accuracy and compares the performance of the proposed method with that of other methods. Finally, it provides a thorough analysis of the contributions of individual components, the sensitivity to changes in model parameters, and the convergence speed of MDGR.
%\vspace{-0.4cm}
\subsection{Experimental Settings}
%\vspace{-0.3cm}
\subsubsection{Datasets}
%\vspace{-0.4cm}

We conduct experiments on three challenging public datasets\footnote{The code is available at: \url{https://anonymous.4open.science/r/MDGR/}}. We adopt a 10-core setting and filter out rare items that appear fewer than 10 times in the entire dataset, as well as inactive users who interact with fewer than 100 items. We split each user's interaction history into non-overlapping sequences of 100 items, using the first 50 items to learn the user's embeddings and the last 50 items as positive samples for ranking. For each sequence, an additional 50 negative samples are randomly selected from the items the user has not interacted with. The statistics of the three datasets are shown in Table~\ref{dateset}.
\begin{table*} % use table* to span both columns
\caption{Performance comparison between MDGR and all baselines in terms of AUC and Recall@50. The best result in each column is boldfaced, and the second-best result is underlined. `Improve' denotes the improvement of MDGR over the best baseline.}
\label{result}
\centering
\tabcolsep=0.01\linewidth % adjust the column spacing
\renewcommand{\arraystretch}{1} % adjust the row spacing
%\small % control the font size
\scalebox{1}{
\begin{tabular}{p{2.5cm}|p{1.5cm}|p{1.2cm}|p{1.2cm}p{1.2cm}|p{1.2cm}p{1.2cm}|p{1.2cm}p{1.2cm}} % adjust the column widths
\hline \hline 
\multirow{2}{*}{ \textbf{ \centering Category}} & \multirow{2}{*}{\centering \textbf{Methods}} & \multirow{2}{*}{\textbf{Params.}} & \multicolumn{2}{c|}{\textbf{Amazon}} & \multicolumn{2}{c|}{\textbf{Taobao}} & \multicolumn{2}{c}{\textbf{MovieLens}} \\
\cline{4-9} 
& & & AUC & R@50 & AUC & R@50 & AUC & R@50 \\
\hline 
\multirow[c]{5}{*}{\centering \parbox{2cm}{\centering \makecell{Sequential \\ Recommendation}} }
& \parbox[t]{2.5cm}{GRU4Rec} & 66338 & 68.62 & 63.44 & 81.55 & 74.48 & \underline{96.13} & 90.31 \\
& \parbox[t]{2.5cm}{BERT4Rec} & 50242 & 68.11 & 63.15 & 81.47 & 74.52 & 95.95 & 90.11 \\
& \parbox[t]{2.5cm}{TiSASRec} & 67586 & 72.11 & 66.67 & 81.46 & 74.43 & 96.02 & 90.16 \\
& \parbox[t]{2.5cm}{DCRec} & 76952 & 76.08 & 63.23 & 83.21 & 79.15 & 93.52 & 91.57 \\
& \parbox[t]{2.5cm}{MAERec} & 78633 & 78.26 & 71.52 & 84.26 & 77.49 & 92.17 & 90.45 \\ 
\hline
\multirow{2}{*}{\centering \makecell{Multi-interest \\ (GCN)}}
& \parbox[t]{2.5cm}{Surge} & 71339 & 79.88 & \underline{79.06}  & 86.64 & 84.77 & 89.06 & 89.71 \\
& \parbox[t]{2.5cm}{BIGCF} & 50087 & 69.52 & 65.65 & 73.84 & 71.75  & 89.51 & 82.88  \\
\hline
\multirow{3}{*}{\centering \makecell{Multi-interest \\ (Cluster) }} 
& \parbox[t]{2.5cm}{PinText2} & 69634 & 55.83 & 54.13 & 71.58 & 66.88 & 88.27 & 81.68 \\
& \parbox[t]{2.5cm}{ComiRec} & 67586 & 71.72 & 67.36 & 70.92 & 65.61 & 95.25 & 90.65 \\
& \parbox[t]{2.5cm}{MIP} & 50824 & \underline{80.47} & 78.85 & \underline{88.49} & \underline{88.43} &  92.32 & \underline{92.97} \\
\hline 
\multirow{1}{*}{\parbox[c]{2cm}{\centering Ours}}
& \parbox[t]{2.5cm}{MDGR} & 49331 & \textbf{92.03} & \textbf{85.94} & \textbf{90.68} & \textbf{93.28} & \textbf{96.16} & \textbf{95.59} \\
\hline % add a horizontal rule
\multirow{1}{*}{\parbox[c]{2cm}{\centering Improve}}
& \parbox[t]{2.5cm}{} \centering/& \centering/ & {14.36\%} & {8.70\%} & {2.47\%} & {5.48\%} & {0.03\%} & {2.82\%} \\
\hline % add a horizontal rule
\end{tabular}
}%\vspace{-0.2cm}
\end{table*}
\subsubsection{Baselines} 
%\vspace{-0.4cm}
To evaluate the performance of MDGR, we compare it with several well-known baselines, which are classified into sequential models and multi-interest models. The sequential models are GRU4Rec \cite{Gru4Rec}, BERT4Rec \cite{bert}, TiSASRec \cite{Tisa}, DCRec \cite{seq2}, and MAERec \cite{MAERec}, which represent the user's dynamic interest as a single overall representation by exploiting historical behaviors. The multi-interest models are PinText2 \cite{pin}, ComiRec \cite{comirec}, Surge \cite{surge}, MIP \cite{mip}, and BIGCF \cite{big}, which represent the user's interest as multiple embeddings by using graph convolutional networks (Surge, BIGCF) or clustering (PinText2, ComiRec, MIP). 
%$\bullet$\textbf{Amazon-book%\footnote{\url{https://www.kaggle.com/datasets/mohamedbakhet/amazon-books-reviews }}.
%}
%This dataset consists of product reviews and metadata. We use the books category of the Amazon dataset, which contains the book reviews from the Amazon website during May 1996 - July 2014.

%$\bullet$ \textbf{Taobao\footnote{\url{https://tianchi.aliyun.com/dataset/131940}}.} This dataset is widely used for recommendation research, which is collected from the largest e-commerce platform in China. We use the click data from November 25 to December 3, 2017.

%$\bullet$ \textbf{MovieLens-20M\footnote{\url{https://grouplens.org/datasets/movielens/20m/}}.} The dataset describes ratings and free-text tagging activities from MovieLens for movie recommendation service. It was created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on October 17, 2016.
%\vspace{-0.6cm}

\subsubsection{Metrics and Parameter Settings } 
%\vspace{-0.4cm}
\textbf{Metrics.} The models are evaluated in the retrieval scenario, where the recommendation system needs to recommend a batch of items to a user. We use two commonly used evaluation metrics, Recall and AUC, in our experiments. Recall describes the proportion of user-item rating records that are included in the final recommendation list. AUC is the probability that a positive item sample receives a higher score than a negative item sample, which reflects the model's ability to distinguish positive from negative samples.
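
For reference, the two metrics can be written as follows. The notation is introduced here only for illustration: \(\mathcal{U}\) is the set of test users, \(\hat{I}_{u,N}\) is the set of top-\(N\) items recommended to user \(u\), \(I_u\) is the set of test (positive) items of \(u\), \(I_u^{-}\) is the set of sampled negative items of \(u\), and \(y_p^u\) is the predicted score of item \(p\) for \(u\),
\[
\text{Recall@}N = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{|\hat{I}_{u,N} \cap I_u|}{|I_u|},
\]
\[
\text{AUC} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{1}{|I_u|\,|I_u^{-}|} \sum_{p \in I_u} \sum_{q \in I_u^{-}} \mathbf{1}\left[ y_p^u > y_q^u \right].
\]
The per-user averaging in the AUC expression is one common convention and is an assumption here; a pooled variant over all positive-negative pairs is equally standard.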

%\vspace{-0.4cm}



%\begin{equation}
  %  \text { Recall@N }=\frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\left|\hat{I}_{u, N} \cap I_{u}\right|}{\left|I_{u}\right|}
%\end{equation}
%where \(\hat{I}_{u, N}\) denotes the set of top-N recommended items for user u and \(I_u\) is the set of testing items for user u.


%\vspace{-0.6cm}


%$\bullet$ \textbf{GRU4Rec \cite{Gru4Rec}.} It extents RNN to GRU to model the entire session and introduces a ranking loss function to solve the problem of short session.


%$\bullet$ \textbf{BERT4Rec \cite{bert}.} It employs the deep bidirectional self-attention to model user behavior sequences and formulate an overall representation of user interests for making recommendations.

%$\bullet$ \textbf{TiSASRec \cite{Tisa}.} It models both the absolute positions of items as well as the time intervals between them in a sequence to explore the influence of different time intervals on next item prediction

%$\bullet$ \textbf{Surge \cite{surge}.} It constructs the interest graph from user interaction behaviors and uses dynamic-pooling for filtering and reserving activated core preferences for recommendation.


%$\bullet$ \textbf{BIGCF \cite{big}.} It proposes the concepts of individual intent and collective intent to model the user interest and implements the recommendation process through bilateral intent-guided graph reconstruction resampling.

%$\bullet$ \textbf{PinText2 \cite{pin}.} It uses hierarchical clustering method to cluster the clicked items of user behavior history into k categories to represent the user's interests.

%$\bullet$ \textbf{ComiRec \cite{comirec}.} It is a multi-interest model which captures multiple interest by self-attention and dynamic routing from user behavior sequences and then feeds items into an aggregation module to obtain the overall recommendation

%$\bullet$ \textbf{MIP \cite{mip}.} It not only produces multi-interest for users by using the user’s sequential engagement more effectively but also adds a set of weights to each embedding so that the candidates can be retrieved from each interest proportionally.



\textbf{Parameter Settings.} The model is implemented using the PyTorch framework. We initialize the model parameters with the default Kaiming initializer and optimize the models with the Adam optimizer. The embedding size is set to 32, the learning rate is set to 0.001, and the batch size is fixed at 128. We set the number of interests to 8 and the number of iterations of dynamic graph clustering to 9 on Amazon and 5 on Taobao and MovieLens, which leads to the best results in our experiments. We tune the hyper-parameters on the validation set and terminate training if the validation performance does not improve for 10 epochs.
%\vspace{-0.3cm}



%\vspace{-0.2cm}
\subsection{Experimental Results} 
%\vspace{-0.2cm}
To demonstrate the validity of MDGR, we compare it with ten representative baselines on three datasets in terms of two metrics. Table~\ref{result} shows the performance of MDGR and all baselines.
%the last line is the improvements of MDGR relative to the best baseline.
MDGR achieves the best performance across all metrics on the three datasets, which strongly supports its effectiveness. Specifically, MDGR improves over the strongest baselines \textit{w.r.t.} AUC by 14.36\%, 2.47\%, and 0.03\%, and \textit{w.r.t.} Recall@50 by 8.70\%, 5.48\%, and 2.82\% on Amazon, Taobao, and MovieLens, respectively. By propagating information between non-adjacent nodes in the same cluster and eliminating the propagation between irrelevant nodes, MDGR captures useful correlations and weakens the noise during information propagation, while most baselines are not capable of exploring them fully and accurately. The improvement of MDGR is most pronounced on Amazon. A possible reason is that Amazon contains many different categories of items and more diverse user behaviors, and our model can better handle the relations between these diverse behaviors.
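
The improvements in Table~\ref{result} are consistent with the relative gain over the strongest baseline, which we assume is computed as
\[
\text{Improve} = \frac{\text{MDGR} - \text{best baseline}}{\text{best baseline}} \times 100\%.
\]
For example, for AUC on Amazon the strongest baseline is MIP (80.47), which gives \((92.03 - 80.47)/80.47 \approx 14.36\%\); for Recall@50 on Amazon the strongest baseline is Surge (79.06), which gives \((85.94 - 79.06)/79.06 \approx 8.70\%\).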

%Specifically, it performs worse than XXX.
Notably, Surge and MIP are inferior to MDGR but superior to the other baselines in most cases. The former may be because MDGR considers the relations between non-adjacent homogeneous vectors and propagates information on the constructed interest sub-graph, while Surge and MIP ignore them. The advantage of Surge and MIP over the other baselines is especially clear on Amazon and Taobao. A possible reason is that they not only have a stronger ability to capture the user's interests but also recommend items by matching them with each interest embedding. The results of the other multi-interest models (PinText2, ComiRec, and BIGCF) are comparable to those of the sequential models (GRU4Rec, BERT4Rec, TiSASRec, DCRec, and MAERec) on the Amazon dataset, while being inferior to them on the Taobao and MovieLens datasets. We attribute this to the fact that the multi-interest representations of users can better adapt to the diversity of the Amazon dataset.




%Therefore, we can see that our proposed XXX performs better and better on larger data sets. The relative performance gains of XXX compared to the best baselines on Amazon and Taobao are in the range of 8.99\%——14.36\% and 2.47\%——5.48\% respectively, which shows that our XXX effectively captures complex large-scale real-life scenes Multi-granularity of user interests.
%\vspace{-0.6cm}



\subsection{Ablation Study} 
%\vspace{-0.4cm}
To study the contributions of different components, we further compare our full model with different variants on the three datasets, i.e., variants in which the item embedding module, the dynamic graph clustering module, or the interest weight module is modified or removed. 
%We perform an ablation study on the design choices in MDGR to demonstrate their effectiveness. 
%Specifically, these factors include the item embedding module, dynamic graph clustering module, and interest weight module.
Specifically, MDGR-DGC replaces dynamic graph clustering with a general clustering method (Ward). MDGR-W removes the interest weight module so that all interests have equal weight. MDGR-PT, MDGR-P, and MDGR-T remove both position and timestamp, only position, and only timestamp information from the item embeddings, respectively. 

Table~\ref{ab} shows the experimental results. MDGR outperforms the variants on the three datasets in terms of almost all metrics, which validates 
the benefit of introducing the item embedding module, the dynamic graph clustering module, and the interest weight module. We observe that MDGR achieves
better performance than MDGR-T. We attribute this improvement to the comprehensive use of temporal information obtained by encoding the item timestamps. Meanwhile, the performance of MDGR-T is inferior to that of MDGR-P, which further demonstrates that ignoring item timestamps weakens the model and that temporal information contributes more to recommendation performance than positional information. Furthermore, MDGR-W, MDGR-DGC, and MDGR-PT perform worse than MDGR, so we conclude that all components help capture the user's multiple interests and improve recommendation performance. It is worth noting that the results of MDGR-W and MDGR-DGC are significantly worse than those of MDGR. This indicates that dynamic graph clustering and interest weights yield better multi-interest embeddings. In summary, MDGR achieves the best performance in most cases. This illustrates that jointly modeling dynamic graph clustering, interest weights, and item encoding is important for better recommendation. %Therefore, we should capture them when modeling the user multi-interests. particularly

%The adopted GCN approaches were Surge and BIGCF; the adopted clustering approaches were PinText2,comirec and MIP.



%using dynamic graph clustering helps to extract multi-interests of users from related items and the interest weight method helps that each interest can fully consider the impact of time and category factors on user interests. Furthermore, MDGR-PT, MDGR-P and MDGR-T all perform worse than MDGR,so we can conclude that all components are beneficial to capture the user's multiple interest and prove that current user interests may be more relevant to recent interactions. 





%In the item embedding module, we add position and time information to explore that current user interests may be more relevant to recent interactions. We conduct experiments in the following four situations: 1) item embedding(MDGR-I) 2) item + time(MDGR-TI) 3) item + position(MDGR-PI) 4) item + time + position(MDGR-PTI). As shown in Table 3, we can see that the model effect is 4>2>1>3, which confirms the impact of time factors on user interests and that the combination of time and location information can more accurately extract user interests.

%To evaluate the impact of extracting users' multiple interests through dynamic graph clustering and adding interest weights for prediction on multi-interest modeling, we conducted the following two experiments: 1) Use the general clustering method (ward) to replace dynamic graph clustering(MDGR-DGC) 2) Remove In the interest weight part, the weight of each interest is equal(MDGR-W).
%The results are shown in Table \ref{ab}.%Should we write that case 1 is lower than the full model? 
%We can observe that using dynamic graph clustering for interest extraction can help filter out irrelevant noise, allowing the model to focus on extracting interesting parts from related items. %Should we write the comparison?
%Adding weights to each interest during prediction can fully consider the impact of time and category factors on user interests.

\begin{table*}
\caption{Performance comparison of MDGR with different variants in terms of AUC and Recall@50 (``-'' indicates that MDGR does not use the corresponding component).}
\label{ab}
\centering
\tabcolsep=0.015\linewidth % adjust the column spacing
\renewcommand{\arraystretch}{1.2} % adjust the row spacing
\scalebox{1}{
\begin{tabular}{p{0.1\linewidth}|p{0.115\linewidth}|p{0.09\linewidth}|c|c|c|c|c|c}
\hline \hline
\multirow{2}{*}{\centering Classification} & \multirow{2}{*}{\centering Variants} & \multirow{2}{*}{\centering Ablation} & \multicolumn{2}{c|}{Amazon} & \multicolumn{2}{c|}{Taobao} & \multicolumn{2}{c}{MovieLens} \\
\cline{4-9} 
& & & AUC & R@50 & AUC & R@50 & AUC & R@50 \\
\hline 
Weight & MDGR-W & \centering -Weight & 84.16 & 77.58 & 81.53 & 85.31 & 88.49 & 87.61 \\
\hline 
Cluster & MDGR-DGC & \centering -DGC & 80.47 & 78.85 & 88.49 & 88.43 & 92.32 & 92.97 \\
\hline 
\multirow{3}{*}{\makecell[l]{Item \\ Embedding}} & MDGR-PT & \centering -PT & 87.35 & 84.75 & 86.64 & 87.22 & 92.61 & 92.21 \\
 & MDGR-P & \centering -P & 89.47 & 87.94 & 88.75 & 88.56 & 93.96 & 93.19 \\
 & MDGR-T & \centering -T & 86.19 & 84.28 & 84.54 & 83.71 & 90.01 & 89.31 \\
\hline
Full & MDGR & \centering ALL & \textbf{92.03} & \textbf{85.94} & \textbf{90.68} & \textbf{93.28} & \textbf{96.16} & \textbf{95.59} \\
\hline 
\end{tabular}
}
%\vspace{-0.3cm}
\end{table*}
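
As a reading aid for Table \ref{ab}, the AUC drop of each variant relative to the full model on Amazon can be computed directly from the reported values. The differences below are in percentage points and are plain arithmetic on the table entries, not additional experimental results:
\begin{align*}
\text{MDGR-W:}   &\quad 92.03 - 84.16 = 7.87, \\
\text{MDGR-DGC:} &\quad 92.03 - 80.47 = 11.56, \\
\text{MDGR-PT:}  &\quad 92.03 - 87.35 = 4.68, \\
\text{MDGR-P:}   &\quad 92.03 - 89.47 = 2.56, \\
\text{MDGR-T:}   &\quad 92.03 - 86.19 = 5.84.
\end{align*}
On this dataset and metric, MDGR-DGC shows the largest drop and MDGR-P the smallest. The ordering may differ on the other datasets and on R@50.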

%\vspace{-0.5cm}
\subsection{Parameter Sensitivity}
%\vspace{-0.2cm}
To explore the effect of hyperparameter settings on MDGR, we study how two hyperparameters, the number of clusters and the number of iterations, affect its performance.

\textbf{Impact of Cluster Number.} Choosing an appropriate number of clusters is an important step for multi-interest user representation. If the number of clusters is too large, the computational cost becomes high and each cluster learns less information on average. If it is too small, different interests are difficult to distinguish.
We search for the best-performing value in the set \{1, 5, 8, 10\}. Figure \ref{fig:cluster} shows the AUC results. As the number of clusters increases, performance on all three datasets first improves and then degrades, and the best result is achieved with 8 clusters on all three datasets.


%\vspace{-0.1cm}
\textbf{Impact of Number of Iterations.} We vary the number of iterations \emph{m} in the range \{1, 5, 10, \ldots, 15\} on the three datasets. Figure \ref{fig:iter} shows the AUC results. The performance of MDGR first improves as \emph{m} increases, which supports the effectiveness of iteratively constructing and continuously optimizing the interest sub-graph to mine the real item relations. However, when more dynamic graph cluster modules are stacked, the performance begins to decrease, which indicates that too many layers may introduce noise or cause over-smoothing. MDGR achieves its best results when \emph{m} is 9, 5, and 5 on Amazon, Taobao, and MovieLens, respectively.
%\vspace{-0.3cm}
\section{Convergence Speed Comparison}
%\vspace{-0.2cm}
Figure \ref{fig:epoch} shows that MDGR reaches its best overall AUC on Amazon at the 32nd epoch, whereas MIP and Surge peak at the 135th and 110th epochs, respectively, and ComiRec requires 230 epochs to reach its peak. MDGR therefore needs roughly a quarter of the epochs needed by MIP and Surge, and about one seventh of the epochs required by ComiRec, which indicates a faster convergence rate. The reason is that MDGR employs a dynamic graph clustering module, which updates item representations by continuously constructing and optimizing the interest sub-graph and thereby refines the user interest embeddings. This iterative optimization allows MDGR to converge faster than models that rely on static or less dynamic representations. Furthermore, the adaptive nature of the dynamic graph enables MDGR to capture evolving user preferences more effectively. As a result, MDGR can better account for changing patterns of user behavior and interests, leading to improved recommendation accuracy.
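
As a quick check of these ratios, the peak epochs quoted above can be compared directly (the rounding is ours):
\[
\frac{32}{135} \approx 0.24, \qquad \frac{32}{110} \approx 0.29, \qquad \frac{32}{230} \approx 0.14 \approx \frac{1}{7}.
\]
The quarter figure is thus a close approximation for MIP and a slightly optimistic one for Surge, while the one-seventh figure for ComiRec holds up to rounding.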
\begin{figure}
%\vspace{-0.2cm}
\setlength{\abovecaptionskip}{0.1cm} 
\label{cn}
    
    \centering
    \includegraphics[width=0.88\linewidth]{uai2025-template/clusternumbig.png}
    \caption{The effect of cluster number.}
    \label{fig:cluster}
    %\vspace{-0.4cm}
\end{figure}
%\vspace{-0.1cm}
\begin{figure}
%\vspace{-0.1cm}
\setlength{\abovecaptionskip}{0.1cm}  % reduce the spacing between the caption and the figure
\centering
\includegraphics[width=0.88\linewidth]{uai2025-template/image.png}
\caption{The effect of the number of iterations.}
\label{fig:iter}
%\vspace{-0.4cm}
\end{figure}


\begin{figure}
\setlength{\abovecaptionskip}{0.1cm} 
\label{iter}
    \centering
    \includegraphics[width=0.9\linewidth]{uai2025-template/epoch big.png}
    \caption{AUC convergence rate during training on Amazon.}
    \label{fig:epoch}
    \vspace{-0.3cm}
\end{figure}  





%\vspace{-0.4cm}
\section{Model Complexity Analysis}
In this section, we analyze the time complexity of MDGR. In the item embedding module, the costs of item embedding, time encoding, and position encoding are each \(O(Md)\), where \(M\) is the number of items and \(d\) is the embedding dimension.
During dynamic graph clustering, MDGR costs \(O(M \cdot K)\) for the clustering computation and \(O(n \cdot (M \cdot K + H \cdot M \cdot K + M^2))\) for constructing and optimizing the interest sub-graph, where \(K\) is the number of clusters and \(M^2\) is the cost of calculating relationships in the graph during sparsification. Additionally, the interest weight calculation and the prediction each cost \(O(K \cdot d)\).
Although the time complexity of MDGR is somewhat higher than that of baselines such as MIP and Surge, it converges faster than they do. Therefore, the overall cost of MDGR remains comparable to that of the most recently developed baselines.
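
The individual terms above can be combined into a single bound. This is only a sketch under stated assumptions: \(n\) and \(H\) are not defined in this section, so we treat them as positive integers, we take the interest weight and prediction terms to scale with the number of clusters \(K\), and \(T_{\text{MDGR}}\) is our own notation for the total cost.
\begin{align*}
T_{\text{MDGR}} &= O(Md) + O(MK) + O\bigl(n(MK + HMK + M^2)\bigr) + O(Kd) \\
&= O\bigl(Md + n(HMK + M^2) + Kd\bigr),
\end{align*}
since \(MK \le HMK\) for \(H \ge 1\) and \(MK \le n \cdot HMK\) for \(n \ge 1\). If additionally \(K \le M\), the \(Kd\) term is absorbed by \(Md\), leaving \(O\bigl(Md + n(HMK + M^2)\bigr)\), in which only the sparsification term is quadratic in \(M\).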








\section{Conclusion}
%\vspace{-0.3cm}
\label{sec:majhead}
In this paper, we propose MDGR, a novel dynamic graph cluster based multi-interest model for sequential recommendation. It iteratively constructs the interest sub-graph to comprehensively update the multi-interest embeddings, and explores the changing real item relations between non-adjacent items in a sequence by continuously optimizing the interest sub-graph. Extensive experiments on three real-world datasets verify the effectiveness and efficiency of MDGR. For future work, we plan to exploit more efficient graph propagation methods for better user modeling. We also plan to learn interest embeddings by introducing fuzzy graph clustering, which can assign one item to multiple clusters.

\section{Acknowledgments}
This work was partly supported by grants from the Natural Science Foundation of Tianjin (Grant No. 23JCYBJC00080) and the Graduate Research and Innovation Project of Civil Aviation University of China (Grant No. 2024YJSKC05004). 
%recommendation that captures users' multiple interests. Integrating time and location information into item embedding, a dynamic graph clustering method is designed to construct an interest subgraph for items under the same category. Information is propagated on the graph, which reduces the noise relationship in the sequence and obtains a more accurate multi-interest user representation. When predicting, we introduce an interest weight module to set a set of weights for these interests to generate candidates more effectively.Extensive experiments on three real-world datasets in different recommendation scenarios demonstrate the effectiveness of our approach.For the future, we plan to introduce more efficient graph propagation methods for better user modeling.



%\bibliographystyle{splncs04}
%\bibliography{uai2025-template/custom}

\printbibliography
\end{document}
