\newpage
\onecolumn

\title{Supplementary Material for Paper: Cold-start Recommendation \\ by Personalized Embedding Region Elicitation}
\maketitle

\appendix
\section{Proof of Theorem 1} 
We here provide the proof of Theorem~\ref{thm:chebyshev} that was omitted from the main text.
\begin{proof}
    The optimization problem to find the Chebyshev center and its radius can be rewritten as
    \[
        \begin{array}{cl}
        \max & r \\
         \st & 2 (u_{c} + \delta)^\top (v_j - v_i) \le  \|v_j\|_2^2 - \|v_i\|_2^2 \\
         & \hspace{3cm} \forall \delta \in \mc B_r,~\forall v_i \succsim v_j \in \mbb P \\
            & u_c \in \mbb H,~r \in \R_+,
        \end{array}
    \]
    where $\mc B_r = \{\delta \in \R^d: \| \delta \|_2 \le r\}$ is a $d$-dimensional Euclidean ball of radius $r$. For any preference $v_i \succsim v_j \in \mbb P$, the semi-infinite constraint 
    \[
        2 (u_{c} + \delta)^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 ~\forall \delta \in \mc B_r
    \]
    is equivalent to the robust constraint
    \[
    2 u_{c}^\top (v_j - v_i) + 2 \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2.
    \]
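    By the Cauchy--Schwarz inequality (equivalently, because the Euclidean norm is self-dual), with equality at $\delta = r (v_j - v_i)/\|v_j - v_i\|_2$ whenever $v_j \neq v_i$, we have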
    \[
        \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) = r \|v_j - v_i\|_2.
    \]
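    Substituting this identity into the robust constraint for every preference yields the constraints
    \[
        2 u_{c}^\top (v_j - v_i) + 2 r \|v_j - v_i\|_2 \le \|v_j\|_2^2 - \|v_i\|_2^2 \qquad \forall v_i \succsim v_j \in \mbb P,
    \]
    which completes the proof.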
\end{proof}
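After substituting the support-function identity, every constraint reads $2 u_c^\top (v_j - v_i) + 2 r \|v_j - v_i\|_2 \le \|v_j\|_2^2 - \|v_i\|_2^2$, which is linear in $(u_c, r)$, so the Chebyshev center can be computed with an off-the-shelf linear programming solver. The following Python sketch uses SciPy's \texttt{linprog}. It assumes, purely for illustration, that $\mbb H$ is the box $[0,1]^d$ (consistent with a maximal distance of $\sqrt{d}$), and the function name \texttt{chebyshev\_center} is hypothetical rather than taken from our implementation.

```python
import numpy as np
from scipy.optimize import linprog


def chebyshev_center(prefs, lo=0.0, hi=1.0):
    """Chebyshev center (u_c, r) of the region cut out by preference pairs.

    prefs: list of (v_i, v_j) pairs meaning the user prefers v_i to v_j.
    lo, hi: box bounds used as a stand-in for the set H (an assumption).
    """
    d = len(prefs[0][0])
    A_ub, b_ub = [], []
    for v_i, v_j in prefs:
        v_i, v_j = np.asarray(v_i, dtype=float), np.asarray(v_j, dtype=float)
        diff = v_j - v_i
        # 2 u_c^T (v_j - v_i) + 2 r ||v_j - v_i||_2 <= ||v_j||^2 - ||v_i||^2
        A_ub.append(np.append(2.0 * diff, 2.0 * np.linalg.norm(diff)))
        b_ub.append(v_j @ v_j - v_i @ v_i)
    # Decision variables x = (u_c, r); maximizing r is minimizing -r.
    c = np.zeros(d + 1)
    c[-1] = -1.0
    bounds = [(lo, hi)] * d + [(0.0, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d], res.x[-1]
```

For a single preference of $(0.25, 0.5)$ over $(0.75, 0.5)$, the feasible half-space is $u_1 \le 0.5$, and the sketch returns $r = 0.5$ with the center on the face $u_1 = 0$ of the box.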

\section{Further Explanations about Settings and Region Elicitation}

In Assumption~2, the probability that a user $u_0$ has prior experience with an item $v_i$ is given by
\[
p_{0i} \Let w_i \times \mathrm{sigmoid}\big( \frac{1}{c_{0i}} - \frac{\kappa_0}{\sqrt{d} - c_{0i}}  \big), 
\]
where $c_{0i} = \| u_0 - v_i \|_2$ is the distance between the embeddings of the true user and the item. Figure~\ref{fig:kappa-plot} visualizes the dependence of $p_{0i}$ on the parameter $\kappa_0$. For a fixed distance $c_{0i} < \sqrt{d}$, the argument of the sigmoid decreases in $\kappa_0$, so the experience probability $p_{0i}$ decreases monotonically in $\kappa_0$.
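The experience probability can be evaluated directly. The Python sketch below (the function name is hypothetical) implements $p_{0i}$ for a distance $c_{0i} \in (0, \sqrt{d})$ with a numerically stable sigmoid, and prints $p_{0i}$ for several values of $\kappa_0$ at a fixed distance, mirroring the trend in Figure~\ref{fig:kappa-plot}.

```python
import math


def experience_probability(c, kappa, w, d):
    """p = w * sigmoid(1/c - kappa / (sqrt(d) - c)) for a distance 0 < c < sqrt(d)."""
    sqrt_d = math.sqrt(d)
    if not 0.0 < c < sqrt_d:
        raise ValueError("distance c must lie in (0, sqrt(d))")
    x = 1.0 / c - kappa / (sqrt_d - c)
    # Numerically stable logistic sigmoid.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return w * s


# Fixed distance c = 4 with d = 64 (sqrt(d) = 8): p shrinks as kappa grows.
for kappa in [0.0, 1.0, 2.0, 5.0]:
    print(kappa, round(experience_probability(4.0, kappa, w=1.0, d=64), 4))
```

At $c_{0i} = 4$ and $d = 64$, the sigmoid argument $1/4 - \kappa_0/4$ vanishes at $\kappa_0 = 1$, so $p_{0i} = w_i/2$ there.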

Next, we visualize the region $\mathcal{U}_{\mathbb{P}}$ in a toy 2D example in Figure~\ref{fig:app-chebyshev}. Initially, a new user (red star) enters our system, and their true embedding is unknown to us. After two elicitation steps, the Chebyshev center moves progressively closer to the `True User' embedding, illustrating how our method localizes the embedding of the new user.

\begin{figure}[!htb]
    \centering
    \includegraphics[width=0.8\linewidth]{figures/kappa.pdf}
    \caption{As $\kappa_0$ increases, the probability that the user has prior experience with an item (see Assumption~2) decreases. The plot uses $d = 64$, so the maximal value of $c_{0i}$ is $\sqrt{d} = 8$.}
    \label{fig:kappa-plot}
\end{figure}

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=1.0\textwidth]{figures/method.pdf}
    \caption{Illustration of our method in a 2D toy example. Recall that a cut in the embedding space is created by pairing a positive item with a negative item. At time $t=0$, no questions have been asked, so there are no cuts in the embedding space. At time $t=1$, we ask the user about items 1, 2, and 4, and the user answers `dislike', `like', and `dislike', respectively. This introduces two cuts in the space, and the initial Chebyshev center is computed. At time $t=3$, we ask the user about item 5, which the user dislikes. As a result, a final cut is constructed by pairing item 2 with item 5, which finalizes the region $\mathcal{U}_{\mathbb{P}}$.}
    \label{fig:app-chebyshev}
\end{figure*}
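The pairing rule in the caption of Figure~\ref{fig:app-chebyshev} can be sketched as follows: every liked item is paired with every disliked item, and skipped items contribute nothing. This minimal Python sketch assumes that responses are encoded as $+1$ (like), $-1$ (dislike), or \texttt{None} ($\NA$); the function name \texttt{build\_preferences} is hypothetical.

```python
from itertools import product


def build_preferences(responses):
    """Turn questionnaire answers into preference pairs.

    responses: dict mapping item id -> +1 (like), -1 (dislike) or None (NA).
    Returns a list of (preferred, less_preferred) pairs; each pair is one cut.
    """
    liked = sorted(i for i, r in responses.items() if r == 1)
    disliked = sorted(i for i, r in responses.items() if r == -1)
    # NA answers never appear in a pair, so they add no cut.
    return list(product(liked, disliked))


# Toy example from the figure: items 1, 2, 4 are answered first, item 5 later.
answers = {1: -1, 2: 1, 4: -1}
print(build_preferences(answers))   # [(2, 1), (2, 4)]
answers[5] = -1
print(build_preferences(answers))   # [(2, 1), (2, 4), (2, 5)]
```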



\section{Cold-Starting Query List via Determinantal Point Processes}\label{sec:app-dpp}

The main task of the ``burn-in'' phase is to create a list $\mc L$ of $K$ popular items with which to query the new user. If a user has no previous experience with an item $v_i$, they answer $\NA$ for that item. Such an $\NA$ response is uninformative: under the preference-construction rule, it adds no preference pair involving $v_i$ to the preference list $\mbb P$. Therefore, when constructing the cold-start item list $\mc L$, it is important to account for the probability that the user has prior experience with each item. By Assumption~2, this probability depends on two factors: the popularity $w_i$ of the item and the distance from the true user embedding $u_0$ to the item embedding $v_i$.

The user embedding $u_0$ is unknown, but the popularity of the items is available, so we leverage this popularity information when constructing $\mc L$. The same argument justifies restricting $\mc L$ to the most popular items among all candidate items. A simple way to find $\mc L$ is a weighted $K$-medoids method: given a list of $N$ items, weighted $K$-medoids returns a subset of $K$ items that serve as cluster centers, chosen to minimize the total weighted squared Euclidean distance from the item embeddings to their nearest centers.
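The weighted $K$-medoids objective can be approximated with a simple alternating (Voronoi-iteration) heuristic, sketched below in Python with NumPy: each item is assigned to its nearest medoid, and each medoid is then replaced by the cluster member with the smallest weighted squared distance to the other members. This is one standard heuristic, not necessarily the solver used in prior work; the function name, the random initialization, and the use of popularity scores as weights are illustrative assumptions. The sketch stores all pairwise distances, so it is only meant for moderate $N$.

```python
import numpy as np


def weighted_kmedoids(V, w, K, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) heuristic for weighted K-medoids.

    V: (N, d) item embeddings; w: (N,) nonnegative item weights.
    Returns the sorted medoid indices and the weighted squared-distance cost.
    """
    V, w = np.asarray(V, dtype=float), np.asarray(w, dtype=float)
    N = V.shape[0]
    # Pairwise squared Euclidean distances, O(N^2) memory.
    D = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(N, size=K, replace=False)
    for _ in range(n_iter):
        assign = np.argmin(D[:, medoids], axis=1)  # nearest medoid per item
        new = medoids.copy()
        for k in range(K):
            members = np.flatnonzero(assign == k)
            if members.size == 0:  # can only happen with duplicate embeddings
                continue
            # Weighted cost of using each member as the center of cluster k.
            cost = (D[np.ix_(members, members)] * w[members][None, :]).sum(axis=1)
            new[k] = members[np.argmin(cost)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    total = float((w * D[:, medoids].min(axis=1)).sum())
    return np.sort(medoids), total
```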

In this section, we present a determinantal point process (DPP) approach to construct the item list $\mc L$. We aim to find a set of items that balances diversity and popularity without relying on the user's true embedding. DPPs are elegant probabilistic models of global, negative correlations, and they admit efficient algorithms for sampling, marginalization, conditioning, and other inference tasks~\citep{ref:kulesza2012determinantal}. DPPs have been applied to various machine learning tasks, including document summarization~\citep{ref:perez2021multi} and image search~\citep{ref:chao2015large}. We rely on the following $L$-ensemble definition of a DPP.

\begin{definition}[$L$-ensemble DPP] \label{def:dpp-L}
    Given a positive semidefinite $P$-by-$P$ matrix $L \in \PSD^P$, an $L$-ensemble DPP is a distribution over all $2^P$ index subsets $J \subseteq \{1, \ldots, P\}$ such that
\[\mathrm{Prob}(J) = \det(L_J)/ \det(I + L),\]
where $L_J$ denotes the $|J|$-by-$|J|$ submatrix of $L$ with rows and columns indexed by $J$.
\end{definition}

We design the matrix $L$ to balance the diversity and popularity of items by composing it as the sum of a similarity matrix $S$ and a popularity matrix $D$:
\[
    L = S +  D, \quad \text{where} \quad D = \mathrm{diag}(w_i).
\]
The matrix $D$ is diagonal, and its diagonal elements $w_i$ capture the popularity of the items. A possible choice for the similarity matrix is $S = V^\top V \in \PSD^P$, where $V \in \R^{d \times P}$ is the embedding matrix whose columns are the embeddings of the $P$ popular items. Because $S$ is positive semidefinite and $D$ has nonnegative diagonal entries, the ensemble matrix $L$ is also positive semidefinite.
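To make Definition~\ref{def:dpp-L} and the construction $L = S + D$ concrete, the following Python sketch builds $L = V^\top V + \mathrm{diag}(w)$ from a toy embedding matrix $V \in \R^{d \times P}$ and checks numerically that the probabilities $\det(L_J)/\det(I + L)$ sum to one over all $2^P$ subsets (with $\det(L_\emptyset) = 1$). The random embeddings and weights are made-up toy data, and the helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def build_L(V, w):
    """L = S + D with S = V^T V (V is d x P, one column per item), D = diag(w)."""
    V = np.asarray(V, dtype=float)
    return V.T @ V + np.diag(np.asarray(w, dtype=float))


def dpp_prob(L, J):
    """Prob(J) = det(L_J) / det(I + L); the empty set has det(L_J) = 1."""
    J = list(J)
    num = np.linalg.det(L[np.ix_(J, J)]) if J else 1.0
    return num / np.linalg.det(np.eye(L.shape[0]) + L)


# Toy check with made-up data: d = 3 dimensions, P = 5 items.
rng = np.random.default_rng(0)
V = rng.normal(size=(3, 5))
w = rng.uniform(0.1, 1.0, size=5)
L = build_L(V, w)
total = sum(dpp_prob(L, J) for k in range(6) for J in combinations(range(5), k))
print(round(total, 10))  # 1.0
```

With strictly positive weights $w_i$, the matrix $L$ is in fact positive definite, so every $\det(L_J)$ is strictly positive.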

We then select the $K$ items of the cold-start query list by solving the following problem
    \be \label{eq:det}
        \max \left\{ \det ( L_z) ~:~ z \in \{0, 1\}^P,~ \| z \|_0= K \right\},
    \ee
where $L_z$ is the submatrix of $L$ restricted to the rows and columns indexed by the nonzero components of $z$. The solution to problem~\eqref{eq:det} coincides with the MAP estimate of the DPP under a cardinality constraint~\citep{ref:kulesza2012determinantal}. Because the logarithm is increasing, problem~\eqref{eq:det} is equivalent to maximizing $\log \det(L_z)$, which is a submodular set function~\citep{ref:gillenwater2012near}; hence problem~\eqref{eq:det} is a cardinality-constrained submodular maximization problem. This problem is also NP-hard~\citep{ref:kulesza2012determinantal}, so solving~\eqref{eq:det} to optimality is challenging in general. \citet{ref:chen2018fast} propose a fast greedy algorithm for this MAP estimation problem with a computational complexity of $\mc O(K^2P)$, and the greedy strategy has been proven to achieve an approximation ratio of $\mc O(\frac{1}{K!})$~\citep{ref:civril2009selecting}. To further improve the solution quality, we apply a 2-neighborhood local search strategy: we iteratively exchange one element of the current set with one element of the complementary set whenever the exchange increases the objective, and stop when no exchange yields an improvement.
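The following Python sketch illustrates problem~\eqref{eq:det} with a greedy initialization followed by the swap-based local search. For readability, the greedy step naively recomputes $\log\det(L_z)$ for every candidate instead of using the incremental Cholesky updates of~\citet{ref:chen2018fast}, so it is slower than $\mc O(K^2P)$; a brute-force solver is included only to sanity-check tiny instances. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def logdet(L, S):
    """log det(L_S) for an index list S; log det of the empty matrix is 0."""
    if len(S) == 0:
        return 0.0
    sign, val = np.linalg.slogdet(L[np.ix_(S, S)])
    return val if sign > 0 else -np.inf


def greedy_map(L, K):
    """Naive greedy: repeatedly add the item that maximizes log det(L_S)."""
    selected, remaining = [], list(range(L.shape[0]))
    for _ in range(K):
        best = max(remaining, key=lambda j: logdet(L, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def swap_local_search(L, S, max_rounds=1000):
    """Exchange one selected item with one unselected item while log det improves."""
    S = list(S)
    current = logdet(L, S)
    for _ in range(max_rounds):
        improved = False
        outside = [j for j in range(L.shape[0]) if j not in S]
        for a in range(len(S)):
            for j in outside:
                T = S[:a] + [j] + S[a + 1:]
                val = logdet(L, T)
                if val > current + 1e-12:  # accept the first improving swap
                    S, current, improved = T, val, True
                    break
            if improved:
                break
        if not improved:  # no swap improves: local optimum reached
            break
    return sorted(S), current


def brute_force_map(L, K):
    """Exact MAP by enumeration; only for sanity checks on tiny instances."""
    return max((list(c) for c in combinations(range(L.shape[0]), K)),
               key=lambda c: logdet(L, c))
```

Starting the local search from the greedy output guarantees that the final objective is never worse than the greedy one.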




\section{Maximum Likelihood Estimation of the Tolerance Parameter}
\label{sec:app-estimation}

We now derive the maximum likelihood estimator of the tolerance parameter $\kappa$. Without loss of generality, consider a training dataset of $N$ items and $M$ users whose user embeddings $u_m$ and item embeddings $v_i$ are given. The interactions between users and items are represented by a binary data matrix $E \in \{0, 1\}^{M \times N}$ with entries
    \[
    E_{mi} = \begin{cases}
        1 & \text{if user $m$ has an experience with item $i$}, \\
        0 & \text{otherwise.}
    \end{cases}
    \]
Suppose that there exists a global constant $\kappa \in \R_+$ such that each $E_{mi}$ is a Bernoulli random variable with
\[
\mathrm{Prob}( E_{mi} = 1 ) = w_i \times \mathrm{sigmoid}\big( \frac{1}{c_{mi}} - \frac{\kappa}{\sqrt{d} - c_{mi}}  \big),\]
where $c_{mi} = \| u_m - v_i \|_2$ is the embedding distance between user $m$ and item $i$.
Given the data matrix $E$ and assuming that the entries $E_{mi}$ are mutually independent, the likelihood is
    \[
    L(\kappa | E) = \prod_{m=1}^M \prod_{i=1}^N \left( p_{mi} (\kappa)  \right)^{E_{mi}} \left( 1 - p_{mi}(\kappa)\right)^{1 - E_{mi}},
    \]
    where $p_{mi}(\kappa)$ is
    \[ 
    p_{mi}(\kappa) = \frac{w_i}{1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}} \big)}.
    \]
    Dropping the term $\sum_{m=1}^M \sum_{i=1}^N E_{mi} \log w_i$, which does not depend on $\kappa$, the estimate $\hat \kappa_{\mathrm{MLE}}$ minimizes the negative log-likelihood:
    \begin{align*}
        &\min_{\kappa \ge 0}~\sum_{m=1}^M \sum_{i=1}^N   \log \left( 1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}} \big)\right) \\
        &  - 
        \sum_{m=1}^M \sum_{i=1}^N (1 - E_{mi}) \log \left( 1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}}   \big) - w_i\right),
    \end{align*}
    which is a one-dimensional problem that can be solved with standard first-order methods such as projected gradient descent.
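Because $\kappa$ is a scalar, the minimization above can also be carried out with a bounded one-dimensional solver instead of gradient descent; the Python sketch below uses SciPy's \texttt{minimize\_scalar} for this purpose. It evaluates the objective with \texttt{np.logaddexp}, using $\log(1 + e^{z} - w_i) = \log\big((1 - w_i) + e^{z}\big)$ for numerical stability, and it assumes $w_i \in [0, 1]$, distances $c_{mi} \in (0, \sqrt{d})$, and a hypothetical upper bound \texttt{kappa\_max} on the search interval. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def neg_log_likelihood(kappa, E, C, w, d):
    """Negative log-likelihood of kappa, up to a kappa-independent constant.

    E: (M, N) binary experience matrix; C: (M, N) distances c_mi in (0, sqrt(d));
    w: (N,) item weights, assumed to lie in [0, 1].
    """
    z = kappa / (np.sqrt(d) - C) - 1.0 / C
    with np.errstate(divide="ignore"):  # log(0) = -inf is fine when w_i = 1
        log_one_minus_w = np.log1p(-np.asarray(w, dtype=float))[None, :]
    # log(1 + e^z) and log(1 + e^z - w) = log((1 - w) + e^z), computed stably.
    term1 = np.logaddexp(0.0, z).sum()
    term2 = ((1.0 - E) * np.logaddexp(log_one_minus_w, z)).sum()
    return term1 - term2


def estimate_kappa(E, C, w, d, kappa_max=20.0):
    """Bounded scalar minimization of the objective over kappa in [0, kappa_max]."""
    res = minimize_scalar(neg_log_likelihood, bounds=(0.0, kappa_max),
                          args=(E, C, w, d), method="bounded")
    return res.x
```

On synthetic data drawn from the model with a known $\kappa$, the estimate should recover the true value up to sampling error.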

\section{Questionnaire Design} \label{sec:app-qs-design}

Inspired by the structure of the Netflix questionnaire~\citep{ref:kweon2020deep}, we design our questionnaire to capture a comprehensive set of preference pairs while minimizing user effort. Users may skip any item, which streamlines the process: while scrolling through a product display, they only need to indicate `like' or `dislike' for products they are familiar with. Figure~\ref{fig:questionnaire} illustrates the questionnaire. Although our experimental design prompts each new user about $100$ items, users respond to only around $15\%$ of them on average, as shown by the user response ratios in Table~\ref{tab:like-dislike}, and our algorithm still performs effectively with these responses.

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.8\textwidth]{figures/question.pdf}
    \caption{Illustration of our questionnaire, modeled on the Netflix questionnaire described in~\cite{ref:kweon2020deep}. When a new user enters our system, we ask them to indicate their preferences for a set of items. Users can specify `like' ($+1$), `dislike' ($-1$), or choose to skip the item ($\NA$).}
    \label{fig:questionnaire}
\end{figure*}

\begin{table*}[!ht]
% \vspace{-7mm}
\centering
\caption{Number of items that users responded to under the PERE method. The response ratio is computed over the 100 queried items.}
\label{tab:like-dislike}

\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/likes_NA.csv}
\end{table*}

\section{Additional numerical results} \label{sec:app-exp}
\subsection{Burn-in Phase Comparison} \label{sec:app-burn-in}

% For the burn-in phase, we compare the DPP method against two popular baselines: RMV~\cite{ref:fonarev2016efficient} and Kmedoids~\cite{ref:liu2011wisdom}. The Greedy approach picks the most popular items, the Random method randomly selects $K$ items from the popular items. Additionally, we implement the $K$-medoids algorithm used in a previous study~\cite{ref:liu2011wisdom}, to identify representative items through cluster centroids. We make a slight modification to the $K$-medoids algorithm by considering only the items belonging to the popular items as potential centroids.
We use LightGCN or biVAE to generate item embeddings for the burn-in phase and conduct experiments on the Gowalla and Amazon-Books datasets. We compare against two widely recognized and straightforward baselines, RMV~\citep{ref:fonarev2016efficient} and $K$-medoids~\citep{ref:liu2011wisdom}. RMV selects seed items that maximize the volume of a rectangular submatrix of the item embedding matrix, which favors diverse, nearly orthogonal seed items in the embedding space. The $K$-medoids algorithm, previously employed by~\citet{ref:liu2011wisdom}, identifies representative items as cluster centers; we slightly modify it by allowing only popular items to serve as centers. Note that sequential preference elicitation methods, such as DPE~\citep{ref:parapar2021diverse} or conditional DPP, are not applicable in the burn-in phase: in this phase, we aim to create a standardized questionnaire for all new users entering our system, whereas sequential methods choose new questions based on the user's previous responses.

Table~\ref{tab:phase_1_full} summarizes the results for the burn-in phase. Among the compared approaches, DPP is the most effective for selecting the initial query items, outperforming the baseline methods on both datasets. We attribute this to the ability of DPP to select a diverse set of items while accounting for the popularity of each item, which balances diversity and relevance.

\begin{table*}[htb]
% \vspace{-7mm}
\centering
\caption{Benchmark of performance metrics on Gowalla and Amazon-Books. Larger values are better, and the best result is highlighted in bold. All methods use $K=50$ items.}
\label{tab:phase_1_full}

\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
]{tables/Phase1_full.csv}
\end{table*}

Moreover, to show the effectiveness of the sequential elicitation proposed in Section~\ref{sec:solution}, we conduct an additional experiment comparing PERE, which uses a static 50-item questionnaire at the beginning followed by a series of 5 dynamic 5-item questionnaires, with a baseline that uses only the DPP burn-in questionnaire to create a static 100-item questionnaire. 
Table~\ref{tab:ablation_1_results} shows that, on both datasets, the combination of the 50-item questionnaire with the series of dynamic questionnaires outperforms the static 100-item questionnaire, which highlights the effectiveness of PERE.

\begin{table}[ht]
    \centering
    \caption{Comparison between a burn-in-only questionnaire built with DPP and PERE, with $100$ elicited items for each method, on the Gowalla and Amazon-Books datasets.}

    \begin{tabular}{lccc}
        \toprule
         Datasets & Method &  NDCG@10 $\uparrow$ & MRR $\uparrow$ \\
         \hline
         Gowalla & Burn-in & 0.1497 & 0.1335 \\
         - & PERE & \textbf{0.1806} & \textbf{0.1518} \\
         \hline
         Amazon-Books & Burn-in & 0.3388 & 0.3152 \\ 
         - & PERE & \textbf{0.3616} & \textbf{0.3235} \\   
         \bottomrule
    \end{tabular}   \label{tab:ablation_1_results}
\end{table}

\subsection{Greedy and DPP Comparison}
While the greedy method chooses the most popular items, we employ the determinantal point process (DPP) in the burn-in phase to better balance diversity and popularity. This is advantageous when a user's preferences diverge from mainstream popularity, since such a user may have experienced few of the most popular items. Table~\ref{tab:greedy} shows that, with 100 elicited items, PERE outperforms the greedy method on all three metrics on the Amazon-Books dataset.
\begin{table}[ht]
    \centering
    \caption{Comparison between PERE and Greedy method on Amazon-Books dataset.}
    \begin{tabular}{lccc}
    \hline
        Methods & NDCG@10 $\uparrow$ & MAP $\uparrow$ & MRR $\uparrow$ \\ \hline
        Greedy &  0.3415 & 0.198 & 0.3043  \\ 
        PERE  & \textbf{0.3616} & \textbf{0.2930} & \textbf{0.3235} \\ \hline
    \end{tabular}
    \label{tab:greedy}
\end{table}

\subsection{Questionnaire Size Analysis}
To keep the user's cognitive load low, each questionnaire should be small. We find that the number of items per round does not significantly affect the performance of the method. More informative is how performance improves as the \textit{total} number of elicited items grows over a long history. We therefore conduct an additional experiment to study the impact of the total number of questions on NDCG@$10$ and MRR. Table~\ref{tab:ablation2} shows that the performance of PERE increases steadily with the number of elicited items, with NDCG@$10$ rising from $0.3111$ to $0.3616$ and MRR from $0.2864$ to $0.3235$.
\begin{table}[ht]
    \centering
    \caption{Performance of PERE as the number of elicited items increases on the Amazon-Books dataset.}
    \begin{tabular}{lcccccc}
    \hline
        Elicited items & 5 & 10   & 20 & 30 & 40 & 50 \\ \hline
        NDCG@10 $\uparrow$ &  0.3111 & 0.3179 & 0.3264 & 0.3372 & 0.3564 & 0.3616 \\ 
        MRR $\uparrow$  & 0.2864 & 0.2905 & 0.2972 & 0.3048 & 0.319 & 0.3235 \\ \hline
    \end{tabular}
    \label{tab:ablation2}
\end{table}

\section{Main Experiment Setting}
\subsection{Datasets Description}
In this paper, we use the Gowalla~\citep{ref:cho2011friendship} and Amazon-Books~\citep{ref:ni2019justifying} datasets, whose statistics are reported in Table~\ref{tab:data-stat}. The datasets are described as follows:
\begin{itemize}
    \item Gowalla is a location-based dataset that contains information about user check-ins at various locations. 
    \item Amazon-Books is the subset of the Amazon Product Review dataset restricted to book products. It comprises user reviews and ratings of these products.
\end{itemize}

\begin{table}[ht]
    \centering
    \caption{Characteristics of the datasets used in our experiments.}
    \begin{tabular}{lcccc}
    \hline
        Dataset & Train User \# & Item \#  & Interaction \#  & Density \\ \hline
        Gowalla & 28858 & 40981 & 1027370 & 0.00084 \\ 
        Amazon-Books  & 51643 & 91599 & 2984108 & 0.00062 \\ \hline
    \end{tabular}
    \label{tab:data-stat}
\end{table}

Amazon-Books includes both explicit and implicit user responses for book products, whereas Gowalla provides only implicit information about user preferences toward locations. We employ two well-known methods to generate collaborative filtering embeddings for items: LightGCN and biVAE. LightGCN is trained solely to predict user-item interactions, which makes it suitable for datasets with implicit responses. In contrast, biVAE is designed to predict ratings for user-item interactions, which requires explicit responses. Since Gowalla contains only implicit responses, we use only LightGCN on this dataset; since Amazon-Books contains both explicit and implicit responses, we use both LightGCN and biVAE on it. 


\subsection{Baseline Description}
We use seven baselines in total, which fall into two groups: fixed questionnaire generation methods and sequential questionnaire generation methods.
Fixed questionnaire generation methods:
\begin{itemize}
    \item RMV: Please refer to Section~\ref{sec:app-burn-in}.
    \item $K$-medoids: Please refer to Section~\ref{sec:app-burn-in}.
    \item DRE: This method first defines a categorical distribution for sampling seed items from the entire item pool. It then jointly learns the categorical distribution and a neural reconstruction network that infers users' preferences from the collaborative filtering (CF) information of the sampled seed items. Finally, the encoder is used to select the seed items, and the decoder is used to recommend favorite items.
    \item DPP: Please refer to Section~\ref{sec:app-dpp}.
\end{itemize}
Sequential questionnaire generation methods:
\begin{itemize}
    \item PEO: This elicitation approach constructs a static preference questionnaire. It formulates the generation of preference questionnaires consisting of relative questions for new users as an optimization problem that can be solved in time linear in the number of items.
    \item Conditional DPP: This modified version of DPP selects $K$ further items from the set of remaining items, conditioned on the items already queried.
    \item DPE~\citep{ref:parapar2021diverse}: This preference elicitation model employs multi-armed bandits to diversify the seed item set in terms of both topic and item diversity.
\end{itemize}
\subsection{Implementation Details} 
We use the standard LightGCN codebase\footnote{\url{https://github.com/gusye1234/LightGCN-PyTorch}} and the Cornac implementation of biVAE\footnote{\url{https://github.com/recommenders-team/recommenders/tree/main}} to generate item embeddings and new-user embeddings. We then generate new users according to Section 4 and use them as the ground truth in our evaluation. This generation step is necessary because we model the experience probability, which allows users to skip a question (an $\NA$ response) in our questionnaire.  

