% \documentclass{uai2024} % for initial submission
\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}
\usepackage{pgfplotstable}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{dsfont}
%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Cold-start Recommendation
by Personalized Embedding Region Elicitation}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1,3]{Hieu Trung Nguyen}
% \author[1]{Hieu Trung Nguyen}
% \author[1,2]{Further~Coauthor}
% \author[3]{Further~Coauthor}
\author[1]{Duy Nguyen}
\author[2]{Khoa Doan}
\author[3]{Viet Anh Nguyen}
% \author[3,1]{Further~Coauthor}
% Add affiliations after the authors
\affil[1]{%
    VinAI Research
}
\affil[2]{%
    College of Engineering \& Computer Science, VinUniversity
}
\affil[3]{%
    The Chinese University of Hong Kong
}
% \affil[3]{%
%     Another Affiliation\\
%     Address\\
%     …
%   }

\usepackage{amsmath, amsfonts, amssymb, enumitem, booktabs, amsthm}
\input{commands}

\pgfplotsset{compat=1.18}
\begin{document}
\maketitle
\begin{abstract}
Rating elicitation is a key element for recommender systems to perform well at cold start, in which the system needs to recommend items to a newly arrived user with no prior knowledge about the user's preferences. Existing elicitation methods employ a fixed set of items to learn the user's preferences and then infer the user's preferences on the remaining items. Using a fixed seed set can limit the performance of the recommendation system since the seed set is unlikely to be optimal for all new users with potentially diverse preferences. This paper addresses this challenge using a two-phase, personalized elicitation scheme. First, the elicitation scheme asks the user to rate a small set of popular items in a ``burn-in'' phase. Second, it sequentially asks the user to rate adaptively chosen items to refine the preference and the user's representation. Throughout the process, the system represents the user's embedding not by a point estimate but by a region estimate. The value of information obtained by asking for the user's rating on an item is quantified by the distance from the center of the region in the embedding space that contains, with high confidence, the true embedding of the user. Finally, the recommendations are successively generated by considering the preference region of the user. We show that each subproblem in the elicitation scheme can be efficiently implemented. Further, we empirically demonstrate the effectiveness of the proposed method against existing rating-elicitation methods on several prominent datasets.
\end{abstract}

\section{Introduction}
The vast amount of available information and content in this digital era poses severe challenges for individuals seeking information and recommendations. Personalized recommender systems are exerting profound impacts on various fields by leveraging user data to generate personalized suggestions, with applications ranging from networks~\citep{ref:natarajan2020resolving, ref:wu2019neural} and e-commerce~\citep{ref:alamdari2020systematic, ref:jiang2019trust} to e-learning~\citep{ref:khanal2020systematic, ref:george2019review}. Personalized recommender systems can produce accurate suggestions that match a user's preferences by exploiting users' characteristics and historical interactions with items. However, in cold-start settings, recommendation models may fail because they can access only limited, or even no, interaction history for the user~\citep{ref:gope2017survey}. Cold-start recommendation can be categorized into three main branches, depending on the specification of the user-item pool: (i) new users, invariant items, (ii) invariant users, new items, and (iii) new users, new items~\citep{ref:gope2017survey}. Introductory materials for the cold-start problems in recommendation systems can be found in~\citet{ref:adomavicius2005toward, ref:schafer2007collaborative}, and extensive literature reviews can be found in~\citet{ref:bobadilla2013recommender, ref:gope2017survey}. We will provide a brief literature review in Section~\ref{sec:related}.


This paper considers the most popular setting of cold-start recommendation wherein a new user arrives, and our goal is to make relevant recommendations to the user from the list of existing (invariant) items. We assume that there is no available side information regarding the new user, and thus, there are no trivial methods to initialize the representation of the user in the system. This situation arises frequently in practically any real-time environment, for example, when a new user signs up for a new account on an online platform.

%(I don't understand what these 2 sentences mean --- These approaches are based on the fundamental assumption that users will always provide ratings for items, regardless of whether they have actually experienced the item or not. However, this assumption is unrealistic in practice.

% \hl{discuss problems in existing methods}
Despite their effectiveness, existing methods select a fixed set of elicited items to infer a new user's preferences on the remaining items~\citep{ref:sepliarskaia2018preference}. They effectively approximate the initial embedding of the user as a function of only these seed items, no matter who the user is. This approach works well when the user has broad interests in multiple item categories but could be wasteful for users with narrower interests, resulting in sub-optimal initial recommendations. In practice, most users prefer some specific categories of items; for example, a user may be interested in action, adventure, and mystery movies but not in others such as comedy, drama, or romance. Thus, dynamically selecting or personalizing the seed items for a new user to represent the user's initial embedding can be beneficial in improving the quality of initial recommendations. 

This paper introduces a versatile framework to capture specific events where new users engage with items, denoted as ($+1$) or ($-1$) for positive or negative interactions. This framework can be used in two cases: (i) predicting whether a user explicitly expresses an affinity for a product/item, signified by ($+1$) for a positive review and ($-1$) for a negative review, as in rating prediction; and (ii) forecasting whether a user undertakes actions implicitly indicating a preference for an item, indicated by ($+1$) when a user makes a purchase or ($-1$) when no interaction with the item is recorded, as in news recommendation~\citep{ref:bae2023lancer} and click-through-rate prediction~\citep{ref:zhang2022deep}. 

\textbf{Contributions.}  To recommend items to new users, we propose a framework that estimates a region in the embedding space that is highly likely to contain the embedding of a new user. Our approach involves a preference elicitation process for selecting the best items to ask users to rate, considering that no initial information about the users is available. When a new user arrives, we present a static, short questionnaire with a carefully selected set of questions using a determinantal point process (DPP). The DPP ensures that the items listed in the questionnaire strike a balance between diversity and popularity (or quality), and the user's feedback on these items serves as initial information about the user.

Our framework then focuses on constructing a \textit{dynamic} questionnaire personalized to each user and sequentially updates our belief about the user's preference. By formulating and solving a minimization problem, we choose items that effectively narrow down the region in the embedding space where the new user's embedding is most likely to be found. This adaptive approach allows us to gather further information from users while limiting the questions to a relatively small number. Hence, our approach reduces the cognitive load on the user while still localizing the user's embedding well. 
%Regarding the novelty of our work, it is worth noting that according to our literature review, our method is the first to model new users using a set on the embedding space and computationally narrows the set with each cut after each round of questions.

To enhance the practicality of our work, we introduce a user behavior model that incorporates a probabilistic assessment of whether a user has previous experience with an item. Users can provide feedback, either positively ($+1$) or negatively ($-1$), for items they have experienced. In practical cases where users have not experienced an item or choose to ignore the question, the feedback is $\NA$, and this information is also taken into account to refine the selection of the items to query.

\textbf{Outline.} The subsequent sections unfold as follows: we first introduce the problem settings and describe the user behavior model. Then, we present our solution package highlighting our Personalized Embedding Region Elicitation (PERE) method. The efficacy of our solution package is demonstrated through numerical results in the last section. 


\section{Problem Settings} \label{sec:problem}

The recommendation system has a list of $N$ items; each item is represented by a $d$-dimensional embedding vector $v_i \in \R^d$ for $i = 1, \ldots, N$. Additionally, we extract a popularity score for each item based on historical user-item interactions, represented by a normalized number $0 \leq w_i \leq 1$. The items are sorted in descending order of popularity, with $w_{i} \geq w_{i+1}$. The top-$P$ items are classified as \textit{popular}, while the remaining $N - P$ items are considered non-popular. When a new user arrives without prior information, we aim to learn a suitable embedding vector for this user and then utilize this embedding vector for personalized item recommendations. Throughout, we rely on the assumption that the embeddings are of sufficient quality to enable distance-based recommendation methods such as $k$-nearest-neighbor to perform accurately. We make the following assumption:


\begin{assumption}[Embedding space] \label{a:embedding}
The embeddings are normalized to a $d$-dimensional hypercube $\mathbb{H} = [0, 1]^d$. Moreover, the items' embeddings $v_i \in \mbb H$, $\forall i=1, \ldots, N$ do not change over time. 
\end{assumption}

This assumption imposes a boundedness constraint on the embedding space, a common practice for machine learning algorithms. The invariance of item embeddings is also reasonable for most practical online platforms, where the items can be movies, books, or songs.

Our recommendation system employs a three-option feedback mechanism for user interactions. Whenever the user is presented with an item $i$ characterized by an embedding vector $v_i$, the user can rate the item using three options: $-1$, $+1$, or $\NA$. A rating of $-1$ signifies a negative experience or dislike, while a rating of $+1$ indicates a positive experience or liking of the item. The user may also choose $\NA$ to express a lack of experience or to decline to disclose the preference. Using this interactive scheme, we propose a two-phase preference elicitation process consisting of a burn-in phase and a sequential, adaptive question-answering (Q\&A) phase. The elicitation process aims to extract the user's preferences and learn an appropriate embedding vector to represent the user in the common embedding space. We present the overall flow of our approach in Figure~\ref{fig:flow} and summarize each phase as follows. When a new user arrives, we construct a burn-in questionnaire that consists of a list $\mc L$ of $K$ items to ask the user. The user rates each item in this list with $-1$, $+1$, or $\NA$. By consolidating the responses from the user, we can divide the set $\mc L$ into three subsets, $\mc L^-$, $\mc L^{+}$, and $\mc L^{\NA}$, which represent items disliked, liked, and with no expressed opinion, respectively.\footnote{By construction, $\mc L^-$, $\mc L^{+}$, and $\mc L^{\NA}$ are exhaustive and mutually exclusive: their pairwise intersections are empty, and their union is $\mc L$.} 

Subsequently, we further elicit the user's preferences through an adaptive Q\&A process. Our system sequentially presents $m$ new items to the user in each round, and the user provides feedback ($-1$, $+1$, or $\NA$) on each item to refine the identification of their embedding vector.



\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.85\linewidth]{figures/recommender.pdf}
    \caption{When a new user arrives, we use a determinantal point process to query a diverse set of items from the $P$ \textit{popular} items list to construct the burn-in questionnaire. Subsequently, we use a sequential question-answering procedure to refine the embedding region of the user's preferences. The recommendation is made using the Chebyshev center of the embedding region, which is consistent with the user's stated preferences.} 
    \label{fig:flow}
    %\vspace{-5mm}
\end{figure*}

\subsection{A Model of the User's Behavior}

To create an interactive mechanism connecting the user and the recommender system, we need to build a behavior model for each user.  Without any loss of generality, we assume that each user can be represented by an embedding vector $u_0 \in \mbb H$. The true location of the vector $u_0$ remains elusive to the recommendation system, but it is \textit{in}variant throughout the procedure of preference elicitation. The user can rate items positively ($+1$) or negatively ($-1$) only if they have prior experience with the item. For instance, in the context of Netflix, where items are movies, this translates to the user having watched certain movies. A key aspect of modeling user behavior revolves around the probability of experience. We make the following assumption on the probability that a user has experienced an item:

\begin{assumption}[Experience probability] \label{a:exp-prob} The probability that a user $u_0$ has experienced an item $v_i$ is given by
    \be \label{eq:prob}
    p_{0i} \Let w_i \times \mathrm{sigmoid}\big( \frac{1}{c_{0i}} - \frac{\kappa_0}{\sqrt{d} - c_{0i}}  \big), 
    \ee
    where $c_{0i} = \| u_0 - v_i \|_2$ is the distance between the true user's and the item's embedding. Moreover, whether the user has prior experience with item $i$ is independent of whether the user has prior experience with any other item $j \neq i$.
\end{assumption}

Assumption~\ref{a:exp-prob} proposes that the probability that a user has experienced an item depends on two main factors: the item popularity $w_i$ and the distance $c_{0i}$ between the user's embedding and the item's embedding. According to Assumption~\ref{a:exp-prob}, if two items, $i$ and $j$, have equal distances from the user's embedding, the item with higher popularity (indicated by a larger weight) will have a higher experience probability. The distance enters the experience probability only through the sigmoid term. Notably, the Euclidean distance from $u_0$ to $v_i$ cannot exceed $\sqrt{d}$, where $d$ represents the dimension of the embedding hypercube $\mathbb{H}$. Additionally, the parameter $\kappa_0$ acts as a tolerance parameter known only to the user. When $c_{0i}$ approaches 0, the sigmoid term tends to 1, and, provided that $\kappa_0 > 0$, when $c_{0i}$ approaches $\sqrt{d}$, the sigmoid term tends to 0. To study the impact of the parameter $\kappa_0$ on the probability that a user has experienced an item, we conduct an experiment in the supplementary material. 
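
As a purely illustrative calculation of~\eqref{eq:prob}, take the hypothetical values $d = 4$ (so that $\sqrt{d} = 2$), $\kappa_0 = 1$, and $w_i = 0.8$; these numbers are chosen only for convenience and are not taken from the experiments. For three distances $c_{0i}$, the sigmoid term and the resulting probability are approximately
\[
\begin{array}{lll}
c_{0i} = 0.5: & \mathrm{sigmoid}(4/3) \approx 0.791, & p_{0i} \approx 0.633, \\
c_{0i} = 1: & \mathrm{sigmoid}(0) = 0.5, & p_{0i} = 0.4, \\
c_{0i} = 1.5: & \mathrm{sigmoid}(-4/3) \approx 0.209, & p_{0i} \approx 0.167.
\end{array}
\]
In this example, the experience probability decreases as the item moves away from the user, and it never exceeds the popularity score $w_i$.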


Moreover, preference consistency is a fundamental question in the preference elicitation literature. Inconsistency in preference elicitation refers to situations in which users provide conflicting or contradictory ratings or feedback for the same or similar items. For instance, a user $u_0$ who prefers item $v_i$ to $v_j$ but rates them ($-1$) and ($+1$), respectively, is inconsistent. Thus, to ensure the consistency of our proposed method, we make the following assumption on the consistency of a user's preferences over any two items:

\begin{assumption}[Preference consistency] \label{a:consistency}
    Suppose that the true user's embedding is $u_0$. Given any two items $v_i$ and $v_j$ such that $\| u_0 - v_i \|_2 \le \| u_0 - v_j \|_2$:
    \begin{enumerate}[label=(\roman*), leftmargin = 5mm]
        \item \label{a:consistency:+} If the user rates $v_j$ positively ($+1$), then the user can only rate $v_i$ either positively ($+1$) or with $\NA$.
        \item \label{a:consistency:-} If the user rates $v_i$ negatively ($-1$), then the user can only rate $v_j$ either negatively ($-1$) or with $\NA$.
    \end{enumerate}
\end{assumption}

Assumption~\ref{a:consistency} ensures that the user's preference is consistent with the neighborhood structure of the embedding space. Inconsistency may arise if the user rates $v_i$ negatively and  $v_j$ positively, even though $v_i$ is closer to the user's true embedding than $v_j$. This inconsistency is wholly eliminated under Assumption~\ref{a:consistency}.


\section{Adaptive Q\&A with Personalized Embedding Region Elicitation} \label{sec:solution}
This section presents our proposed solution package comprising two distinct phases: a burn-in questionnaire and a sequential and adaptive Q\&A process. Additionally, we provide a recommendation module based on the Chebyshev center of the region, which is designed specifically for the recommendation task. As there is no prior information about the user's preferences, we implement a burn-in phase using a determinantal point process (DPP) to generate a short, static questionnaire for each new user. The DPP balances two criteria: diversity and popularity.

 The adaptive Q\&A process facilitates the sequential elicitation of user preferences. We assume this phase lasts $T$ rounds; in each round, we select $m$ items to ask for feedback from the user. While the user's true embedding vector $u_0$ is not available to the system, we can characterize the plausible values of the user's embeddings from the user's feedback. By utilizing a set of positively rated items and negatively rated items, we can form pairwise preferences and effectively refine the plausible embedding region. Therefore, this iterative elicitation allows us to increase the accuracy of the preference approximation.


\textbf{Set of plausible embeddings.} We suppose the user has indicated a set of positively-rated items $\mc L^+$ and a set of negatively-rated items $\mc L^-$. The set of induced preferences $\mbb P$ is formed by picking any $i \in \mc L^+$ and any $j \in \mc L^-$, and appending the preference $v_i \succsim v_j$ to $\mathbb{P}$.\footnote{For each user, we use $\succsim$ to denote a preference relation among items, that is, $\succsim$ denotes a complete and transitive order.} As a consequence, we have the following preference set
    \be \label{eq:P}
    \mathbb{P} = \{ v_i \succsim v_j: \forall v_i \in \mc L^+,~\forall v_j \in \mc L^-\}.
    \ee
From any preference set $\mbb P$, we can subsequently characterize a region $\mc U_{\mbb P}$ that conforms with the user's preferences. For instance, if we pick any preference relation $v_i \succsim v_j$ in the preference set $\mbb P$, Assumption~\ref{a:consistency} implies that the distance from the user's embedding $u_0$ to $v_i$ should be no larger than the distance to $v_j$. Because we are using the Euclidean distance, this, in turn, implies that
\[
    \|v_i - u_0\|_2^2 \le \|v_j - u_0\|_2^2.
\]
By consolidating all preferences in the preference set $\mbb P$, we expect the user's embedding to satisfy all of the inequalities below simultaneously. Thus, we have
\[
     \|v_i - u_0\|_2^2 \le \|v_j - u_0\|_2^2 \quad \forall v_i \succsim v_j \in \mbb P.
\]
By expanding the norms, $u_0$ should satisfy
\[
2 u_{0}^\top (v_j - v_i)  \le \|v_j\|_2^2 - \|v_i\|_2^2 \quad \forall v_i \succsim v_j \in \mbb P.
\]
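For completeness, the expansion for a single pair $v_i \succsim v_j$ can be spelled out as follows; this is a routine algebraic step included only for convenience:
\begin{align*}
& \|v_i - u_0\|_2^2 \le \|v_j - u_0\|_2^2 \\
\iff~ & \|v_i\|_2^2 - 2 u_0^\top v_i \le \|v_j\|_2^2 - 2 u_0^\top v_j \\
\iff~ & 2 u_0^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2.
\end{align*}
The common term $\|u_0\|_2^2$ cancels on both sides, which is why the resulting constraint is linear in $u_0$.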
Let $\mc U_{\mbb P}$ denote the set of all embeddings that are consistent with the preference set $\mbb P$; then we have 
\[
\begin{array}{cl}
&\mc U_{\mbb P} = \{ u \in \mbb H: 2 u^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2~~\\ & \hspace{5cm} \forall v_i \succsim v_j \in \mbb P
\},
\end{array}
\]
and under Assumption~\ref{a:consistency}, we have $u_0 \in \mc U_{\mbb P}$.
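
As a small illustration with hypothetical embeddings in dimension $d = 2$, suppose that the user likes $v_i = (0.2, 0.5)^\top$ and dislikes $v_j = (0.8, 0.5)^\top$. Then $v_j - v_i = (0.6, 0)^\top$ and $\|v_j\|_2^2 - \|v_i\|_2^2 = 0.89 - 0.29 = 0.6$. Writing $u = (x_1, x_2)^\top$, the single cut reads $1.2\, x_1 \le 0.6$, so that
\[
\mc U_{\mbb P} = \{ (x_1, x_2)^\top \in \mbb H : x_1 \le 0.5 \}.
\]
This is the part of the unit square on the side of the perpendicular bisector of $v_i$ and $v_j$ that contains the liked item $v_i$.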

% \subsubsection{Locating the Chebyshev Center} \label{sec:chebyshev}
\textbf{Locating the Chebyshev center.} Now, we determine the Chebyshev center of the set $\mc U_{\mbb P}$. The Chebyshev center of a bounded set with a non-empty interior is the center of a ball of maximum radius enclosed within the set. Consequently, the Chebyshev center of the confidence set $\mc U_{\mbb P}$ is considered a safe point estimate for the true embedding $u_0$. Moreover, by identifying the Chebyshev center, we can find the most aggressive cut to the set $\mc U_{\mbb P}$, thereby expediting the refinement of the plausible embedding region.

The Chebyshev center $u_c\opt$ of the set $\mc U_{\mbb P}$ and the radius $r\opt$ can be computed by finding the largest ball around $u_c$ on which every preference cut holds:
\[
    \begin{array}{cl}
         \max & r \\
         \st & 2 u^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 \\
         & \hspace{1cm} \forall u: \|u - u_c\|_2 \le r,~\forall v_i \succsim v_j \in \mbb P \\
             & u_c \in \mbb H,~r \in \R_+.
    \end{array}
\]
For our problem, the Chebyshev center can be obtained by solving a linear program, as stated in the following theorem.
\begin{theorem}[Chebyshev center] \label{thm:chebyshev}
Suppose that $\mc U_{\mbb P}$ has a non-empty interior. The Chebyshev center $u_c\opt$ of the set $\mc U_{\mbb P}$ can be found by solving the following problem
\[
    \begin{array}{cl}
         \max & r \\
         \st & 2 u_{c}^\top (v_j - v_i) + 2 r \| v_j - v_i \|_2 \le \|v_j\|_2^2 - \|v_i\|_2^2 \\
         & \hspace{4cm} \forall v_i \succsim v_j \in \mbb P \\
             & u_c \in \mbb H,~r \in \R_+.
    \end{array}
\]
\end{theorem}

The proof of Theorem~\ref{thm:chebyshev} follows from a duality argument and is relegated to the supplementary material.
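
To see where the term $2 r \| v_j - v_i \|_2$ comes from, consider a single preference $v_i \succsim v_j$ with $v_i \neq v_j$ and require its cut to hold at every point of the ball of radius $r$ around $u_c$. The following is only a sketch of the standard argument for one cut, and it may differ from the proof given in the supplementary material. Writing $u = u_c + \delta$ with $\|\delta\|_2 \le r$, the Cauchy--Schwarz inequality gives
\begin{align*}
&\max_{\|\delta\|_2 \le r} 2 (u_c + \delta)^\top (v_j - v_i) \\
&\qquad = 2 u_c^\top (v_j - v_i) + 2 r \|v_j - v_i\|_2,
\end{align*}
with the maximum attained at $\delta = r (v_j - v_i) / \|v_j - v_i\|_2$. The cut therefore holds on the whole ball if and only if this quantity is at most $\|v_j\|_2^2 - \|v_i\|_2^2$, which is the constraint in Theorem~\ref{thm:chebyshev}. Because this constraint is linear in $(u_c, r)$, the resulting problem is a linear program.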

\begin{figure}[!ht]
    \centering
    \includegraphics[width=0.8\linewidth]{figures/chebyshev.pdf}
    \caption{The hyperplanes $2 u_{c}^\top (v_i - v_j) = \|v_i\|_2^2 - \|v_j\|_2^2$ for $v_i \succsim v_j \in \mbb P$ are drawn as black lines, and they define the boundary of the set $\mc U_{\mbb P}$. The ball centered at the Chebyshev center $u_c\opt$ with radius $r\opt$ is the largest inscribed Euclidean ball of $\mc U_{\mbb P}$. Our model recommends items based on their proximity to the Chebyshev center: here, the two movies nearest to $u_c\opt$ are highlighted.}
    \label{fig:chebyshev}
\end{figure}

\textbf{Next item to query.} At time $t+1$, we have already obtained user feedback on the list $\mc{L}_t$ of popular items, partitioned into the sets $\mc{L}^+_t$, $\mc{L}^-_t$, and $\mc{L}^\NA_t$. The remaining popular items are denoted as $\mc V_t = \{v_i\}_{i=1, \ldots, P} \backslash \mc L_t$. Then, in each of the $T$ rounds, we select the next item $v_i$ from $\mc{V}_t$ and obtain the user's rating. The new rating can be informative in two ways: if the user rates the newly presented item positively ($+1$), we can leverage this positive experience along with the list of negatively-rated items $\mc{L}^-_t$ to generate new pairwise preferences. Conversely, if the user rates the item negatively ($-1$), this information can be combined with the set $\mc{L}^+_t$ to create additional preferences. However, the feedback is uninformative if the user rates the new item as $\NA$. Each preference involving two items $v_i$ and $v_j$ can be represented by a hyperplane equation 
\[
    2 u_{c}^\top (v_i - v_j) = \|v_i\|_2^2 - \|v_j\|_2^2.
\]
A possible goal is to find the next item $v_i \in \mc{V}_t$ that minimizes the total weighted distance from the incumbent Chebyshev center to all constructed hyperplanes. To find the next item to ask about, we need to consider the probability that the user has prior experience with the next item. The higher the probability, the more the system should favor this item in order to obtain informative feedback (either a positive or a negative rating). Because the recommendation system does not know $u_0$ and $\kappa_0$, it does not know the true value of the probability that the user $u_0$ has prior experience with item $i$. Instead, the system will employ the following surrogate
\begin{align} \label{eq:hat_p}
\widehat{p}_i &= \widehat{\mathrm{Prob}}( \text{user has experienced } v_i) \\
&= w_i \times \mathrm{sigmoid}\big(  \frac{1}{\hat{c}_{0i}} - \frac{1}{\sqrt{d} - \hat{c}_{0i}} \big), 
\end{align}
where $\hat{c}_{0i} = \| u_c\opt - v_i \|_2$.
We can observe that this surrogate $\wh p_i$ does not depend on $\kappa_0$. Moreover, this surrogate probability is computed based on the distance from the item embedding to the incumbent Chebyshev center $u_c\opt$, and not to the true value of the user's embedding $u_0$. Our method is also robust to the misspecification of the functional form: the numerical section shows that $\wh p_i$ can also be calculated using the $\mathrm{tanh}$ function instead of the $\mathrm{sigmoid}$ function. 

Conditional on the user having prior experience with item $v_i$, three situations can arise:
\begin{itemize}[leftmargin=5mm]
\item Case 1: if the item $v_i$ satisfies 
\[
\| v_i - u_c\opt \|_2 \le \max_{v_j \in \mc L^+_t} \| v_j - u_c\opt \|_2,
\]
then by Assumption~\ref{a:consistency}\ref{a:consistency:+} it is likely that the user will rate item $v_i$ positively ($+1$) or $\NA$. If we exercise optimism, we expect the user to rate the item positively ($+1$). In this optimistic case, the positive rating from the user leads to $| \mc L^-_t|$ new preferences of the form $v_i \succsim v_j$ for all $v_j \in \mc L^-_t$. Each pairwise preference is represented by a linear cut $2 (v_j - v_i)^\top u \le  \|v_j\|_2^2 - \|v_i\|_2^2$.
The degree to which the above cut can effectively reduce the size of the set $\mc U_{\mbb P}$ is quantified by the distance from the Chebyshev center $u_c\opt$ to the hyperplane $2 u ^\top (v_j - v_i) =  \|v_j\|_2^2 - \|v_i\|_2^2$. An elementary calculation shows that this distance has an analytical form
\[
\frac{|2 (v_j - v_i)^\top u_c\opt + \|v_i\|^2_2 - \|v_j\|^2_2|}{\|2 (v_j - v_i)\|_2}.
\]
As a consequence, if we decide to sum up these distances, then the total distance from the Chebyshev center $u_c\opt$ to all hyperplanes generated by the positive feedback on item $v_i \in \mc V_t$ is
\[
    \sum_{v_j \in \mc L^-_t}~\frac{|2 (v_j - v_i)^\top u_c\opt + \|v_i\|^2_2 - \|v_j\|^2_2|}{\|2 (v_j - v_i)\|_2} \Let q_i^+.
\]
\item Case 2: if the item $v_i$ satisfies 
\[
\| v_i - u_c\opt \|_2 \ge \min_{v_j \in \mc L^-_t} \| v_j - u_c\opt \|_2,\]
then by Assumption~\ref{a:consistency}\ref{a:consistency:-}, it is likely that the user will rate $v_i$ negatively ($-1$) or $\NA$. A parallel argument can quantify the total distance in this case:
\[
\sum_{v_j \in \mc L^+_t}  \frac{|2 (v_i - v_j)^\top u_c\opt + \|v_j\|^2_2 - \|v_i\|^2_2|}{\|2 (v_i - v_j)\|_2} \Let q_i^-.
\]
\item Case 3: if item $v_i$ does not satisfy the above conditions, then we have high uncertainty about the user's response for $v_i$. Nevertheless, if we opt for optimism, we can use the minimum of the two distances:
    $\min\left\{q_i^+,q_i^- \right\} \Let q_i^{\NA}$.
\end{itemize}
Our goal is to choose the next items that maximize the probability of user experience for each chosen item while minimizing the distance from the resulting cut to the center in all three cases mentioned. Consequently, we can determine the next item to query by solving the problem below:
    \begin{align*}
    &\min_{v_i \in \mc V_t}~ (1-\hat p_i) \Big[ q_i^+ \mathbb{I}^+(v_i) + q_i^- \mathbb{I}^-(v_i) +  \\
    &\hspace{3cm} q_i^{\NA} (1 - \mathbb{I}^+(v_i)) (1 - \mathbb{I}^-(v_i)) \Big],
    \end{align*}
    where $\mathbb{I}^+$ is the indicator function for Case 1 above:
    \[
    \mathbb{I}^+(v_i) = \begin{cases}
    1 & \text{if } \| v_i - u_c\opt \|_2 \le \max_{v_j \in \mc L^+_t} \| v_j - u_c\opt \|_2, \\
    0 & \text{otherwise,}
    \end{cases}
    \]
    and $\mathbb{I}^-$ is the indicator for Case 2:
    \[
    \mathbb{I}^-(v_i) = \begin{cases}
    1 & \text{if } \| v_i - u_c\opt \|_2 \ge \min_{v_j \in \mc L^-_t} \| v_j - u_c\opt \|_2, \\
    0 & \text{otherwise.}
    \end{cases}
    \]    
Notice that these indicator functions depend on the current Chebyshev center $u_c\opt$ as well as the current set of positively-rated items $\mc L_t^+$ and negatively-rated items $\mc L_t^-$; however, these dependencies are omitted to avoid clutter. 
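
As a quick numerical check of the distance formula, with hypothetical values in dimension $d = 2$, take the incumbent center $u_c\opt = (0.5, 0.5)^\top$, a candidate item $v_i = (0.4, 0.5)^\top$, and a single disliked item $v_j = (1, 0.5)^\top$, so that $\mc L_t^- = \{v_j\}$. Then $2 (v_j - v_i) = (1.2, 0)^\top$ and $\|v_i\|_2^2 - \|v_j\|_2^2 = 0.41 - 1.25 = -0.84$, which gives
\[
q_i^+ = \frac{|1.2 \times 0.5 - 0.84|}{1.2} = \frac{0.24}{1.2} = 0.2.
\]
This agrees with the geometry: the hyperplane is the perpendicular bisector of $v_i$ and $v_j$, namely the vertical line through $(0.7, 0.5)^\top$, which lies at distance $0.2$ from $u_c\opt$. If Case 1 applies to this candidate and Case 2 does not, its objective value is $0.2\,(1 - \hat p_i)$.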

To illustrate our process for identifying $\mc U_{\mbb P}$, we create and visualize a toy example comprising a single user and five items in the supplementary material. 


\textbf{Aggregating Chebyshev centers using a reweighting scheme.}
Suppose that in the ``burn-in'' questionnaire, we have asked about $K$ items. In the sequential Q\&A, suppose that at round $t$, we obtain a Chebyshev center $u_c^{t}$ by solving the optimization problem in Theorem~\ref{thm:chebyshev}. Then, the aggregated center after the adaptive Q\&A process (after $T$ rounds) can be computed using a reweighting scheme:
\be \label{eq:agg_cheb}
    \bar u_c\opt = \sum_{t=0}^T \frac{K + t \times m}{K \times (T + 1) + T \times (T + 1) \times m / 2} u_{c}^t.
\ee
The denominator is a normalizing constant so that the weights sum up to one.
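The normalization can be verified directly from the arithmetic series
\[
\sum_{t=0}^T (K + t m) = K (T + 1) + m \, \frac{T (T + 1)}{2}.
\]
For instance, with the hypothetical values $K = 5$, $m = 2$, and $T = 2$, the unnormalized weights are $5$, $7$, and $9$, the normalizing constant is $15 + 6 = 21$, and the aggregated center is $\tfrac{5}{21} u_c^0 + \tfrac{7}{21} u_c^1 + \tfrac{9}{21} u_c^2$. Centers from later rounds, which are computed from more answers, thus receive larger weights.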

\textbf{Item recommendation.} At any time, our system keeps track of three sets of items: $\mc L^+$, $\mc L^-$, and $\mc L^\NA$. We generate all valid pairwise preferences by coupling items from the $\mc L^+$ and $\mc L^-$ sets. Each preference pair delineates a distinct cut in the embedding region, effectively narrowing down the area denoted by the set $\mc U_{\mbb P}$ in the embedding space containing the new user embedding. To generate item recommendations, we calculate the Euclidean distance from all unqueried items to the aggregated center in~\eqref{eq:agg_cheb} and recommend the top $k$ items nearest to this center.


\section{Numerical Experiments} \label{sec:exp}
We conduct experiments to study the efficacy of our proposed approach and to address the following research questions:
\begin{enumerate}[leftmargin=10mm,start=1,label={\bfseries RQ\arabic*:}]
    \item Can our algorithm accurately approximate the embedding region that contains the new user's embedding $u_0$ with minimal information?
    \item How does our item-selection mechanism proposed in Section~\ref{sec:solution} compare to baselines?
    \item How does our proposed method generalize to different types of embedding techniques and functional forms in estimating experience probability?
\end{enumerate}

\subsection{Experiment Settings}
\textbf{Datasets and User-Item Embedding.} In our experiments, we use two datasets, Gowalla and Amazon-Books. Both are standard benchmarks in the recommender system literature and are used in many recently published papers, including~\citet{ref:silva2023user, ref:gong2023full}. To process these datasets, we adhere to the pipelines of LightGCN~\citep{ref:he2020lightgcn} and biVAE~\citep{ref:truong2021bilateral}. The embeddings for users and items can be obtained using any collaborative filtering method, e.g., LightGCN~\citep{ref:he2020lightgcn} or biVAE~\citep{ref:truong2021bilateral}. Because real data for the entire preference elicitation process in the cold-start recommendation problem are not available, we treat the embeddings produced by the collaborative filtering methods for new users (detailed in the next paragraph) as the ``true'' embeddings and conceal them from our algorithm. This setting remains appropriate because the recommender system does \textit{not} have access to the ground-truth embedding of the new user; it \textit{only} has access to the user's responses to the questionnaires. We design an experiment to see how well our algorithm approximates this ``true'' embedding for new users after a fixed number of questions. Note that the comparison of the collaborative filtering frameworks is beyond the scope of this paper. Additional information about these datasets and embedding generation can be found in the supplementary.

\textbf{New user's characteristics generation. }
A generated user possesses four attributes: a user embedding denoted as $u_0$, an $N$-dimensional binary vector indicating whether the user has experienced each item, a list of liked items, and a list of disliked items. Generating a new user begins by obtaining the user embedding as outlined in the previous paragraph. Then, based on the available data, we calculate the maximum likelihood estimate $\hat{\kappa}_{\mathrm{MLE}}$, as detailed in the supplementary. This estimate allows us to determine the user experience probability $p_i$ for each item as assumed in~\eqref{eq:prob}. To ascertain whether the new user has experienced a particular item, we generate a binary vector $z \in \{0,1\}^N$ by drawing each component $z_i$ from a Bernoulli distribution with parameter $p_i$. If $z_i$ equals 0, the user has not experienced the item $i$ ($\NA$). Conversely, if $z_i$ equals 1, the user has previously experienced the item $i$. Additionally, we retrieve the top $k$ items closest to $u_0$ and append them to the list of liked items. Let $N_e$ be the number of items experienced by the user. Apart from the $k$ liked items, the remaining $N_e-k$ experienced items are considered to be disliked items for this user.


\textbf{Setup.} We employ DPP to curate a diverse set of items ($K=50$ items) for inquiries directed at newly registered users. Supporting evidence demonstrating the superior efficiency of DPP compared to competing methods such as greedy and random generation is included in the supplementary material. In the sequential Q\&A phase, we present the user with $T = 5$ questionnaires; each contains $m = 10$ items.
 
\textbf{Baselines.} We compare our proposed method PERE (Personalized Embedding Region Elicitation) against six baselines: DPP, Conditional DPP (c-DPP), RMV~\citep{ref:fonarev2016efficient}, DPE~\citep{ref:parapar2021diverse}, PEO~\citep{ref:sepliarskaia2018preference}, and DRE~\citep{ref:kweon2020deep}. c-DPP is a modified version of DPP that selects $K$ items from the remaining un-queried items. We note that for a fair comparison, we must compare our proposed method against other cold-start recommendation methods with preference elicitation. Our chosen baselines are the most recent methods in that line of research.

\textbf{Performance Metric.} 
To assess different approaches, we employ several metrics. These include NDCG@$k$ (Normalized Discounted Cumulative Gain), which evaluates relevance and ranking simultaneously; MAP (Mean Average Precision), providing an aggregate measure of precision; and MRR (Mean Reciprocal Rank), indicating promptness in presenting relevant items. These metrics collectively estimate the recommendation system's accuracy, relevance, ranking quality, and user satisfaction.
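
For reference, the display below recalls the standard textbook definitions of these metrics for a single ranked list with binary relevance. The symbols $\mathrm{rel}_j$ (relevance of the item at position $j$), $R$ (number of relevant items), and $\mathrm{rank}$ (position of the first relevant item) are introduced here for illustration only, and the exact variants used in our implementation may differ in minor details:
\[
    \begin{aligned}
    \mathrm{DCG}@k &= \sum_{j=1}^{k} \frac{\mathrm{rel}_j}{\log_2(j+1)}, &
    \mathrm{NDCG}@k &= \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}, \\
    \mathrm{AP} &= \frac{1}{R} \sum_{j \ge 1} \mathrm{rel}_j \, \mathrm{Prec}@j, &
    \mathrm{RR} &= \frac{1}{\mathrm{rank}},
    \end{aligned}
\]
where $\mathrm{IDCG}@k$ is the value of $\mathrm{DCG}@k$ under the ideal ordering and $\mathrm{Prec}@j$ is the fraction of relevant items among the top $j$. MAP and MRR are the averages of AP and RR over users. For example, if a user has $R = 2$ relevant items and they appear at positions 1 and 3, then $\mathrm{DCG}@3 = 1 + 1/2 = 1.5$, $\mathrm{IDCG}@3 = 1 + 1/\log_2 3 \approx 1.631$, $\mathrm{NDCG}@3 \approx 0.920$, $\mathrm{AP} = (1 + 2/3)/2 \approx 0.833$, and $\mathrm{RR} = 1$.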

\begin{table}[htb]
% \vspace{-7mm}
\centering
\caption{Benchmark of performance metrics on Amazon-Books (user and item embeddings produced by biVAE). Larger values are better. The best performance for any fixed number of questions is highlighted in bold. Sequential Setting contains $50+10+10+10+10+10$ items.}
\label{tab:amazon_books_bivae}
\footnotesize	
% \resizebox{\textwidth}{!}{

\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/Phase2_amazon_books_bivae.csv}
\end{table}

\begin{table}[htb]
% \vspace{-7mm}
\centering
\caption{Benchmark of performance metrics on Amazon-Books (user and item embeddings produced by LightGCN). Larger values are better. The best performance for any fixed number of questions is highlighted in bold. Sequential Setting contains $50+10+10+10+10+10$ items.}
\label{tab:amazon_books_lightgcn}
\footnotesize	
% \resizebox{\textwidth}{!}{
\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/Phase2_amazon_books_lightgcn.csv}
\end{table}

\begin{table}[htb]
% \vspace{-7mm}
\centering
\caption{Benchmark of performance metrics on Gowalla (user and item embeddings produced by LightGCN). Larger values are better. The best performance for any fixed number of questions is highlighted in bold. Sequential Setting contains $50+10+10+10+10+10$ items.}
\label{tab:gowalla_lightgcn}
\footnotesize
% \resizebox{\textwidth}{!}{
\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/Phase2_gowalla_lightgcn.csv}
\end{table}


\begin{table*}[htb]
% \vspace{-7mm}
\centering
\caption{Comparing sigmoid and tanh in estimating equation~\eqref{eq:hat_p}: our method is robust to  the functional misspecification.}
\label{tab:robustness}
\footnotesize	
% \resizebox{\textwidth}{!}{
\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    every row no 1/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/robust_tanh_sigmoid.csv}
\end{table*}

\subsection{Numerical Results and Discussion}

The numerical results on different datasets and embeddings are summarized in Tables~\ref{tab:amazon_books_bivae},~\ref{tab:amazon_books_lightgcn}, and~\ref{tab:gowalla_lightgcn}. Due to space limitations, we report experimental results with more performance metrics in the supplementary.

\textbf{Recommendation quality.} The results indicate that our method is the most effective method for constructing a personalized series of follow-up questions for new users. Building upon DPP, the best-performing method in the ``burn-in'' phase, PERE exhibits the largest improvement in quality after 50 items have been asked. This improvement is notable because the experience probability of items, as defined in~\eqref{eq:prob}, is small, so users provide only limited information. Despite this, our framework enhances the quality of recommendations based on this minimal input. To check that the improvement does not stem solely from the sequential nature of our framework, we also compare against other sequential methodologies, such as bandit (DPE), conditional DPP, and active learning (PEO). The results show that PERE remains the top performer even among sequential methods.

\textbf{Generalizability.} Comparing Tables~\ref{tab:amazon_books_bivae} and~\ref{tab:amazon_books_lightgcn}, we observe that our method PERE efficiently generalizes to multiple types of embedding generation techniques, in our case, LightGCN (trained with implicit user response) and biVAE (trained with explicit user response). 

\textbf{Robustness with functional misspecification.} To evaluate the robustness against misspecification of the functional form in estimating experience probability, we devise an experiment utilizing both the sigmoid and tanh functions in equation~\eqref{eq:hat_p}. Table~\ref{tab:robustness} highlights that our method consistently upholds recommendation quality despite replacing the sigmoid with the tanh function.

\textbf{Real user experiments.} We design additional offline experiments using real user data from three datasets: MovieLens 10M, MovieLens 20M, and Amazon Books. We choose these datasets because they are widely used in the recommender systems literature and provide a diverse set of user-item interactions across different domains. These datasets are comparable in scale and complexity to those used in the related work, making them suitable for evaluating the generalizability of our framework.

The items users prefer are based on their actual ratings from the datasets rather than being generated from the embeddings. We follow the data processing approach used in deep rating elicitation~\citep{ref:kweon2020deep}, which is also one of our baselines. Specifically, we filtered out users who rated over $40$ items and converted the ratings on the $1$--$5$ scale to binary labels ($0$ and $1$) as follows: ratings of $4$ and $5$ are considered as liked items, while ratings of $3$ or lower are considered as disliked items. Table~\ref{tab:real-user} shows that our method outperforms the baselines on all three datasets.

\begin{table*}[htb]
    \centering
    \caption{Real user experiments on three datasets.}
    \begin{tabular}{lccccccc}
    \hline
        Datasets & Methods & NDCG@10 & MAP & MRR \\ \hline
        MovieLens-10M  &  c-DPP  &	0.802 &	0.591 &	0.772\\ 
             &  DPE & 0.674 & 0.439 & 0.607\\
             &  RMV & 0.492 & 0.281 & 0.437 \\ 
             &  PEO & 0.667 & 0.429 & 0.69 \\ 
             &  DRE & 0.337 & 0.164 & 0.271 \\
             & PERE & \textbf{0.812} & \textbf{0.603} & \textbf{0.784} \\\hline
        MovieLens-20M  &  c-DPP  &	0.628 & 0.499 & 0.689\\ 
             &  DPE & 0.634 & 0.428 & 0.578\\
             &  RMV & 0.144 & 0.088 & 0.133 \\ 
             &  PEO & 0.639 & 0.394 & 0.592 \\ 
             &  DRE & 0.435 & 0.227 & 0.364 \\
             & PERE & \textbf{0.734} & \textbf{0.505} & \textbf{0.696} \\\hline
        Amazon-Books  &  c-DPP  &	0.127 &	0.101 & 0.108\\ 
             &  DPE & 0.082 & 0.078 & 0.084\\
             &  RMV & 0.044 & 0.041 & 0.047 \\ 
             &  PEO & 0.099 & 0.084 & 0.105 \\ 
             &  DRE & 0.029 & 0.025 & 0.027 \\
             & PERE & \textbf{0.132} & \textbf{0.106} & \textbf{0.125} \\\hline
    \end{tabular}
    \label{tab:real-user}
\end{table*}

\textbf{Inconsistent preference.} We conduct two additional experiments to evaluate our method's performance under inconsistent user preferences: 
\begin{itemize}
    \item \textbf{Experiment 1:} We introduce a probability $\tau$ that a user's response to an experienced item will be flipped. When $\tau = 0$, there is no inconsistency, and when $\tau = 1$, responses are always inconsistent. We plot the performance gain in NDCG@50 against the number of displayed items for different values of $\tau$. Figure~\ref{fig:inconsist-ndcg} shows that as $\tau$ increases, the performance gain decreases but remains positive, demonstrating that our method still provides benefits despite inconsistencies in user responses.
    \item \textbf{Experiment 2:} We compare our method against DPE and RMV in the presence of inconsistency. Table~\ref{tab:inconsist-dpe} shows that our method maintains its advantage over the baselines even with inconsistent user preferences.
\end{itemize}
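
One way to formalize the response-flipping mechanism of Experiment 1 is sketched below. The symbols $y_i \in \{0, 1\}$ (the consistent like/dislike response to an experienced item $i$) and $\tilde y_i$ (the response actually recorded) are notation introduced here for illustration, and the independence of flips across items is our assumption:
\[
    \tilde y_i =
    \begin{cases}
        1 - y_i & \text{with probability } \tau, \\
        y_i & \text{with probability } 1 - \tau.
    \end{cases}
\]
Items that the user has not experienced still receive the response $\NA$ and are unaffected. For instance, with $\tau = 0.1$ and $20$ experienced items in the questionnaires, two responses are flipped in expectation.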

\begin{figure}[!htb]
    \centering
    \includegraphics[width=0.9\linewidth]{figures/Inconsist_NDCG_50.png}
    \caption{Performance gain in NDCG@50 against the number of displayed items for different values of the flip probability $\tau$ under the inconsistent preference setting.}
    \label{fig:inconsist-ndcg}
\end{figure}

\begin{table*}[htb]
    \centering
    \caption{Comparison against DPE and RMV under inconsistent preference setting.}
    \begin{tabular}{lccccccc}
    \hline
        Datasets & Methods & HR@5 & NDCG@10 &	MRR \\ \hline
        Amazon-Books ($\tau=0.1$)  &  DPE & 0.305 & 0.285 & 0.293\\ 
             &  RMV & 0.205 & 0.188 & 0.179\\
             &  PERE & \textbf{0.365} & \textbf{0.303} & \textbf{0.329} \\ \hline
        Amazon-Books ($\tau=0.5$)  &  DPE & 0.310 & 0.288	& 0.297\\ 
             &  RMV & 0.205 & 0.188 & 0.179\\
             &  PERE & \textbf{0.360} & \textbf{0.329} & \textbf{0.328} \\ \hline
    \end{tabular}
    \label{tab:inconsist-dpe}
\end{table*}


\textbf{Questionnaire size analysis.}
To be user-friendly, the questionnaire should be small to avoid overloading the user cognitively. We find that the number of items at each round does not significantly affect the quality of the method. What is more interesting to track is the quality improvement over a long history as the \textit{total} number of questions increases. Therefore, we conduct an additional experiment to study the impact of the total number of questions on the performance metrics NDCG@$10$ and MRR. Figure~\ref{fig:vary-K} shows that our method outperforms RMV, DPE, PEO, and DRE on all datasets. Additionally, our method outperforms c-DPP when eliciting $K=100$ items in total.

\begin{figure*}[htb]
    \centering
    \includegraphics[width=0.8\linewidth]{figures/vary_K.png}
    \caption{Performance improvements with the dynamic questionnaire size on Amazon-Books and Gowalla datasets.}
    \label{fig:vary-K}
\end{figure*}

\textbf{Runtime comparison.} We report the runtime on our largest dataset, Amazon-Books: the average runtime per round is approximately 0.27 seconds. We believe this runtime is reasonable for real-time systems, provided that the hardware and software are optimized for deployment.



\section{Related Works} \label{sec:related}
%In this section, we will provide a brief overview of several closely related lines of work to our work. Specifically, we will discuss the cold-start recommendation and rating elicitation methods for recommendation.% \begin{figure}
%     \centering
%     \includegraphics{}
%     \caption{Caption}
%     \label{fig:my_label}
% \end{figure}
\textbf{Cold-start recommendation.} The cold-start problem presents a significant challenge within recommender systems. This challenge emerges from the sparsity of information necessary to personalize recommendations for users effectively. In most cases, users and items have limited or no interactions. Several approaches have been proposed to tackle the cold-start problem for recommender systems~\citep{ref:rajapakse2022fast, ref:guo2020survey, ref:camacho2018social}. A possible solution for tackling the cold-start problem is to employ collaborative filtering techniques~\citep{ref:natarajan2020resolving, ref:wei2020fast, ref:anwar2022collaborative}. For instance,~\citet{ref:son2017content} introduced a hybrid approach that combines collaborative filtering with content-based methods to mitigate the cold-start problem. 

Deep learning techniques have been employed to learn representations or embeddings that capture the latent features of users and items to handle the cold-start problem~\citep{ref:tao2022sminet, ref:raziperchikolaei2021shared, ref:chu2023meta, ref:yu2021personalized, ref:zheng2021cold}. Recently, graph-based recommendation techniques have become effective approaches for learning user and item representations~\citep{ref:ying2018graph, ref:salha2021cold}. These methods leverage the user-item interaction graphs to infer user preferences. For example,~\citet{ref:ying2018graph} develops a graph autoencoder framework to learn the node representation. This approach empirically shows competitive performance under real-world scenarios.

% \subsubsection{Cold start user}
% \paragraph{Fast and Accurate User Cold-Start Learning Using Monte Carlo Tree Search} \cite{ref:dilina2022fast}.
% \begin{itemize}
%     \item Assume each user belongs to a group; when a new user arrives, estimate the probability of which group the new user belongs to. 
%     \item Ask new user maximum of $t_{\text{max}}$ questions (Rate this item), build a look ahead tree based on user's responses.
%     \item Calculate the reward of a path in the tree (sequence of items) using synthetic ratings based on the new user's estimated group as mentioned in section 3.2. More specifically, given user group $\hat g$, they generate synthetic rating  by drawing from a multivariate Gaussian distribution $\mu(\hat g, \nu), \sigma(\hat g, \nu)$, where $\nu \in \mc V$ is the true rating of user from group $\hat g$.

%     \item \cite{ref:huang2022learning}...

%     \item learning about user: \cite{ref:banovic2018warming}
% \end{itemize}
% \subsubsection{Cold Start Item}
% \paragraph{Constrastive Learning for Cold Start Recommendation} \url{https://dl.acm.org/doi/pdf/10.1145/3474085.3475665}

\textbf{Rating elicitation.} Rating elicitation plays a crucial role in recommender systems, as it involves gathering explicit user feedback to understand their preferences. Rating elicitation refers to a Q\&A process employed by a system to request new users to rate a set of items. This process aims to infer the users' preferences and enhance the quality of the recommendations. The primary challenge in rating elicitation lies in selecting the seed items that can effectively capture the new users' preferences. 

One of the first approaches to rating elicitation is Active Collaborative Filtering (CF). Most Active CF methods ask users to rate the set of items that maximizes the Expected Value of Information~\citep{ref:boutilier2002active, ref:harpale2008personalized}, the information gain~\citep{ref:canal2019active, ref:rashid2002getting, ref:houlsby2014cold}, or an influence criterion~\citep{ref:rubens2007influence}, or that minimizes the estimated model entropy~\citep{ref:jin2012bayesian, ref:houlsby2012collaborative}. However, these methods rely on the current estimated model, which is obtained from a few warm-start ratings of the user, rather than on a completely cold-start user setting.

Notably, region-refining methods closely resemble our work. For example,~\citet{ref:iyengar2001evaluating} propose Q-Eval, a preference elicitation method that iteratively refines a permissible region over the weights of multiple item attributes. Another method, by~\citet{ref:toubia2004polyhedral}, selects questions by adding cuts to narrow down the feasible region defined by a polyhedron. However, these methods consider a lower-dimensional space, whereas our work involves a higher-dimensional embedding space. Additionally, our question selection criteria are more complex, combining diversity and information gain maximization.

Recently, rating elicitation has emerged as a powerful method~\citep{ref:gope2017survey, ref:pu2012evaluating} to tackle the cold-start problem in recommender systems. For instance,~\citet{ref:kalloori2018eliciting} proposed an active learning method for pairwise items and a personalized ranking algorithm to increase user satisfaction.~\citet{ref:parapar2021diverse} employed multi-armed bandits, a well-established exploration-exploitation framework from reinforcement learning, to diversify the preferences elicited by the recommendation model. However, these approaches rely on the fundamental assumption that users will consistently provide feedback, regardless of whether they have experienced the item or not, and this assumption may not hold in practice. In this work, we address this problem by proposing a novel behavior model for the user and a preference elicitation process that directly takes the experience probability into consideration.


\section{Conclusion} \label{sec:conclusion}
In this paper, we have addressed the problem of cold-start recommendation by proposing a personalized elicitation scheme consisting of two phases. After a short ``burn-in'' phase, we employ an adaptive preference approach where users are sequentially prompted to rate items that refine their preferences and user representation. Throughout the process, the system represents the user's preferences as a region estimate rather than a single point, capturing the uncertainty in their preferences. The value of information gained from user ratings is quantified by considering the distance from the region center that confidently contains the true embedding value. Recommendations are generated by considering the user's preference region. We have demonstrated the efficiency of each subproblem in the elicitation scheme and conducted empirical evaluations on prominent datasets to showcase the effectiveness of our proposed method compared to existing rating-elicitation approaches.

%\newpage

\textbf{Acknowledgments.} Viet Anh Nguyen gratefully acknowledges the generous support from the CUHK's Improvement on Competitiveness in Hiring New Faculties Funding Scheme and the CUHK's Direct Grant Project Number 4055191. %We would also like to thank Area Chair MzYZ for pointing out relevant citations.

\bibliography{bibliography}

\newpage
\onecolumn

\title{Supplementary Material for Paper: Cold-start Recommendation \\ by Personalized Embedding Region Elicitation}
\maketitle

\appendix
\section{Proof of Theorem 1} 
Here we provide the proof of Theorem~\ref{thm:chebyshev}, which is omitted in the main text.
\begin{proof}
    The optimization problem to find the Chebyshev center and its radius can be rewritten as
    \[
        \begin{array}{cl}
        \max & r \\
         \st & 2 (u_{c} + \delta)^\top (v_j - v_i) \le  \|v_j\|_2^2 - \|v_i\|_2^2 \\
         & \hspace{3cm} \forall \delta \in \mc B_r,~\forall v_i \succsim v_j \in \mbb P \\
            & u_c \in \mbb H,~r \in \R_+,
        \end{array}
    \]
    where $\mc B_r = \{\delta \in \R^d: \| \delta \|_2 \le r\}$ is a $d$-dimensional Euclidean ball of radius $r$. For any preference $v_i \succsim v_j \in \mbb P$, the semi-infinite constraint 
    \[
        2 (u_{c} + \delta)^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 ~\forall \delta \in \mc B_r
    \]
    is equivalent to the robust constraint
    \[
    2 u_{c}^\top (v_j - v_i) + 2 \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2.
    \]
    Because the Euclidean norm is a self-dual norm, we have
    \[
        \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) = r \|v_j - v_i\|_2.
    \]
    Substituting the above relationship into the optimization problem completes the proof.
\end{proof}
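
To illustrate the robust constraint in the proof, consider a hypothetical instance in $\R^2$ with a single preference $v_i \succsim v_j$, where $v_i = (0, 0)^\top$ and $v_j = (2, 0)^\top$. These values are chosen only for illustration, and the constraint $u_c \in \mbb H$ is ignored. Writing $u_c = (u_{c,1}, u_{c,2})^\top$, we have $v_j - v_i = (2, 0)^\top$, $\|v_j - v_i\|_2 = 2$, and $\|v_j\|_2^2 - \|v_i\|_2^2 = 4$, so the robust constraint becomes
\[
    4 u_{c,1} + 4 r \le 4 \quad \iff \quad u_{c,1} + r \le 1.
\]
In words, the ball of radius $r$ centered at $u_c$ must stay on the side of the line $u_1 = 1$ that is closer to $v_i$, which is exactly the set of points at least as close to $v_i$ as to $v_j$.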

\section{Further Explanations about Settings and Region Elicitation}

In Assumption~\ref{a:exp-prob}, the probability that a user $u_0$ has experienced an item $v_i$ is given by
\[
p_{0i} \Let w_i \times \mathrm{sigmoid}\big( \frac{1}{c_{0i}} - \frac{\kappa_0}{\sqrt{d} - c_{0i}}  \big), 
\]
where $c_{0i} = \| u_0 - v_i \|_2$ is the distance between the true user's and the item's embedding. In Figure~\ref{fig:kappa-plot}, we visualize the dependence of $p_{0i}$ on the parameter $\kappa_0$. For a fixed value of the distance $c_{0i}$, the experience probability $p_{0i}$ decreases  monotonically in $\kappa_0$. 
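
As a small numerical illustration of this monotonicity, take $d = 64$ (so that $\sqrt{d} = 8$) and a distance $c_{0i} = 4$; these values are chosen only for illustration. Then $1/c_{0i} = 0.25$ and $\kappa_0 / (\sqrt{d} - c_{0i}) = \kappa_0 / 4$, which gives
\[
    \frac{p_{0i}}{w_i} =
    \begin{cases}
        \mathrm{sigmoid}(0.25) \approx 0.562 & \text{if } \kappa_0 = 0, \\
        \mathrm{sigmoid}(0) = 0.5 & \text{if } \kappa_0 = 1, \\
        \mathrm{sigmoid}(-0.25) \approx 0.438 & \text{if } \kappa_0 = 2.
    \end{cases}
\]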

Next, we visualize the region $\mathcal{U}_{\mathbb{P}}$ for a toy 2D example in Figure~\ref{fig:app-chebyshev}. Initially, a new user (red star) enters our system, but we are unaware of the true embedding location. After two steps of elicitation, the Chebyshev center moves progressively closer to the ``True User'' embedding, which illustrates how our proposed method localizes the user embedding.

\begin{figure}[!htb]
    \centering
    \includegraphics[width=0.8\linewidth]{figures/kappa.pdf}
    \caption{As the value of $\kappa_0$ increases, the probability that the user has prior experience (see Assumption~\ref{a:exp-prob}) with an item is dampened. Plot with $d = 64$ and the maximal value of $c_{0i}$ is $\sqrt{d} = 8$.}
    \label{fig:kappa-plot}
\end{figure}

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=1.0\textwidth]{figures/method.pdf}
    \caption{Illustration of our method in a 2D toy example: Recall that a cut in the embedding space is created by pairing a positive item with a negative item. At time $t=0$, when no questions have been asked, there are no cuts in the embedding space. Moving to time $t=1$, we ask the user about items 1, 2, and 4, and the user specifies `dislike,' `like,' and `dislike' for each respective item. This introduces two cuts in the space, and the initial Chebyshev center is calculated. Progressing to time $t=3$, we ask the user about item 5, which turns out to be a disliked item. As a result, a final cut is constructed by pairing item 2 with item 5. This process concludes with the finalization of region $\mathcal{U}_{\mathbb{P}}$.}
    \label{fig:app-chebyshev}
\end{figure*}



\section{Cold-Starting Query List via Determinantal Point Processes}\label{sec:app-dpp}

The main task of the ``burn-in'' phase is to create a list, denoted as $\mc L$, comprising $K$ popular items for querying the new user. If a user has no previous experience with an item $v_i$, they will indicate $\NA$ for that particular item. This $\NA$ response is uninformative because, by the rule of preference construction, item $v_i$ does not lead to any pair of preferences being added to the preference list $\mbb P$. Therefore, when constructing the cold-start item list $\mc L$, it is important to consider the probability that a user has prior experience with the items. By Assumption~\ref{a:exp-prob}, this probability is affected by two elements: the popularity of the item and the distance from the true user embedding $u_0$ to the item embedding $v_i$.

Since we do not know the user embedding $u_0$ but have information about the popularity of the items, we leverage this popularity information in the construction of $\mc L$. This line of argument also justifies constructing the list $\mc L$ so that it contains only the most popular items from the list of all possible items. To find this list $\mc L$, we can use a simple weighted $K$-medoids method: given a list of $N$ items, the weighted $K$-medoids method returns a subset of $K$ items to be used as cluster centers. The weighted $K$-medoids problem aims to minimize the total weighted squared Euclidean distance from the item embeddings to the nearest centers.

We present in this section a determinantal point process (DPP) to construct the item list $\mc L$. We aim to find a set of items that can balance the diversity and popularity of items oblivious to the user's true embedding. DPPs are elegant probabilistic models of global, negative correlations, and they admit efficient algorithms for sampling, marginalization, conditioning, and other inference tasks~\citep{ref:kulesza2012determinantal}. DPPs have been applied in various machine learning tasks, including document summarization~\citep{ref:perez2021multi} and image search~\citep{ref:chao2015large}. We rely on the following $L$-ensemble definition of DPP.

\begin{definition}[$L$-ensemble DPP] \label{def:dpp-L}
    Given a positive semidefinite $P$-by-$P$ matrix $L \in \PSD^P$, an $L$-ensemble DPP is a distribution over all $2^P$ index subsets $J \subseteq \{1, \ldots, P\}$ such that
\[\mathrm{Prob}(J) = \det(L_J)/ \det(I + L),\]
where $L_J$ denotes the $|J|$-by-$|J|$ submatrix of $L$ with rows and columns indexed by $J$.
\end{definition}

We design the matrix $L$ that can balance the diversity and popularity of items. We compose $L$ as the sum of a similarity matrix $S$ and a popularity matrix $D$ among items:
\[
    L = S +  D, \quad \text{where} \quad D = \mathrm{diag}(w_i).
\]
The matrix $D$ is diagonal, and its diagonal elements capture the popularity of the items. A possible choice for the similarity matrix $S$ is $S=V^\top V \in \PSD^P$ where $V$ is the embedding matrix of the popular items.  Because both $S$ and $D$ are positive semidefinite, the ensemble matrix $L$ is also positive semidefinite.
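
To see how this construction trades off popularity against diversity, consider a hypothetical case with $P = 2$ items, with the entries of $S$ denoted by $s_{11}$, $s_{12}$, and $s_{22}$ (notation introduced here for illustration). Then
\[
    L = \begin{pmatrix} s_{11} + w_1 & s_{12} \\ s_{12} & s_{22} + w_2 \end{pmatrix},
\]
and, by Definition~\ref{def:dpp-L}, the probability of selecting both items is proportional to
\[
    \det(L_{\{1, 2\}}) = (s_{11} + w_1)(s_{22} + w_2) - s_{12}^2.
\]
This quantity increases with the popularity weights $w_1$ and $w_2$ and decreases with the magnitude of the similarity $s_{12}$, so popular but dissimilar items are favored. As a sanity check of the normalization, the unnormalized weights of the four subsets, $1$, $s_{11} + w_1$, $s_{22} + w_2$, and $\det(L_{\{1,2\}})$, add up to $(1 + s_{11} + w_1)(1 + s_{22} + w_2) - s_{12}^2 = \det(I + L)$.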

We then find the combination of top-$K$ items that fit with the construction of the cold-start querying list by solving the following problem
    \be \label{eq:det}
        \max \left\{ \det ( L_z) ~:~ z \in \{0, 1\}^P,~ \| z \|_0= K \right\},
    \ee
where $L_z$ is the submatrix of $L$ restricted to the rows and columns indexed by the one-components of $z$. It is well-known that the solution to problem~\eqref{eq:det} coincides with the MAP estimate of the DPP with a cardinality constraint~\citep{ref:kulesza2012determinantal}. Moreover, problem~\eqref{eq:det} is a submodular maximization problem since the log-probability function $\log \det(L_z)$ is submodular~\citep{ref:gillenwater2012near}. This problem is also well-known to be NP-hard~\citep{ref:kulesza2012determinantal}, and thus it is notoriously challenging to solve~\eqref{eq:det} to optimality. \citet{ref:chen2018fast} provide a greedy algorithm for the MAP estimation problem. This greedy algorithm has been proven to achieve an approximation ratio of $\mc O(\frac{1}{K!})$~\citep{ref:civril2009selecting} and incurs a computational complexity of $\mc O(K^2P)$. To improve the solution quality, we further introduce a 2-neighborhood local search strategy: we iteratively exchange one element of the current set with one element of the complementary set until no additional improvement can be achieved.
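
The two selection rules above can be written compactly as follows. This is a naive sketch in terms of an index set $J \subseteq \{1, \ldots, P\}$ (notation as in Definition~\ref{def:dpp-L}), and it is meant to illustrate the selection rules rather than the accelerated implementation. Starting from $J = \emptyset$, the greedy step adds
\[
    j\opt \in \arg\max_{j \notin J}~\det(L_{J \cup \{j\}})
\]
to $J$ until $|J| = K$. The local search then replaces an element $i \in J$ by an element $j \notin J$ whenever
\[
    \det(L_{(J \setminus \{i\}) \cup \{j\}}) > \det(L_J),
\]
and stops when no such pair $(i, j)$ exists. Each swap keeps $|J| = K$, so the cardinality constraint in~\eqref{eq:det} remains satisfied.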


\section{Maximum Likelihood Estimation of the Tolerance Parameter}
\label{sec:app-estimation}

We provide the maximum likelihood estimation for the parameter $\kappa$. Without any loss of generality, we consider a training dataset consisting of $N$ items and $M$ users, where the user embeddings $u_m$ and the item embeddings $v_i$ are given. The interactions between the users and the items are represented by a binary-valued data matrix $E \in \{0, 1\}^{M \times N}$, where each entry $E_{mi}$ takes the value
    \[
    E_{mi} = \begin{cases}
        1 & \text{if user $m$ has an experience with item $i$}, \\
        0 & \text{otherwise.}
    \end{cases}
    \]
Suppose that there exists a global constant $\kappa \in \R_+$ such that $E_{mi}$ is a Bernoulli random variable with
\[
\mathrm{Prob}( E_{mi} = 1 ) = w_i \times \mathrm{sigmoid}\big( \frac{1}{c_{mi}} - \frac{\kappa}{\sqrt{d} - c_{mi}}  \big),\]
where $c_{mi} = \| u_m - v_i \|_2$ is the embedding distance between user $m$ and item $i$.
Given the data matrix $E$, and supposing that the elements $E_{mi}$ are jointly independent, the likelihood is
    \[
    L(\kappa | E) = \prod_{m=1}^M \prod_{i=1}^N \left( p_{mi} (\kappa)  \right)^{E_{mi}} \left( 1 - p_{mi}(\kappa)\right)^{1 - E_{mi}},
    \]
    where $p_{mi}(\kappa)$ is
    \[ 
    p_{mi}(\kappa) = \frac{w_i}{1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}} \big)}.
    \]
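    The estimate $\hat \kappa_{\mathrm{MLE}}$ minimizes the negative log-likelihood, which, after dropping the additive constant $-\sum_{m=1}^M \sum_{i=1}^N E_{mi} \log w_i$ that does not depend on $\kappa$, leads to the problem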
    \begin{align*}
        &\min_{\kappa \ge 0}~\sum_{m=1}^M \sum_{i=1}^N   \log \left( 1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}} \big)\right) \\
        &  - 
        \sum_{m=1}^M \sum_{i=1}^N (1 - E_{mi}) \log \left( 1 + \exp \big( \frac{\kappa}{\sqrt{d} - c_{mi}} - \frac{1}{c_{mi}}   \big) - w_i\right),
    \end{align*}
    whose solution can be found by standard gradient descent algorithms.
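
For reference, the derivative of this objective admits a closed form; the following is a sketch, and the shorthands $b_{mi}$, $a_{mi}(\kappa)$, $f$, and the step size $\eta$ are introduced here only for illustration. Let $b_{mi} = 1/(\sqrt{d} - c_{mi})$ and $a_{mi}(\kappa) = \kappa b_{mi} - 1/c_{mi}$. Denoting the objective by $f(\kappa)$, we have
\[
    f'(\kappa) = \sum_{m=1}^M \sum_{i=1}^N b_{mi} \, e^{a_{mi}(\kappa)} \left( \frac{1}{1 + e^{a_{mi}(\kappa)}} - \frac{1 - E_{mi}}{1 + e^{a_{mi}(\kappa)} - w_i} \right).
\]
Under these assumptions, a projected gradient step with step size $\eta > 0$ reads $\kappa \leftarrow \max \{ 0, \kappa - \eta f'(\kappa) \}$, which keeps the iterate in the feasible set $\kappa \ge 0$.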

\section{Questionnaire Design} \label{sec:app-qs-design}

Inspired by the structure of the Netflix questionnaire~\citep{ref:kweon2020deep}, we devise our questionnaire to capture a comprehensive set of preference pairs while minimizing user effort. Users are given the option to skip specifying preferences, which streamlines the process. In our questionnaire, users are presented with a product display and, while scrolling through it, only need to indicate `like' or `dislike' for products they are familiar with. An illustration of the questionnaire is provided in Figure~\ref{fig:questionnaire}. In practice, although our experimental design prompts new users to specify preferences for $100$ items, our algorithm performs effectively even when users respond to only around $15\%$ of the queried items on average, as evidenced by the user response ratio in Table~\ref{tab:like-dislike}.

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.8\textwidth]{figures/question.pdf}
    \caption{Illustration of our questionnaire: Taking inspiration from the Netflix questionnaire as outlined in~\cite{ref:kweon2020deep}, we structure each questionnaire as depicted above. Upon a new user entering our system, we prompt them to indicate their preferences for a set of items. Users can specify `like' ($+1$), `dislike' ($-1$), or choose to skip the item ($\NA$).}
    \label{fig:questionnaire}
\end{figure*}

\begin{table*}[!ht]
% \vspace{-7mm}
\centering
\caption{Number of items responded to by users using the PERE method. The response ratio is calculated over 100 queried items.}
\label{tab:like-dislike}

\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/likes_NA.csv}
\end{table*}

\section{Additional numerical results} \label{sec:app-exp}
% \subsection{Real User Experiments} \label{sec:app-burn-in}
% We design additional offline experiments using real user data accumulated in datasets such as MovieLens 10M, MovieLens 20M, and Amazon Books. We focused on the Amazon Books, MovieLens 10M, and MovieLens 20M datasets because they are widely used in the recommender systems literature and provide a diverse set of user-item interactions across different domains. These datasets are comparable in scale and complexity to those used in the related work section, making them suitable for evaluating the generalizability of our framework.

% The items users prefer are based on their actual ratings from the datasets rather than being generated from the embeddings. We followed the data processing approach used in the Deep rating elicitation~\citep{ref:kweon2020deep}, which is also one of our baselines. Specifically, we filtered out users who rated over $40$ items and converted implicit ratings $(1-5)$ to explicit ratings ($0$ and $1$) as follows: ratings of $4$ and $5$ are considered as liked items, while ratings of $0$, $1$, and $3$ are considered as disliked items. Table~\ref{tab:real-user} demonstrates that our method outperforms baselines in all three datasets.

% \begin{table}[ht]
%     \centering
%     \caption{Real user experiments on three datasets.}
%     \begin{tabular}{lccccccc}
%     \hline
%         Datasets & Methods & NDCG@10 & MAP & MRR \\ \hline
%         MoviesLens-10M  &  c-DPP  &	0.802 &	0.591 &	0.772\\ 
%              &  DPE & 0.674 & 0.439 & 0.607\\
%              &  RMV & 0.492 & 0.281 & 0.437 \\ 
%              &  PEO & 0.667 & 0.429 & 0.69 \\ 
%              &  DRE & 0.337 & 0.164 & 0.271 \\
%              & PERE & \textbf{0.812} & \textbf{0.603} & \textbf{0.784} \\\hline
%         MoviesLens-20M  &  c-DPP  &	0.628 & 0.499 & 0.689\\ 
%              &  DPE & 0.634 & 0.428 & 0.578\\
%              &  RMV & 0.144 & 0.088 & 0.133 \\ 
%              &  PEO & 0.639 & 0.394 & 0.592 \\ 
%              &  DRE & 0.435 & 0.227 & 0.364 \\
%              & PERE & \textbf{0.734} & \textbf{0.505} & \textbf{0.696} \\\hline
%         Amazon-Books  &  c-DPP  &	0.127 &	0.101 & 0.108\\ 
%              &  DPE & 0.082 & 0.078 & 0.084\\
%              &  RMV & 0.044 & 0.041 & 0.047 \\ 
%              &  PEO & 0.099 & 0.084 & 0.105 \\ 
%              &  DRE & 0.029 & 0.025 & 0.027 \\
%              & PERE & \textbf{0.132} & \textbf{0.106} & \textbf{0.125} \\\hline
%     \end{tabular}
%     \label{tab:real-user}
% \end{table}

% \subsection{Inconsistent Preference} \label{sec:app-inconsistent-pref}
% We conduct two additional experiments to evaluate our method's performance under inconsistent user preferences: 
% \begin{itemize}
%     \item \textbf{Experiment 1:} We introduce a probability $\tau$ that a user's response to an experienced item will be flipped. When $\tau = 0$, there is no inconsistency, and when $\tau = 1$, responses are always inconsistent. We plot the performance gain in NDCG@50 against the number of displayed items for different values of $\tau$. Figure~\ref{fig:inconsist-ndcg} shows that as $\tau$ increases, the performance gain decreases but remains positive, demonstrating that our method still provides benefits despite inconsistencies in user responses.
%     \item \textbf{Experiment 2:} We compare our method against DPE and RMV in the presence of inconsistency. Table~\ref{tab:inconsist-dpe} shows that our method maintains its advantage over the baselines even with inconsistent user preferences.
% \end{itemize}

% \begin{figure}[!htb]
%     \centering
%     \includegraphics[width=0.8\linewidth]{figures/Inconsist_NDCG_50.png}
%     \caption{As the value of $\kappa_0$ increases, NDCG@50 increases under inconsistent preference setting.}
%     \label{fig:inconsist-ndcg}
% \end{figure}

% \begin{table}[ht]
%     \centering
%     \caption{Comparison against DPE and RMV under inconsistent preference setting.}
%     \begin{tabular}{lccccccc}
%     \hline
%         Datasets & Methods & HR@5 & NDCG@10 &	MRR \\ \hline
%         Amazon-Books ($\tau=0.1$)  &  DPE & 0.305 & 0.285 & 0.293\\ 
%              &  RMV & 0.205 & 0.188 & 0.179\\
%              &  PERE & \textbf{0.365} & \textbf{0.303} & \textbf{0.329} \\ \hline
%         Amazon-Books ($\tau=0.5$)  &  DPE & 0.310 & 0.288	& 0.297\\ 
%              &  RMV & 0.205 & 0.188 & 0.179\\
%              &  PERE & \textbf{0.360} & \textbf{0.329} & \textbf{0.328} \\ \hline
%     \end{tabular}
%     \label{tab:inconsist-dpe}
% \end{table}

\subsection{Statistical Test}
For each user, we compute the recommendation metrics for our method and the baselines. We test the following hypotheses:
\begin{itemize}
    \item Null hypothesis: PERE's NDCG@10 (or MAP, MRR) equals the competing method's NDCG@10 (or MAP, MRR).
    \item Alternative hypothesis: PERE's NDCG@10 (or MAP, MRR) is larger than the competing method's NDCG@10 (or MAP, MRR).
\end{itemize}

To test these hypotheses, we use a one-sided Wilcoxon signed-rank test on the paired per-user metric values and set the significance level at $0.05$. Table~\ref{tab:stat-test} reports the resulting $p$-values and indicates that PERE significantly outperforms RMV, DPE, PEO, and DRE across all performance metrics. PERE also significantly outperforms c-DPP on all metrics except NDCG@10 and MAP on the Gowalla dataset, where the $p$-values ($0.198$ and $0.087$) exceed the significance level. Failing to reject the null hypothesis in these two cases does not imply that c-DPP attains a higher NDCG@10 or MAP than PERE on the Gowalla dataset.
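
For concreteness, the test statistic can be sketched as follows, with notation introduced here only for illustration and assuming the common convention of discarding zero differences. Let $x_u$ and $y_u$ denote the metric values of PERE and of the competing method for user $u$, and let $d_u = x_u - y_u$. After discarding the users with $d_u = 0$, let $R_u$ be the rank of $|d_u|$ among the remaining users. The signed-rank statistic is
\[
    W^+ = \sum_{u \,:\, d_u > 0} R_u,
\]
and the one-sided test rejects the null hypothesis when $W^+$ is sufficiently large, that is, when the corresponding $p$-value falls below the significance level.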

\begin{table}[ht]
    \centering
    \small
    \caption{$p$-values of the one-sided Wilcoxon signed-rank tests for three recommendation metrics on the Amazon-Books and Gowalla datasets.}
    \begin{tabular}{lccccccc}
    \hline
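        Metrics & Datasets & PERE vs. RMV & PERE vs. DPE & PERE vs. PEO & PERE vs. DRE & PERE vs. c-DPP \\ \hline
        NDCG@10 & Amazon-Books & $3 \times 10^{-12}$ & $4 \times 10^{-11}$ & $5 \times 10^{-3}$ & $2 \times 10^{-26}$ & $0.014$ \\
           & Gowalla & $1 \times 10^{-14}$ & $3 \times 10^{-14}$ & $2 \times 10^{-3}$ & $5 \times 10^{-10}$ & $0.198$ \\ \hline
        MAP & Amazon-Books & $2 \times 10^{-6}$ & $4 \times 10^{-5}$ & $1 \times 10^{-3}$ & $1 \times 10^{-33}$ & $0.026$ \\
           & Gowalla & $3 \times 10^{-12}$ & $3 \times 10^{-9}$ & $9 \times 10^{-5}$ & $3 \times 10^{-7}$ & $0.087$ \\ \hline
        MRR & Amazon-Books & $4 \times 10^{-11}$ & $4 \times 10^{-11}$ & $9 \times 10^{-3}$ & $2 \times 10^{-31}$ & $0.016$ \\
           & Gowalla & $2 \times 10^{-9}$ & $4 \times 10^{-7}$ & $7 \times 10^{-4}$ & $6 \times 10^{-6}$ & $0.039$ \\ \hline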
    \end{tabular}
    \label{tab:stat-test}
\end{table}

\subsection{Burn-in Phase Comparison} \label{sec:app-burn-in}

% For the burn-in phase, we compare the DPP method against two popular baselines: RMV~\cite{ref:fonarev2016efficient} and Kmedoids~\cite{ref:liu2011wisdom}. The Greedy approach picks the most popular items, the Random method randomly selects $K$ items from the popular items. Additionally, we implement the $K$-medoids algorithm used in a previous study~\cite{ref:liu2011wisdom}, to identify representative items through cluster centroids. We make a slight modification to the $K$-medoids algorithm by considering only the items belonging to the popular items as potential centroids.
We use LightGCN / BiVAE for the burn-in phase to generate item embeddings and conduct experiments on the Gowalla and Amazon-Books datasets. We employ two widely recognized and straightforward baseline methods: RMV~\citep{ref:fonarev2016efficient} and $K$-medoids~\citep{ref:liu2011wisdom}. RMV optimizes the volume of a rectangular matrix by selecting diverse yet orthogonal seed items in the embedding space. The $K$-medoids algorithm, previously employed by~\citet{ref:liu2011wisdom}, identifies representative items through cluster centroids. We slightly modify the $K$-medoids algorithm by considering only the popular items as potential centroids. Note that sequential preference elicitation methods, such as DPE~\citep{ref:parapar2021diverse} or conditional DPP, are not applicable during the `burn-in' phase. In this phase, we aim to create a standardized questionnaire for all new users entering our system. Sequential methods, in contrast, ask new questions based on the responses collected so far.

Results for the burn-in phase are summarized in Table~\ref{tab:phase_1_full}. The results demonstrate that the Determinantal Point Process (DPP) is the best approach for selecting the items of the initial queries. DPP significantly outperforms the baseline methods on the performance metrics on both datasets. The success of DPP can be attributed to its ability to select a diverse set of items while accounting for the popularity score of each item. This combination allows DPP to balance diversity and relevance, resulting in superior performance compared to the baseline methods.

\begin{table*}[htb]
% \vspace{-7mm}
\centering
\small
\caption{Benchmark of performance metrics on Gowalla and Amazon-Books. Larger values are better. The best performance for any fixed number of questions is highlighted in bold. The number of items, in this case, is $K=50$ for all methods.}
\label{tab:phase_1_full}

\pgfplotstabletypeset[
    col sep=comma,
    string type,
    every head row/.style={before row=\toprule,after row=\midrule},
    % every row no 0/.style={after row=\midrule},
    every row no 3/.style={after row=\midrule},
    % every row no 11/.style={after row=\midrule},
    every last row/.style={after row=\bottomrule},
    columns/data/.style={column name=Method, column type={l}},
    columns/data/.style={column name=HR@1, column type={l}},
    columns/data/.style={column name=AUC@10, column type={l}},
    columns/data/.style={column name=NDCG@10, column type={l}},
    columns/data/.style={column name=NDCG@30, column type={l}},
    columns/data/.style={column name=MAP, column type={l}},
    columns/data/.style={column name=MRR, column type={l}},
]{tables/Phase1_full.csv}
\end{table*}

Moreover, to show the effectiveness of the sequential elicitation proposed in Section~\ref{sec:solution}, we conduct an additional experiment that compares PERE, which uses a static 50-item questionnaire at the beginning followed by a series of 5 dynamic 5-item questionnaires, with a baseline that uses only a burn-in DPP questionnaire with 100 static items.
Table~\ref{tab:ablation_1_results} illustrates that the combination of a 50-item questionnaire along with a series of 5 dynamic 5-item questionnaires outperforms the 100-item questionnaire, which highlights the effectiveness of our PERE method.

\begin{table}[ht]
    \centering
    \caption{Comparison between a burn-in questionnaire using DPP and PERE, with $100$ elicited items for each method, on the Gowalla and Amazon-Books datasets.}

    \begin{tabular}{lccc}
        \toprule
         Datasets & Method &  NDCG@10 $\uparrow$ & MRR $\uparrow$ \\
         \hline
         Gowalla & Burn-in & 0.1497 & 0.1335 \\
         - & PERE & \textbf{0.1806} & \textbf{0.1518} \\
         \hline
         Amazon-Books & Burn-in & 0.3388 & 0.3152 \\ 
         - & PERE & \textbf{0.3616} & \textbf{0.3235} \\   
         \bottomrule
    \end{tabular}   \label{tab:ablation_1_results}
\end{table}

\subsection{Greedy and DPP Comparison}
While the greedy method chooses the most popular items, we employ the Determinantal Point Process (DPP) in the `burn-in' phase to achieve a better balance between diversity and popularity. DPP is advantageous in scenarios where preferences may diverge from mainstream popularity, ensuring a tailored and inclusive experience. Table~\ref{tab:greedy} demonstrates that, with 100 elicited items, our method is more effective than the greedy method in constructing a personalized questionnaire for new users.
\begin{table}[ht]
    \centering
    \caption{Comparison between PERE and Greedy method on Amazon-Books dataset.}
    \begin{tabular}{lccc}
    \hline
        Methods & NDCG@10 $\uparrow$ & MAP $\uparrow$ & MRR $\uparrow$ \\ \hline
        Greedy &  0.3415 & 0.198 & 0.3043  \\ 
        PERE  & \textbf{0.3616} & \textbf{0.2930} & \textbf{0.3235} \\ \hline
    \end{tabular}
    \label{tab:greedy}
\end{table}

% \subsection{Questionnaire Size Analysis}
% To be user-friendly, the questionnaire size should be small to avoid stressing the user's cognitive load. We find that the number of items at each round does not significantly affect the quality of the method. What is more interesting to track is the quality improvement over a long history as the \textit{total} number of questions increases. Therefore, we conduct an additional experiment to study the impact of the total number of questions on the performance metrics NDCG@$10$ and MRR. Figure~\ref{fig:vary-K} shows that our method outperforms RMV, DPE, PEO, and DRE in all datasets. Additionally, our method outperforms c-DPP when eliciting $K=100$ items in total.

% \begin{figure}[ht]
%     \centering
%     \includegraphics[width=0.8\linewidth]{figures/vary_K.png}
%     \caption{Performance improvements with the dynamic questionnaire size on Amazon-Books and Gowalla datasets.}
%     \label{fig:vary-K}
% \end{figure}

\section{Main Experiment Setting}
\subsection{Datasets Description}
In this paper, we use the Gowalla~\citep{ref:cho2011friendship} and Amazon-Books~\citep{ref:ni2019justifying} datasets, whose statistics are reported in Table~\ref{tab:data-stat}. The datasets are described as follows:
\begin{itemize}
    \item Gowalla is a location-based dataset that contains information about user check-ins at various locations. 
    \item Amazon-Books is a subset of the Amazon Product Review dataset, specifically centered on book products. This dataset comprises reviews and user ratings for various products.
\end{itemize}

\begin{table}[ht]
    \centering
    \caption{Characteristics of the datasets used in our experiments.}
    \begin{tabular}{lcccc}
    \hline
        Dataset & Train User \# & Item \#  & Interaction \#  & Density \\ \hline
        Gowalla & 28858 & 40981 & 1027370 & 0.00084 \\ 
        Amazon-Books  & 51643 & 91599 & 2984108 & 0.00062 \\ \hline
    \end{tabular}
    \label{tab:data-stat}
\end{table}

Amazon-Books includes both explicit and implicit user responses related to book products, whereas Gowalla provides only implicit information indicating user preferences toward different locations. We employ two well-known methods to generate collaborative filtering embeddings for items: LightGCN and biVAE. LightGCN is trained solely to predict user-item interactions, making it suitable for datasets with implicit responses. In contrast, biVAE is designed to predict specific ratings for user-item interactions, which requires explicit responses. Because Gowalla contains only implicit responses, we use only LightGCN on this dataset. Because Amazon-Books contains both explicit and implicit responses, we use both LightGCN and biVAE on this dataset.


\subsection{Baseline Description}
This paper uses a total of $7$ baselines, which can be divided into fixed questionnaire generation methods and sequential questionnaire generation methods.
Fixed questionnaire generation methods include:
\begin{itemize}
    \item RMV: Please refer to Section~\ref{sec:app-burn-in}.
    \item $K$-medoids: Please refer to Section~\ref{sec:app-burn-in}.
    \item DRE: This method first defines a categorical distribution for sampling seed items from the entire item pool. It then jointly learns the categorical distribution and a neural reconstruction network that infers users' preferences based on collaborative filtering (CF) information from the sampled seed items. Finally, the encoder is used to select the seed items, while the decoder is used to recommend the favorite items.
    \item DPP: Please refer to Section~\ref{sec:app-dpp}.
\end{itemize}
Sequential questionnaire generation methods include:
\begin{itemize}
    \item PEO: This method presents a novel elicitation approach to construct a static preference questionnaire. It formulates the task of generating preference questionnaires, which consist of relative questions for new users, as an optimization problem that can be solved in time linear in the number of items.
    \item Conditional DPP: Conditional DPP is a modified version of DPP that selects $K$ items from the remaining set of items.
    \item DPE: This preference elicitation model employs multi-armed bandits to diversify the seed item set through topic and item diversity.
\end{itemize}
\subsection{Implementation Details} 
We use the standard codebase of LightGCN\footnote{\url{https://github.com/gusye1234/LightGCN-PyTorch}} and the cornac implementation of biVAE\footnote{\url{https://github.com/recommenders-team/recommenders/tree/main}} to generate the item embeddings and the new user embeddings. Afterward, we generate the characteristics of a new user according to Section 4 and use them as ground truth in our evaluation. This generation step is necessary because we want to model the experience probability, which allows users to skip a question ($\NA$ response) in our questionnaire.

\section{Inconsistency in Elicitation} \label{sec:app-inconsistency}

In this section, we introduce a method that adjusts the Chebyshev center to account for inconsistent elicitation. Let $|\mbb P|$ denote the cardinality of the set $\mbb P$. Suppose we tolerate a fraction $\tau$ of inconsistency, i.e., at most $\tau | \mbb P|$ preferences in the set $\mbb P$ can be violated. We define $\mc U_{\mbb P}^\tau$ as the set of vectors $u$ that violate at most $\tau | \mbb P|$ preferences in the set $\mbb P$. This set can be represented using auxiliary binary variables as 
\[
\mc U_{\mbb P}^\tau = 
\left\{
u \in \mbb H: 
\begin{array}{l}
\exists \gamma_{ij} \in \{0, 1\}~~\forall v_i \succsim v_j \in \mbb P \\
\sum_{(i,j) \in \mbb P} \gamma_{ij} \le \tau | \mbb P| \\
2 u^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 + \gamma_{ij} \mathds{M}
\end{array}
\right\},
\]
where $\mathds{M}$ is a big-M constant. Intuitively, $\gamma_{ij}$ is an indicator variable: $\gamma_{ij}=1$ implies that the preference is inconsistent.

\begin{theorem}[Chebyshev center with inconsistent elicitation] \label{thm:chebyshev2}
Given a tolerance parameter $\tau \in (0, 1)$, the Chebyshev center $u_c\opt$ of the set $\mc U_{\mbb P}^\tau$ can be found by solving the following problem
\[
    \begin{array}{cl}
         \max & r \\
         \st & 2 u_{c}^\top (v_j - v_i) + 2 r \| v_j - v_i \|_2 \le \|v_j\|_2^2 - \|v_i\|_2^2 + \gamma_{ij} \mathds{M} ~\forall v_i \succsim v_j \in \mbb P \\
         & \sum_{(i,j) \in \mbb P} \gamma_{ij} \le \tau | \mbb P | \\
             & u_c \in \mbb H,~r \in \R_+,~\gamma_{ij} \in \{0, 1\}~\forall v_i \succsim v_j \in \mbb P ,
    \end{array}
\]
where $\mathds{M}$ is a big-M constant.
\end{theorem}

\begin{proof}
    The optimization problem to find the Chebyshev center and its radius can be rewritten as
    \[
        \begin{array}{cl}
        \max & r \\
         \st & 2 (u_{c} + \delta)^\top (v_j - v_i) \le  \|v_j\|_2^2 - \|v_i\|_2^2 + \gamma_{ij} \mathds{M}~\forall \delta \in \mc B_r,~\forall v_i \succsim v_j \in \mbb P \\
         & \sum_{(i,j) \in \mbb P} \gamma_{ij} \le \tau | \mbb P | \\
            & u_c \in \mbb H,~r \in \R_+,~\gamma_{ij} \in \{0, 1\}~\forall v_i \succsim v_j \in \mbb P ,
        \end{array}
    \]
    where $\mc B_r = \{\delta \in \R^d: \| \delta \|_2 \le r\}$ is a $d$-dimensional Euclidean ball of radius $r$. For any preference $v_i \succsim v_j \in \mbb P$, the semi-infinite constraint 
    \[
        2 (u_{c} + \delta)^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 + \gamma_{ij} \mathds{M} ~\forall \delta \in \mc B_r
    \]
    is equivalent to the robust constraint
    \[
    2 u_{c}^\top (v_j - v_i) + 2 \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) \le \|v_j\|_2^2 - \|v_i\|_2^2 + \gamma_{ij} \mathds{M}.
    \]
    Because the Euclidean norm is a self-dual norm, we have
    \[
        \sup_{\|\delta\|_2 \le r} \delta^\top (v_j - v_i) = r \|v_j - v_i\|_2.
    \]
    Substituting the above relationship into the optimization problem completes the proof.
\end{proof}
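
As a brief check of the last step of the proof, the Cauchy-Schwarz inequality gives, for any $\delta$ with $\| \delta \|_2 \le r$,
\[
    \delta^\top (v_j - v_i) \le \| \delta \|_2 \, \| v_j - v_i \|_2 \le r \| v_j - v_i \|_2,
\]
and, whenever $v_i \neq v_j$, the bound is attained at $\delta\opt = r (v_j - v_i) / \| v_j - v_i \|_2$. Regarding the binary variables, setting $\gamma_{ij} = 0$ enforces the robust constraint without any slack, whereas setting $\gamma_{ij} = 1$ relaxes the right-hand side by $\mathds{M}$, so that the constraint becomes inactive provided that $\mathds{M}$ is chosen sufficiently large.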

\end{document}
