\documentclass{article}

\usepackage{aistats2024_author_response}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage[svgnames]{xcolor}         % define colors in text
\usepackage{xspace}         % fix spacing around commands
\usepackage{wrapfig}
\usepackage{graphicx}
\usepackage{bm}
\newcommand{\BQ}{\textsc{BanditQ}\xspace}

\begin{document}
%
\iffalse
%You have until \textbf{Tuesday, December 5, 2023 (11:59PM Anywhere on Earth)} to (optionally) respond to the reviews. You must submit a single response that addresses all reviews (not one response per review). The author response is limited to a \textbf{single page} in PDF format, including all figures, tables, and references, and has to use the AISTATS ``author response'' style that accompanies this \texttt{tex}-file. You may not alter this style file; in particular, you may not change the paper size, font, font size, or margins. Moreover, author responses must not contain external links, and must be \textbf{anonymized}.
%
%Please focus your response on either answering specific questions raised in the reviews or correcting any misunderstanding or factual errors in the reviews.
%
%You can change your response as often as you like until the above deadline. Please note that \textbf{this deadline is strict} and we encourage you to submit your response early so as to avoid technical issues. Please be aware that the deadline is \textbf{11:59PM Anywhere on Earth}.
%
%
%To include a figure in your response, the following LaTeX code is a possible solution:
%
%\begin{verbatim}
%\begin{minipage}[b]{0.3\linewidth}
%\includegraphics[width=\linewidth]{path_to_figure}
%\captionof{figure}{figure_caption}
%\end{minipage}
%\end{verbatim}
%
%For submissions without a reproducibility checklist, a separate document containing the checklist only is allowed. Please refer to the AISTATS 2024 submission template for the reproducibility checklist. In such case please upload both the author response and the reproducibility checklist either in the same document or in a zip file containing 2 pdf files.
\textbf{\textcolor{blue}{\underline{Reviewer \#2 (Q5), \#4 (Q5.2)}}} 
\textbf{On the choice $\nu=1/(2N)$:} 
The parameter $\nu$ denotes a lower bound on the fraction of time each arm must be pulled. For $N$ arms, irrespective of the value of $\alpha$, the maximum value of $\nu$ is $1/N$. Setting $\nu=1/N$ is equivalent to pulling all arms uniformly at random, which would incur linear regret. To strike a balance between fairness and performance, we set $\nu=1/(2N)$ in our experiments. The \texttt{FairCB} algorithm is free to pull any particular arm more often than the specified lower bound. Figure 1 below reports experimental results for the choice $\alpha=0.9$, $\nu=0.9/N$, and $\delta=0.001$, and shows that the proposed policy outperforms the other two baselines.\\
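To see why $\nu$ cannot exceed $1/N$, here is a short sketch. It assumes that exactly one arm is pulled per round, and $x_i$ is shorthand introduced only here for the long-run fraction of pulls of arm $i$. The fractions sum to one and each is at least $\nu$, so
\[
1 = \sum_{i=1}^{N} x_i \;\ge\; N\nu \quad\Longrightarrow\quad \nu \le \frac{1}{N},
\]
with equality only if $x_i = 1/N$ for every arm $i$, i.e., the uniform allocation.\\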
%\begin{wrapfigure}{R}{0.1\textwidth}
%\centering
%\includegraphics[width=0.2\textwidth]{../../AISTATS24-paper/plots/fairness_variation.pdf}
%%\caption{\label{fig:frog1}This is a figure caption.}
%\end{wrapfigure}
%\textbf{Lower bound to the Approximation ratio:} The lower bound for the approximation ratio in the bandit setting follows from the corresponding lower bound established for the full-information setting. Unlike the classic bandit problems where a sublinear regret is achievable, due to the non-separability of the alpha-fair utility function, an undiscounted sublinear regret is not attainable for this problem. \\
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q3):}} \textbf{Why our result is surprising:} Most of the known regret bounds for contextual bandits assume that the cumulative reward can be decomposed as the sum of per-round rewards [Agarwal et al., ICML 2014]. However, in this paper, we consider the classic $\alpha$-fair utility function, where, due to the non-linearity, the total reward \emph{cannot} be decomposed as the sum of rewards in each round. While these types of concave utility functions are essential for inducing fairness, their effect on the design of regret-minimizing algorithms is substantial. For example, it is known that it is impossible to guarantee sublinear regret for the $\alpha$-fair utility function [Sinha et al., 2023, Theorem 2]. Furthermore, the classical regret analysis breaks down due to the non-linearity. A major contribution of our paper is the surprising finding that one can nevertheless guarantee an approximate sublinear regret for this problem with a small approximation factor.\\
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q5.1, Q11.2)}} \textbf{On the minimum rewards:} The assumption of a strictly positive lower bound on the rewards is made only for technical convenience. By appropriately shifting and scaling, the range of any bounded reward can be made strictly positive, and our proposed policy can then be run on the transformed rewards. In particular, assume that the original rewards lie in the range $[a,b]$. Then, for some $\epsilon>0$, we can consider the modified reward $r'= (r+\epsilon-a)/(b+\epsilon-a)$, which ensures that $r' \in [\epsilon/(b+\epsilon-a), 1]$, irrespective of $a$ and $b$. Thus, the transformed rewards are strictly positive and lower bounded by $\delta \equiv \epsilon/(b+\epsilon-a)$. In particular, it can be easily verified that the choice $\epsilon=O(1/T^2)$ does not affect the regret bound. Figure 2 below compares the performance of different policies when we set $\delta=0.001$, $\alpha=0.9$, and $\nu=0.9/N$. The qualitative picture does not change.\\
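As a sanity check of the rescaling above, here is a brief sketch; the numerical values are hypothetical, chosen only for illustration. Since $r \in [a,b]$ implies $\epsilon \le r+\epsilon-a \le b+\epsilon-a$, we have
\[
\delta \equiv \frac{\epsilon}{b+\epsilon-a} \;\le\; r' = \frac{r+\epsilon-a}{b+\epsilon-a} \;\le\; 1 .
\]
For instance, assuming $a=-1$, $b=1$, and $\epsilon=0.002$, the rewards $r=-1, 0, 1$ map to $r' \approx 0.001,\ 0.5005,\ 1$, respectively, so that $\delta \approx 10^{-3}$.\\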
%\textcolor{red}{Figure?} compares the performance of different policies when the minimum reward $\delta$ is set to $0.001.$ It can be seen that the qualitative picture does not change significantly.\\
%\textcolor{blue}{\bf \underline{Reviewer \#4, Q5 (2):}} Please note that $\nu$ denotes a lower bound to the fraction of time 
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q5.3)}}: 
Figures 10 and 11 in the Appendix report the performance of the proposed policy for different values of $\alpha$. \\
%Figure 3 below compares the performance of \texttt{FairCB} with our proposed policy.\\
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q9)}} \textbf{Clarity:} $\Delta_N$ denotes the standard $(N-1)$-dimensional probability simplex. The word ``user'' should be replaced with ``arm''. 
In the last paragraph on page 3, the derivatives correspond to surrogate rewards for the linearized problem. These are not new definitions and are derived from the linearization of the original non-linear utility.\\
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q9.2)}} \textbf{The $\alpha$-fairness metric:}
The $\alpha$-fairness metric has been widely adopted in the literature (see Lan et al., INFOCOM 2010, Si Salem et al., POMACS 2022, and the references therein). Its popularity stems partly from the fact that it arises naturally as the definition of fairness from an axiomatic point of view. It also yields the popular proportional fairness and max-min fairness metrics as special cases (Lan et al., 2010).\\
\textcolor{blue}{\bf \underline{Reviewer \#4 (Q7), \#6 (Q7, Q11.1)}} \textbf{On the relationship with Sinha et al. (2023) and Putta and Agarwal (2022): }
Sinha et al. (2023) consider the non-contextual full-information $\alpha$-fair utility maximization problem, whereas Putta and Agarwal (2022) propose scale-free bandit algorithms for the classic adversarial MAB problem. Our paper builds upon these two apparently unrelated recent works by effectively combining their tools to address the $\alpha$-fair contextual bandit problem for the first time in the literature. This is a non-trivial exercise as the cumulative reward vector $\bm{R}(t)$ is affected by all $M$ different adversarially chosen contexts. For this, we augment the analysis of Sinha et al. (2023) using a novel bootstrapping method to account for multiple contexts using the scale-free regret bound of Putta and Agarwal (2022).\\
\textcolor{blue}{\bf \underline{Reviewer \#6 (Q11.2):}} \textbf{Assumption of finitely many contexts:} The assumption of a finite number of contexts is common in the contextual bandit literature [Balseiro et al.\ NeurIPS (2019), Chen et al.\ UAI (2020)]. Furthermore, to the best of our knowledge, ours is the first paper that considers the contextual bandit problem with the non-linear $\alpha$-fair utility function. To keep the technicalities at a minimum, we considered the simplest case with finitely many contexts. Since our proposed algorithm runs a no-regret policy for each context on each round (common for unstructured context spaces), to keep the computational burden manageable, we insist on a small number of contexts (which can be ensured, e.g., by clustering similar contexts together using a standard $\epsilon$-net argument). Extending our results to a structured context space with infinitely many contexts would be an interesting future research direction.
% \begin{figure}[ht!]
% 	\includegraphics[scale=0.3]{plots/jains_index_alpha=0.9_smallreward=0.001_mu=0.9⁄N.pdf}
% \end{figure}

\begin{figure}[ht!]
\vspace{-12pt}
\centering
\includegraphics[scale=0.53]{plots/jains_index_full_information.pdf}
\includegraphics[scale=0.53]{plots/approximate_regret_full_information.pdf}
%\includegraphics[scale=0.32]{plots/fairness_variation.pdf}
% \includegraphics[scale=0.1, width=.3\textwidth]{plots/approximate_regret_smallreward=0.001_alpha=0.9.pdf}\hfill
% \includegraphics[scale=0.1, width=.3\textwidth]{plots/jains_index_alpha=0.9_smallreward=0.001_mu=0.9⁄N.pdf}

%\caption{\scriptsize{default}}
\label{fig:figure3}

\end{figure}
\fi
\begin{figure*}[t]
  \centering
  \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Reward_rates_full_info.pdf}
   \caption{\small{Reward accrual rates in the full-information setting}}
   \label{rew_full}
  \end{minipage}
   \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Q_lengths_full_info.pdf}
   \caption{\small{Queue lengths in the full-information setting}}
   \label{q_full}
  \end{minipage}
   \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Regret_full_info.pdf}
   \caption{\small{Regret of \BQ in the full-information setting}}
   \label{reg_full}
  \end{minipage}
  \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Reward_rates_bandit_feedback.pdf}
   \caption{\small{Reward accrual rates in the bandit feedback setting}}
   \label{rew_bf}
  \end{minipage}
  %\hfill
  \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Q_lengths_Bandit_feedback.pdf}
   \caption{\small{Queue lengths in the bandit feedback setting}}
   \label{q_bf}
  \end{minipage}
 %\hfill
   \begin{minipage}[b]{0.45\linewidth}
   \centering
    \includegraphics[width=\linewidth]{./Figures/Regret_bandit_feedback.pdf}
   \caption{\small{Regret of \BQ in the bandit feedback setting}}
   \label{reg_bf}
  \end{minipage}
  \caption{\Large{Performance of the BanditQ policy with $N=1000$ arms in both the full-information and bandit feedback settings}}
\end{figure*}



%
%Reference:
%
%1.	T. Lan, D. Kao, M. Chiang and A. Sabharwal, "An Axiomatic Theory of Fairness in Network Resource Allocation," 2010 Proceedings IEEE INFOCOM, San Diego, CA, USA, 2010, pp. 1-9, doi: 10.1109/INFCOM.2010.5461911.
%2.	Balseiro, S., Golrezaei, N., Mahdian, M., Mirrokni, V. and Schneider, J., 2019. Contextual bandits with cross-learning. Advances in Neural Information Processing Systems, 32.
%3.	Chen, Yifang, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar, and Stefanos Nikolaidis. "Fair contextual multi-armed bandits: Theory and experiments." In Conference on Uncertainty in Artificial Intelligence, pp. 181-190. PMLR, 2020.
%4.	Si Salem, Tareq, Georgios Iosifidis, and Giovanni Neglia. "Enabling long-term fairness in dynamic resource allocation." Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, no. 3 (2022): 1-36.
%5.	Agarwal, Alekh, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. "Taming the monster: A fast and simple algorithm for contextual bandits." In International Conference on Machine Learning, pp. 1638-1646. PMLR, 2014.
%
%
%
%
%


\end{document}
