\section{Introduction}

\paragraph{Sequential balanced allocations.} In the sequential balanced allocations framework, there are $m$ tasks (balls) to be allocated into $n$ servers (bins). It is well-known that allocating each ball to a bin sampled uniformly at random (a.k.a.~\OneChoice) leads \Whp\footnote{In general, \emph{with high probability} refers to a probability of at least $1 - n^{-c}$ for some constant $c > 0$.}~to a maximum load of $\Theta(\log n/\log \log n)$ for $m = n$ and a gap (maximum load minus average load) of $\Theta\big(\sqrt{(m/n) \cdot \log n}\big)$ for $m \geq n \log n$. 

An improvement over \OneChoice is the \DChoice process~\cite{KLM96,ABKU99,BCSV06}, where for each ball $d$ bins are sampled uniformly at random and the ball is allocated to the least loaded of the sampled bins. For any $m \geq n$, this process achieves \Whp~a $\log_d \log n + \Theta(1)$ gap, i.e., a gap that does not depend on $m$. For $d = 2$, this dramatic improvement is known as the ``power of two choices'' (see also the surveys~\cite{MRS01,W17} for more details). Despite the simple nature of the balanced allocations framework, the \TwoChoice process has had a significant impact on practical applications such as load balancing and distributed storage systems, which was also acknowledged by the ``\emph{ACM Paris Kanellakis Theory and Practice Award 2020}''~\cite{award20}.

Several variants of \TwoChoice have been studied. Of particular importance to this work is the \OnePlusBeta process, where for each ball we run \TwoChoice with probability $\beta \in (0, 1]$ and \OneChoice otherwise. Mitzenmacher~\cite{M96} introduced this process as a model of~\TwoChoice with erroneous comparisons. Peres, Talwar and Wieder~\cite{PTW15} showed that for $\beta := \beta(n) \ll 1$, it achieves \Whp~a $\Theta((\log n)/\beta)$ gap (see also~\cite{LS22Batched}), which becomes worse for smaller $\beta$, but still remains independent of $m$. An additional reason for the significance of \OnePlusBeta is its application to the analysis of \TwoChoice in the popular \emph{graphical setting}~\cite{KP06,BF21,PTW15}, where bins are organized as vertices in a graph, and each ball samples an edge uniformly at random.

Another variant of \TwoChoice that has received some attention recently is the family of \TwoThinning processes~\cite{FG18,FL20}, where the ball is allocated to the second sample only if the first one does not meet a certain criterion, e.g., based on a threshold or quantile. 

It should be noted that the analyses of all these processes strongly rely on the fact that the load information of each bin is updated after each allocation. In effect, this means that balls can only be allocated sequentially, which is a downside in distributed and parallel environments.

\paragraph{Outdated information settings.} In this work, we demonstrate that in outdated information settings, by choosing an appropriately small $\beta$, \OnePlusBeta achieves the asymptotically optimal gap among a large class of processes, including not only \TwoChoice (and \OneChoice), but even adaptive processes that may change their allocation rule after each batch. It has long been observed that the performance of the \TwoChoice process deteriorates under outdated information and delays~\cite{W86,M00,D00,OWZS13,FGCBG97}. %
Berenbrink, Czumaj, Englert, Friedetzky and Nagel~\cite{BCEFN12} studied the \Batched setting, where balls are allocated in batches of size $b$. That is, every batch of $b$ consecutive balls can be allocated in parallel, as the decision where to place each ball only depends on the load configuration at the beginning of the batch. For $b = n$, they proved that \TwoChoice achieves \Whp~an $\Oh(\log n)$ gap. This bound was recently improved to $\Theta(\log n/\log \log n)$ in~\cite{LS22Noise}, and in the same work, it was shown that \TwoChoice has a gap that matches the maximum load of \OneChoice for $b$ balls, for any batch size $b \in [n \cdot e^{-\log^{\Theta(1)} n}, n \log n]$, and so it is asymptotically optimal. In contrast, for $b \geq n \log n$, \TwoChoice (and a family of other processes) has \Whp~a $\Theta(b/n)$ gap~\cite{LS22Batched}, a bound which was shown to hold even in the presence of weights and on graphs. This analysis also demonstrates that increasing $d$ in the \DChoice process does not always improve the gap, which is in sharp contrast to the sequential setting.

Outdated information settings have also been studied in the queuing setting~\cite{W86,AN92,KK95,FGCBG97,M00}. In particular, Mitzenmacher~\cite{M00} studied an equivalent version of the \Batched setting, called the \textit{bulletin board model with periodic updates}, and showed that some processes requiring centralized coordination can outperform \TwoChoice; however, no explicit rigorous bounds were proven. This shortcoming of \TwoChoice was characterized as \textit{herd behavior}, meaning that some of the initially lighter bins receive disproportionately many balls, turning them into heavy bins. Dahlin~\cite{D00} also observed the herd behavior empirically and suggested similar centralized strategies to improve upon \DChoice.

\paragraph{Weighted settings.} Several works study balanced allocation processes with weights~\cite{TW07,BFHM08,PTW15,LS22Batched}. We will be focusing on weights sampled independently from probability distributions with bounded moment generating functions as in~\cite{PTW15} and~\cite{LS22Batched}.

\paragraph{Our Results.} In this work, we show that a family of processes satisfying a technical condition achieves the asymptotically optimal gap\footnote{By \textit{optimal} we mean optimal over all processes that choose a probability vector $p$ at the beginning of each batch and keep it fixed throughout the batch (see \cref{thm:lower}).} of $\Oh\big(\sqrt{(b/n) \cdot \log n} \big)$ in the \Weighted \Batched setting for $b \in [2n \log n, n^3]$,\footnote{For $b = \Theta(n \log n)$, the $\Oh(\log n)$ bound also follows from~\cite[Theorem 4.2]{LS22Batched}.} which is roughly a quadratic improvement over the $\Theta(b/n)$ gap of the \TwoChoice process. This family of processes includes the \OnePlusBeta process, which can be implemented in a decentralized manner; in particular, setting $\beta = \sqrt{(n \log n)/b}$ attains this asymptotically optimal gap. %

We also provide lower bounds establishing the tightness of our upper bounds. Interestingly, the lower bound of $\Omega\big(\sqrt{(b/n) \cdot \log n}\big)$ even applies to a much more powerful class of allocation processes, where the allocation rule can be tailored to the current load configuration at the beginning of each batch.

The intuition behind these optimal processes relates to the herd behavior observed in~\cite{M00}; similarly, \cite{D00} calls this a ``too aggressive'' load interpretation. For the \DChoice process, the maximum probability of allocating to a bin is $\max_{i \in [n]} p_i \approx d/n$. This means that, for example, in a batch of $b$ balls under \TwoChoice, the bins that are lightest at the beginning of the batch each receive $\approx 2b/n$ balls in expectation, and so a gap of $\approx b/n$ arises. This becomes worse as $d$ grows. In contrast, the processes we consider have $\max_{i \in [n]} p_i = (1 + o(1))/n$, which means that no bin receives too many balls in any particular batch. For example, the \OnePlusBeta process has $\max_{i \in [n]} p_i \approx (1 + \beta)/n$, so mixing \OneChoice steps with \TwoChoice steps circumvents the herd behavior (\cref{fig:two_choie_vs_batch_visual}). The gap bounds of \OneChoice, \TwoChoice and \OnePlusBeta in the sequential and batched settings are summarized in \cref{fig:table}.
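To make the herd-behavior intuition concrete, the following minimal Python sketch (our own illustration, not part of the analysis; all helper names are hypothetical) computes, for loads frozen at the beginning of a batch and assuming distinct loads, the expected number of balls that the most likely bin receives in excess of the average $b/n$. For \DChoice this excess is $b \cdot (1 - (1-1/n)^d) - b/n \approx (d-1) \cdot b/n$, whereas for \OnePlusBeta it is only $\beta \cdot (1 - 1/n) \cdot b/n$.

```python
from fractions import Fraction

def d_choice_max_prob(n: int, d: int) -> Fraction:
    # The lightest bin (rank n) gets the ball iff at least one of the d
    # uniform samples hits it; assumes all loads are distinct.
    return 1 - Fraction(n - 1, n) ** d

def one_plus_beta_max_prob(n: int, beta: Fraction) -> Fraction:
    # p_n of the (1+beta) process: (1 - beta)/n + beta * (2n - 1)/n^2.
    return (1 - beta) * Fraction(1, n) + beta * Fraction(2 * n - 1, n * n)

def expected_excess(n: int, b: int, p_max: Fraction) -> Fraction:
    # Expected number of balls the most likely bin receives in one batch
    # beyond the average share b/n; loads stay frozen during the batch.
    return b * p_max - Fraction(b, n)

n, b = 1000, 100_000
for d in (2, 3, 4):
    print(f"{d}-Choice:", float(expected_excess(n, b, d_choice_max_prob(n, d))))
beta = Fraction(1, 10)
print("(1+beta):", float(expected_excess(n, b, one_plus_beta_max_prob(n, beta))))
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding; the excess grows roughly linearly in $d$ for \DChoice, while for \OnePlusBeta it scales with $\beta$.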

\begin{figure}
    \centering
\begin{minipage}[t]{0.3\textwidth}
\begin{center}
\includegraphics[scale=0.17]{figs/tc_load_vector.pdf} \\
    \TwoChoice
\end{center}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\begin{center}
\includegraphics[scale=0.17]{figs/opb_load_vector.pdf} \\
    \OnePlusBeta process
\end{center}
\end{minipage}
    \caption{The balls allocated in the latest batch shown in red for \TwoChoice (left) and \OnePlusBeta with $\beta = 1/2$ (right). Observe that \TwoChoice allocates more aggressively on the bins that are lightly loaded at the beginning of the batch, while \OnePlusBeta spreads the allocations more evenly.}
    \label{fig:two_choie_vs_batch_visual}
\end{figure}

\begin{table*}
\centering
\resizebox{\textwidth}{!}{
\renewcommand{\arraystretch}{1.75}
 \begin{tabular}{cccc}
        \textbf{Process} & \textbf{Gap in Sequential Setting } & \textbf{Gap in Batched Setting} & \textbf{Constraints for Batching}    \\ \hline
        \OneChoice & $\Theta \left(\sqrt{ (m/n) \cdot \log n} \right)$~\cite{RS98} &  $\Theta \left(\sqrt{ (m/n) \cdot \log n} \right)$~\cite{RS98} & none \\ \hline 
        \multirow{2}{*}{\TwoChoice} & \multirow{2}{*}{$\log_2 \log n + \Oh(1)$~\cite{ABKU99,BCSV06} } & $\Omega\left( b/n \right)$~\cite{LS22Batched} & none     \\
        &  & $\Oh\left( b/n \right)$~\cite{LS22Batched} & $b \geq n \log n$    \\ \hline
           \multirow{4}{*}{$(1+\beta)$} & \multirow{4}{*}{$\Theta(  (1/\beta) \cdot \log n  )$ ~\cite{PTW15}}  & $\Omega\left( \sqrt{ (b/n) \cdot \log n} \right)$~\cref{thm:lower} & \multirow{2}{*}{$b \in [2n \log n,n^3]$, $\beta = \sqrt{(n \log n)/b}$} \\
           & & $\Oh\left( \sqrt{ (b/n) \cdot \log n} \right)$~\cref{thm:batching_strong_gap_bound} & \\
             &    & $\Omega \left( \sqrt{ (b/n ) \cdot \log n} \right)$~\cref{thm:lower} & \multirow{2}{*}{$b \geq n^3$, $\beta = \sqrt{n/b}$} \\
             & & $\Oh\left( \sqrt{ b/n } \cdot \log n \right)$~\cref{cor:weak_bound} & 
             \\ \hline
        \end{tabular}
}
~\vspace{1em}~
\caption{For the sequential setting (i.e., the setting without batching), all gap bounds hold for any $m \geq n \cdot \log n$. For simplicity, we focus on the unweighted setting and, in the batched setting, list results for the \OnePlusBeta process only for $b \geq n \log n$. Among these processes, \OneChoice has the worst gap in both settings, although its gap is the same in the batched and sequential settings. For \TwoChoice, the gap becomes $\Theta(b/n)$ in the batched setting, whereas for $(1+\beta)$ with a suitable $\beta$ it is only $\Theta\big(\sqrt{(b/n) \cdot \log n}\big)$ for $b \in [2n \log n, n^3]$, which is better than \TwoChoice.}
\label{fig:table}
\end{table*}

\paragraph{Our Techniques.} Our techniques build on and refine those in~\cite{LS22Batched}, making use of the hyperbolic cosine potential function~\cite{PTW15} and variants. More specifically, a slightly weaker version of our tight upper bound is based on \cite[Theorem 3.1]{LS22Batched} and a refinement of \cite[Lemma 4.1]{LS22Batched}.  For our tight gap bound, our approach uses an interplay between two hyperbolic cosine potential functions to prove concentration and then an exponential potential with a larger smoothing parameter to deduce the refined gap.
A similar method was used in \cite[Section 5]{LS22Batched}, but one crucial novelty here is that we consider allocation processes whose probability allocation vectors have a small $\ell_{\infty}$ distance from the uniform distribution. We believe that relating and comparing different allocation processes based on their $\ell_{\infty}$ distance (or other metrics) is a promising direction for future work. This can also be seen as a natural relaxation of the \emph{majorization technique}, which has been the dominant tool to relate different allocation processes~\cite{PTW15,LS22Queries}.

\paragraph{Organization.} In \cref{sec:notation}, we introduce the balanced allocation framework and its notation, and define the processes and settings that we will be working with. In particular, in \cref{sec:conditions} we define general conditions on the probability allocation vector used by the processes, under which our upper bounds on the gap apply. In \cref{sec:weak_gap}, we prove the $\Oh\big(\sqrt{b/n} \cdot \log n\big)$ bound on the gap for a family of processes in the \Weighted \Batched setting. In \cref{sec:strong_gap}, we perform a refined analysis and improve this gap bound to $\Oh\big(\sqrt{(b/n) \cdot \log n}\big)$.  
In \cref{sec:lower_bounds}, we show that this achieved gap is asymptotically optimal, and in~\cref{sec:experiments}, we present some empirical results on the gap of some specific processes.
Finally, in \cref{sec:conclusions}, we summarize the results and conclude with some open problems.












\section{Notation, Processes and Settings} \label{sec:notation}

In this section, we introduce notation, processes and settings used throughout this work. %

\subsection{Basic Notation} \label{sec:basic_notation}

We consider the allocation of $m$ balls into $n$ bins, which are labeled $[n]:=\{1,2,\ldots,n\}$. For the moment, the $m$ balls are unweighted (or equivalently, all balls have weight $1$). For any step $t \geq 0$, $x^{t}$ is the $n$-dimensional \emph{load vector}, where $x_i^{t}$ is the number of balls allocated into bin $i$ in the first $t$ allocations. In particular, $x_i^{0}=0$ for every $i \in [n]$. Finally, the \emph{gap} is defined as
\[
 \Gap(t) = \max_{i \in [n]} x_i^{t} - \frac{t}{n}.
\]
It will also be convenient to keep the load vector $x$ sorted. To this end, let $\tilde{x}^t:=x^t-\frac{t}{n}$. Then, relabel the bins such that $y^{t}$ is a permutation of $\tilde{x}^t$ and $y_1^{t} \geq y_2^{t} \geq \cdots \geq y_n^{t}$. Note that $\sum_{i \in [n]} y_i^t=0$ and $\Gap(t)=y_1^t$. We call a bin $i \in [n]$ \emph{overloaded} if $y_i^t \geq 0$ and \emph{underloaded} otherwise. 
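As a small illustration of these definitions, the following Python sketch (hypothetical helper names; exact arithmetic via the standard \texttt{fractions} module) computes the normalized sorted load vector $y^t$ and $\Gap(t)$ for a toy load vector in the unweighted setting, lists the overloaded ranks, and checks $\sum_{i} y_i^t = 0$.

```python
from fractions import Fraction

def normalized_sorted_loads(x: list[int]) -> list[Fraction]:
    # y^t: the loads x^t minus the average t/n, sorted decreasingly.
    t, n = sum(x), len(x)          # unweighted: t balls allocated so far
    return sorted((xi - Fraction(t, n) for xi in x), reverse=True)

def gap(x: list[int]) -> Fraction:
    # Gap(t) = max_i x_i^t - t/n = y_1^t.
    return normalized_sorted_loads(x)[0]

x = [3, 0, 1, 4]                   # t = 8 balls in n = 4 bins
y = normalized_sorted_loads(x)     # [2, 1, -1, -2]
overloaded = [i for i, yi in enumerate(y, start=1) if yi >= 0]   # ranks 1 and 2
assert sum(y) == 0 and gap(x) == 2
```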

Following~\cite{PTW15}, many allocation processes can be described by a time-invariant \emph{probability allocation vector} $p_i$, $1 \leq i \leq n$, such that at each step $t \geq 0$, $p_i$ is the probability of allocating a ball into the $i$-th most heavily loaded bin (or equivalently, incrementing $y_i^t$ by one).

By $\mathfrak{F}^t$ we denote the filtration of the process until step $t$, which in particular reveals the load vector $x^t$.

\subsection{Processes} \label{sec:processes}


We start with a formal description of the \OneChoice process.

\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{\OneChoice Process:} \\
\textsf{Iteration:} For each $t \geq 0$, sample one bin $i$, independently and uniformly at random. Then update:  
    \begin{equation*}
     x_{i}^{t+1} = x_{i}^{t} + 1.
 \end{equation*}\vspace{-1.5em}
\end{framed}
\end{samepage}
\noindent We continue with a formal description of the \TwoChoice process.
\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{\TwoChoice Process:} \\
\textsf{Iteration:} For each $t \geq 0$, sample two bins $i_1$ and $i_2$, independently and uniformly at random. Let $i \in \big\{i_1, i_2 \big\}$ be such that $x_{i}^{t} = \min\{ x_{i_1}^t,x_{i_2}^t\}$, breaking ties randomly. Then update:  
    \begin{equation*}
     x_{i}^{t+1} = x_{i}^{t} + 1.
 \end{equation*}\vspace{-1.5em}
\end{framed}
\end{samepage}
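Since (assuming distinct loads) the ball is allocated to the $i$-th most loaded bin exactly when the larger of the two sampled ranks equals $i$, which happens with probability $\frac{i^2 - (i-1)^2}{n^2}$, the probability vector of \TwoChoice is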
\begin{equation*}
    p_{i} = \frac{2i-1}{n^2}, \qquad \mbox{ for all $i \in [n]$.}
\end{equation*}
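The formula above can be confirmed by brute force: the following Python sketch (an illustration only; the function name is hypothetical) enumerates all $n^2$ equally likely ordered pairs of sampled ranks, assumes distinct loads so that the ball goes to the larger rank (the lighter bin), and compares the result with $(2i-1)/n^2$ in exact arithmetic.

```python
from fractions import Fraction
from itertools import product

def two_choice_vector(n: int) -> list[Fraction]:
    # Rank 1 = heaviest, rank n = lightest; with distinct loads the ball
    # goes to the sample with the larger rank (i.e., the lighter bin).
    p = [Fraction(0)] * n
    for i1, i2 in product(range(1, n + 1), repeat=2):
        p[max(i1, i2) - 1] += Fraction(1, n * n)
    return p

n = 6
assert two_choice_vector(n) == [Fraction(2 * i - 1, n * n) for i in range(1, n + 1)]
```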

Following~\cite{PTW15}, we recall the definition of the $(1+\beta)$ process, which interpolates between \OneChoice and \TwoChoice:
\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{($1+\beta$) Process:}\\
\textsf{Parameter:} A mixing factor $\beta \in (0,1]$.\\
\textsf{Iteration:} For each $t \geq 0$, sample two bins $i_1$ and $i_2$, independently and uniformly at random. Let $i \in \{ i_1, i_2 \}$ be such that $x_{i}^{t} = \min\big\{ x_{i_1}^t,x_{i_2}^t \big\}$, breaking ties randomly. Then update:  
    \begin{equation*}
    \begin{cases}
     x_{i}^{t+1} = x_{i}^{t} + 1 & \mbox{with probability $\beta$}, \\
      x_{i_1}^{t+1} = x_{i_1}^{t} + 1 & \mbox{otherwise}.
   \end{cases}
 \end{equation*}\vspace{-1.em}
\end{framed}
\end{samepage}

In other words, at each step, the $(1+\beta)$ process allocates the ball following the \TwoChoice rule with probability $\beta$, and following the \OneChoice rule otherwise. Therefore, its probability vector is given by~\cite{PTW15}:
\begin{equation*}
    p_{i} =
    (1-\beta) \cdot \frac{1}{n} + \beta \cdot \frac{2i-1}{n^2}, \qquad \mbox{ for all $i \in [n]$.}
\end{equation*}
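As a sanity check of this formula, the following Python sketch (an illustration under the assumption of distinct loads, so that bins can be identified with their ranks; helper names are hypothetical) simulates single allocations of the $(1+\beta)$ process exactly as in the box above and compares the empirical frequencies with $p_i$, using a fixed seed and a generous tolerance (roughly ten standard deviations).

```python
import random
from collections import Counter

def sample_one_plus_beta(n: int, beta: float, rng: random.Random) -> int:
    # Returns the rank (1 = heaviest, n = lightest) of the chosen bin,
    # assuming all loads are distinct so ranks identify bins.
    i1 = rng.randint(1, n)
    i2 = rng.randint(1, n)
    if rng.random() < beta:
        return max(i1, i2)   # Two-Choice step: the less loaded sample
    return i1                # One-Choice step: keep the first sample

def p_formula(n: int, beta: float, i: int) -> float:
    return (1 - beta) / n + beta * (2 * i - 1) / n**2

rng = random.Random(0)
n, beta, trials = 5, 0.3, 200_000
counts = Counter(sample_one_plus_beta(n, beta, rng) for _ in range(trials))
for i in range(1, n + 1):
    # empirical frequency vs. closed form
    assert abs(counts[i] / trials - p_formula(n, beta, i)) < 0.01
```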
Recall that in~\cite{PTW15}, it was shown that \Whp~$\Gap(m) = \Oh(\frac{\log n}{\beta})$ for any $m \geq n$ and $\beta \in (0, 1]$; in particular, this bound does not grow with $m$.

The next process is another relaxation of \TwoChoice.
\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{$\Quantile(\delta)$ Process:}\\
\textsf{Parameter:} A quantile $\delta \in \{1/n, 2/n, \ldots, 1 \}$.\\
\textsf{Iteration:} For each $t \geq 0$, sample two bins $i_1$ and $i_2$, independently and uniformly at random, and update:  
    \begin{equation*}
    \begin{cases}
     x_{i_2}^{t+1} = x_{i_2}^{t} + 1 & \mbox{if $i_1$ is among the $\delta \cdot n$ most loaded bins}, \\
     x_{i_1}^{t+1} = x_{i_1}^{t} + 1 & \mbox{otherwise}.
   \end{cases}
 \end{equation*}\vspace{-1.em}
\end{framed}
\end{samepage}
Note that the $\Quantile(\delta)$ process can be implemented as a two-phase procedure: first probe the bin $i_1$ and place the ball there if $i_1$ is not among the $\delta \cdot n$ heaviest bins; otherwise, take a second sample $i_2$ and place the ball there. Since we only need to know whether a bin's rank is above or below $\delta \cdot n$, the response of a bin can be encoded as a single bit (provided that each bin knows its rank). The probability vector of $\Quantile(\delta)$ is given by:
\begin{equation*}
    p_{i} =
    \begin{cases}
     \frac{\delta}{n} & \mbox{ if $1 \leq i \leq \delta \cdot n$}, \\
     \frac{1+\delta}{n} & \mbox{ if $\delta \cdot n < i \leq n$}.
    \end{cases}
\end{equation*}
Another, equivalent description of $\Quantile(\delta)$ is that we perform \TwoChoice, but only get to know whether a sampled bin's rank is below or above $\delta \cdot n$ and break ties randomly.
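The probability vector of $\Quantile(\delta)$ can likewise be checked by enumerating all $n^2$ equally likely ordered pairs $(i_1, i_2)$ of sampled ranks. The Python sketch below (illustrative only; the function name is hypothetical, and $\delta = k/n$ is passed as the integer $k$) does so in exact arithmetic.

```python
from fractions import Fraction
from itertools import product

def quantile_vector(n: int, k: int) -> list[Fraction]:
    # Quantile(delta) with delta = k/n: ranks 1..k are the k most loaded bins.
    p = [Fraction(0)] * n
    for i1, i2 in product(range(1, n + 1), repeat=2):
        target = i2 if i1 <= k else i1   # take the second sample only if i1 is in the top k
        p[target - 1] += Fraction(1, n * n)
    return p

n, k = 8, 2
delta = Fraction(k, n)
expected = [delta / n if i <= k else (1 + delta) / n for i in range(1, n + 1)]
assert quantile_vector(n, k) == expected
```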





\subsection{Conditions on Probability Vectors}\label{sec:conditions}

In~\cite{LS22Batched}, the \Weighted \Batched setting was analyzed for probability allocation vectors satisfying the following two conditions. The first condition says that the process has a small $\eps/n$ bias to allocate away from overloaded bins and towards underloaded bins; the second condition says that no bin is allocated to with too high a probability.

\label{sec:c1_and_c2_conditions}
\begin{itemize}\itemsep0pt
  \item \textbf{Condition $\mathcal{C}_1$}: There exist a constant\footnote{Here, constant means that $\delta \in (\delta_1, \delta_2)$ for some constants $\delta_1, \delta_2 \in (0, 1)$.} $\delta \in (0, 1)$ and a (not necessarily constant) $\eps \in (0, 1)$ such that for any $1 \leq k \leq \delta \cdot n$,
    \[
    \sum_{i=1}^{k} p_{i} \leq (1 - \epsilon) \cdot \frac{k}{n},
    \]
    and similarly for any $\delta \cdot n +1 \leq k \leq n$,
    \[
     \sum_{i=k}^{n} p_i \geq \left(1 + \epsilon \cdot \frac{\delta}{1-\delta} \right) \cdot \frac{n-k+1}{n}.
    \]
 
  \item \textbf{Condition $\mathcal{C}_2$}: For some $C > 1$, $\max_{i \in [n]} p_i \leq \frac{C}{n}$. 

  
\end{itemize}

In the same paper~\cite{LS22Batched} it was shown that any process with $\max_{i \in [n]} p_i \geq \frac{1+\eps}{n}$ for $\eps = \Omega(1)$ also has $\Gap(m) = \Omega(b/n)$ for any $b = \Omega(n \log n)$. Therefore, to improve on this gap, we have to consider processes with $\max_{i\in [n]} p_i = \frac{1 + o(1)}{n}$. In our analysis in \cref{sec:weak_gap,sec:strong_gap} we will make use of the following condition based on the $\ell_\infty$-distance between the probability allocation vector $p$ and the uniform distribution (i.e., \OneChoice):%

\begin{itemize}
 \item \textbf{Condition $\mathcal{C}_3$}:  There exists a $C > 1$, such that for any bin $i \in [n]$,
  \[
  \left| p_i - \frac{1}{n}\right| \leq \frac{C - 1}{n}.
  \]
\end{itemize}

\noindent Note that this condition implies condition $\mathcal{C}_2$ for the same $C > 1$, but unlike $\mathcal{C}_2$, it imposes both an upper and a lower bound on the $p_i$'s. It is easy to verify that \OnePlusBeta satisfies all three conditions. 

\begin{lem} \label{lem:one_plus_beta_c123}
For any $\beta \in (0,1]$, the $(1+\beta)$ process satisfies condition $\mathcal{C}_1$ with $\delta=\frac{1}{4}$ and $\epsilon=\frac{\beta}{2}$, condition $\mathcal{C}_2$ with $C= 1+\beta$ and condition $\mathcal{C}_3$ with $C = 1 + \beta$.
\end{lem}
\begin{proof}
For any $i \in [n]$,
\[
 p_i = (1-\beta) \cdot \frac{1}{n} + \beta \cdot \frac{2i-1}{n^2}.
\]
This shows that $p_i$ is increasing in $i \in [n]$, and thus $\max_{i \in [n]} p_i = p_n = \frac{1+\beta}{n} - \frac{\beta}{n^2} \leq \frac{1+\beta}{n}$ (condition $\mathcal{C}_2$). Further, for $\delta=1/4$ and any $1 \leq k \leq \delta n$,
\[
 \sum_{i = 1}^{k} p_{i} \leq k \cdot p_{k} \leq \frac{k}{n} \cdot \left( (1-\beta) + \beta \cdot \frac{2k}{n} \right) \leq \frac{k}{n} \cdot \left( (1-\beta) + 2\beta\delta \right) = \left(1 - \frac{\beta}{2}\right) \cdot \frac{k}{n},
\]
proving the first part of $\mathcal{C}_1$ with $\epsilon = \beta/2$. For the suffix sums, that is, for any $\delta n + 1 \leq k \leq n$,
\begin{align*}
\sum_{i = k}^n p_i 
 & \stackrel{(a)}{=} \frac{n - k + 1}{n} \cdot (1-\beta) + \frac{\beta}{n^2} \cdot (n^2 - (k-1)^2) \\
 & = \frac{n - k + 1}{n} \cdot (1-\beta) + \frac{\beta}{n^2} \cdot (n - k + 1) \cdot (n + k - 1) \\
 & = \frac{n - k + 1}{n} \cdot \left(1 + \frac{\beta}{n} \cdot (k - 1) \right) \\
 & \stackrel{(b)}{\geq} \frac{n - k + 1}{n} \cdot (1 + \beta \cdot \delta ) \\
 & \stackrel{(c)}{\geq} \frac{n - k + 1}{n} \cdot \left(1 + \eps \cdot \frac{\delta}{1-\delta} \right),
\end{align*}
using in $(a)$ that $\sum_{i = 1}^u (2i - 1) = u^2$, in $(b)$ that $k \geq \delta \cdot n + 1$ and in $(c)$ that $\delta = 1/4$ and $\eps = \beta/2$.

Condition $\mathcal{C}_3$ is verified as follows. As $p_i$ is increasing in $i \in [n]$,
\[
\left\vert p_i - \frac{1}{n}\right\vert 
 \leq \max\left\{ \frac{1}{n} - p_1, p_n - \frac{1}{n} \right\} 
 = \frac{\beta}{n} - \frac{\beta}{n^2} 
 \leq \frac{\beta}{n}. \qedhere
\]
\end{proof}
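The inequalities in the proof can also be spot-checked numerically. The following Python sketch (a check for a few parameter values, not a proof; helper names are hypothetical and $\delta n$ is assumed to be an integer) verifies both parts of condition $\mathcal{C}_1$ for the $(1+\beta)$ process with $\delta = 1/4$ and $\eps = \beta/2$ in exact arithmetic.

```python
from fractions import Fraction

def one_plus_beta_vector(n: int, beta: Fraction) -> list[Fraction]:
    return [(1 - beta) / n + beta * Fraction(2 * i - 1, n * n) for i in range(1, n + 1)]

def satisfies_c1(p: list[Fraction], delta: Fraction, eps: Fraction) -> bool:
    n = len(p)
    cut = int(delta * n)   # assumes delta * n is an integer
    # prefix sums over the cut most loaded ranks
    prefix_ok = all(sum(p[:k]) <= (1 - eps) * Fraction(k, n)
                    for k in range(1, cut + 1))
    # suffix sums starting at rank k > cut
    suffix_ok = all(sum(p[k - 1:]) >= (1 + eps * delta / (1 - delta)) * Fraction(n - k + 1, n)
                    for k in range(cut + 1, n + 1))
    return prefix_ok and suffix_ok

for n in (8, 16, 40):
    for beta in (Fraction(1, 10), Fraction(1, 2), Fraction(1)):
        assert satisfies_c1(one_plus_beta_vector(n, beta), Fraction(1, 4), beta / 2)
```

Doubling $\eps$ to $\beta$ already breaks the prefix inequality at $k = 1$, so the factor $1/2$ in the lemma is not an artifact of the check.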

Note that, in contrast to \TwoChoice, which satisfies $\mathcal{C}_3$ only for $C \geq 2 - \frac{1}{n}$, the probability vector of the $(1+\beta)$ process can be made arbitrarily close to uniform by choosing $\beta$ small enough. 
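Since condition $\mathcal{C}_3$ is a bound on the $\ell_\infty$ distance to the uniform vector, the smallest admissible constant is $1 + n \cdot \max_{i \in [n]} |p_i - 1/n|$. The following Python sketch (hypothetical helper name, exact arithmetic) computes it and recovers $C = 2 - 1/n$ for \TwoChoice and $C = 1 + \beta - \beta/n \leq 1 + \beta$ for \OnePlusBeta.

```python
from fractions import Fraction

def smallest_c3_constant(p: list[Fraction]) -> Fraction:
    # Smallest C with |p_i - 1/n| <= (C - 1)/n for all i,
    # i.e., 1 + n * (l_inf distance between p and the uniform vector).
    n = len(p)
    return 1 + n * max(abs(pi - Fraction(1, n)) for pi in p)

n = 100
two_choice = [Fraction(2 * i - 1, n * n) for i in range(1, n + 1)]
beta = Fraction(1, 20)
opb = [(1 - beta) / n + beta * Fraction(2 * i - 1, n * n) for i in range(1, n + 1)]
assert smallest_c3_constant(two_choice) == 2 - Fraction(1, n)
assert smallest_c3_constant(opb) == 1 + beta - beta / n
```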

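We also note that for any process $\mathcal{P}$ satisfying condition $\mathcal{C}_3$ for some $C > 1$ and any $C' \in (1, C)$, we can define a process $\mathcal{P}'$ satisfying condition $\mathcal{C}_3$ for $C'$ by mixing $\mathcal{P}$ with \OneChoice: at each step, $\mathcal{P}'$ follows the allocation rule of $\mathcal{P}$ with probability $\eta = \frac{C' - 1}{C - 1}$ and that of \OneChoice otherwise. Indeed, the resulting probability vector $p_i' = \eta \cdot p_i + (1 - \eta) \cdot \frac{1}{n}$ satisfies $\big|p_i' - \frac{1}{n}\big| = \eta \cdot \big|p_i - \frac{1}{n}\big| \leq \frac{C' - 1}{n}$. 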

For instance, the $\Quantile(1/2)$ process satisfies condition $\mathcal{C}_3$ for $C = 1 + 1/2$ (since $\min_{i\in [n]} p_i = \frac{1}{2n}$ and $\max_{i \in [n]} p_i = \frac{3}{2n}$). Therefore, mixing $\Quantile(1/2)$ with \OneChoice, using the $\Quantile(1/2)$ rule with probability $\eta \in (0, 1]$, gives the following probability vector, which satisfies condition $\mathcal{C}_3$ for $C = 1 + \eta/2$:
\[
p_i = \begin{cases}
\frac{1}{n} \cdot (1 - \eta) + \frac{1}{2n} \cdot \eta = \frac{1}{n} - \frac{\eta}{2n} & \text{if } i \leq \frac{1}{2}n, \\
\frac{1}{n} \cdot (1 - \eta) + \frac{3}{2n} \cdot \eta = \frac{1}{n} + \frac{\eta}{2n} & \text{otherwise}.
\end{cases}
\]

\begin{obs}
The process obtained by mixing $\Quantile(1/2)$ with \OneChoice satisfies condition $\mathcal{C}_1$ with $\delta = 1/2$ and $\eps = \eta/2$, condition $\mathcal{C}_2$ with $C = 1 + \eta/2$ and condition $\mathcal{C}_3$ with $C = 1 + \eta/2$.
\end{obs}
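The effect of mixing with \OneChoice is easy to check numerically: each deviation $p_i - 1/n$ is multiplied by $\eta$. The following Python sketch (an illustration with hypothetical helper names; $n$ is assumed even) reproduces the probability vector above and confirms that its $\ell_\infty$ distance to uniform is $\eta/(2n)$, i.e., condition $\mathcal{C}_3$ holds with $C = 1 + \eta/2$.

```python
from fractions import Fraction

def mix_with_one_choice(p: list[Fraction], eta: Fraction) -> list[Fraction]:
    # Follow the rule behind p with probability eta and One-Choice otherwise.
    n = len(p)
    return [eta * pi + (1 - eta) * Fraction(1, n) for pi in p]

def linf_to_uniform(p: list[Fraction]) -> Fraction:
    n = len(p)
    return max(abs(pi - Fraction(1, n)) for pi in p)

n, eta = 10, Fraction(1, 5)
quantile_half = [Fraction(1, 2 * n) if i <= n // 2 else Fraction(3, 2 * n)
                 for i in range(1, n + 1)]
q = mix_with_one_choice(quantile_half, eta)
assert q == [Fraction(1, n) - eta / (2 * n) if i <= n // 2 else Fraction(1, n) + eta / (2 * n)
             for i in range(1, n + 1)]
# The l_inf distance shrinks by exactly eta, so C - 1 = 1/2 becomes eta/2.
assert linf_to_uniform(q) == eta * linf_to_uniform(quantile_half) == eta / (2 * n)
```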

\subsection{Batched Setting and Weights}\label{sec:batched_model}

As in~\cite{LS22Batched}, we now extend the definitions of \cref{sec:basic_notation} and \cref{sec:processes} to \emph{weighted balls} and later to the \emph{batched setting}. To this end, let
 $w^t \geq 0$ be the weight of the $t$-th ball to be allocated, for $t \geq 1$. By $W^{t}$ we denote the total weight of all balls allocated in the first $t \geq 0$ allocations, so $W^t := \sum_{i=1}^n x_i^{t} = \sum_{s=1}^t w^s$. The normalized loads are $\tilde{x}_i^{t} := x_i^t - \frac{W^t}{n}$, and with $y^t$ again being the decreasingly sorted, normalized load vector, we have $\Gap(t)=y_1^t$. 

The weight of each ball is drawn independently from a fixed distribution $\mathcal{W}$ over $[0,\infty)$. Following~\cite{PTW15}, we assume that the distribution $\mathcal{W}$ satisfies:
\begin{itemize}
  \item $\ex{\mathcal{W}} = 1$.
  \item $\ex{e^{\zeta \mathcal{W}} } < \infty $ for some constant $\zeta > 0$.
\end{itemize}
Specific examples of distributions satisfying the above conditions (after scaling) are the geometric, exponential, binomial and Poisson distributions.

We refer to these distributions as $\FiniteMgf(\zeta)$ (or $\FiniteMgf(S)$), and in the analysis we will use the following property (see also~\cite{PTW15}):
\begin{lem}[{\cite[Lemma 2.4]{LS22Batched}}] \label{lem:bounded_weight_moment}
There exists $S := S(\zeta) \geq \max(1, 1/\zeta)$, such that for any  $\alpha \in (0, \min(\zeta/2, 1))$ and any $\kappa \in [-1,1]$,
\[
\Ex{e^{\alpha \cdot \kappa \cdot \mathcal{W}}} \leq 1 + \alpha \cdot \kappa + S \alpha^2 \cdot \kappa^2.
\]
\end{lem}
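For a concrete example, take $\mathcal{W}$ to be the exponential distribution with mean $1$, for which $\ex{e^{s\mathcal{W}}} = 1/(1-s)$ for $s < 1$. Writing $1/(1-s) = 1 + s + s^2/(1-s)$ shows that the bound of \cref{lem:bounded_weight_moment} holds with $S = 2$ whenever $|\alpha \kappa| < 1/2$; this $S$ is our own choice for this example and need not coincide with the constant produced by the lemma. The following Python sketch checks the inequality on a grid of $\alpha \in (0, 1/2)$ and $\kappa \in [-1, 1]$.

```python
def mgf_exponential(s: float) -> float:
    # E[exp(s * W)] for W ~ Exp(1), which has mean 1; finite for s < 1.
    return 1.0 / (1.0 - s)

S = 2.0                                        # assumed constant for this example
alphas = [a / 100 for a in range(1, 50)]       # alpha in (0, 1/2)
kappas = [k / 10 for k in range(-10, 11)]      # kappa in [-1, 1]
assert all(
    mgf_exponential(a * k) <= 1 + a * k + S * (a * k) ** 2 + 1e-12
    for a in alphas for k in kappas
)
```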


We will now describe the allocation of weighted balls into bins using a batch size of $b \geq n$. For the sake of concreteness, let us first describe the batched model when the allocation is done using \TwoChoice. Each ball of a batch of $b$ consecutive balls performs the following. First, it samples two bins $i_1$ and $i_2$ and compares the loads the two bins had at the beginning of the batch (let us denote the bin with the smaller load by $i_{\min}$). Second, a weight is sampled from the distribution $\mathcal{W}$. Then a ball with this weight is added to bin $i_{\min}$. Since the load information is only updated at the beginning of each batch, all allocations of the $b$ balls within the same batch can be performed in parallel.

In the following, we will use a more general framework, where the process of sampling (one or more) bins and then deciding where to allocate the ball to is described by a probability vector $p$ over the $n$ bins (\cref{sec:basic_notation} and \cref{sec:processes}). Also for the analysis, it will be convenient to focus on the normalized and sorted load vector $y$, which is why the definition below is based on $y$ rather than the actual load vector $x$.

\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{Batched Allocation with Weights}\\
\textsf{Parameters:} Batch size $b \geq n$, probability vector $p$, weight distribution $\mathcal{W}$.
\\
\textsf{Iteration:} For each $t = 0 \cdot b, 1 \cdot b, 2 \cdot b, \ldots$:
\begin{enumerate}\itemsep0pt
    \item Sample $b$ bins $i_1,i_2,\ldots,i_b$ from $[n]$ following $p$.
    \item Sample $b$ weights $w^{t+1},w^{t+2},\ldots,w^{t+b}$ from $\mathcal{W}$.
    \item Update for each bin $i \in [n]$, 
    \[
    z_{i}^{t+b}=y_{i}^{t} + \sum_{j=1}^b w^{t+j} \cdot \mathbf{1}_{i_j=i} - \frac{1}{n} \cdot \sum_{j=1}^b w^{t+j}.
    \]
    \item Let $y^{t+b}$ be the vector $z^{t+b}$, sorted decreasingly.
\end{enumerate}
\end{framed}
\end{samepage}
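The following Python sketch simulates the framework above under assumptions of our own: weights are drawn from the exponential distribution with mean $1$ (one example of a $\FiniteMgf$ distribution), and $p$ is the \OnePlusBeta vector indexed by rank. The function name is hypothetical. All $b$ bins of a batch are sampled from the same $p$ applied to the sorted vector $y^t$ at the beginning of the batch, which is what allows the allocations within a batch to be carried out in parallel.

```python
import random

def batched_step(y, p, b, rng):
    """One batch of the 'Batched Allocation with Weights' framework.

    y: normalized loads sorted decreasingly (y[0] is rank 1);
    p: probability vector over ranks; b: batch size.
    Weights are Exp(1) here, an assumed example of a FiniteMgf distribution.
    """
    n = len(y)
    # All b bins are sampled w.r.t. the ranks at the start of the batch.
    bins = rng.choices(range(n), weights=p, k=b)
    weights = [rng.expovariate(1.0) for _ in range(b)]
    total = sum(weights)
    z = list(y)
    for i, w in zip(bins, weights):
        z[i] += w
    # Subtract the average increase so that the normalized loads sum to 0.
    z = [zi - total / n for zi in z]
    return sorted(z, reverse=True)

n, b, beta = 50, 1000, 0.3
p = [(1 - beta) / n + beta * (2 * i - 1) / n**2 for i in range(1, n + 1)]
rng = random.Random(1)
y = [0.0] * n
for _ in range(20):
    y = batched_step(y, p, b, rng)
print("gap after 20 batches:", y[0])
```

Subtracting the batch's average weight keeps $\sum_i y_i = 0$ up to floating-point error, which is a convenient invariant to assert when experimenting with other probability vectors.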

We also consider versions of the processes that break ties randomly between bins of the same load. For $b = 1$, this makes no observable difference to the process, but for $b > 1$, it effectively averages the probabilities over all bins that have the same load. For instance, this corresponds to \TwoChoice deciding uniformly at random between the two sampled bins if they have the same load. In particular, if $p$ is the original probability vector, then the one with random tie-breaking is $\tilde{p}(y^t)$ (for $t$ being the beginning of the batch), where
\begin{equation} \label{eq:averaging_pi}
\tilde{p}_i(y^t) := \frac{1}{|\{ j \in [n] : y_j^t = y_i^t \}|} \cdot \sum_{j \in [n] : y_j^t = y_i^t} p_j, \quad \text{for }i \in [n].
\end{equation}
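As a small illustration of \eqref{eq:averaging_pi}, the following Python sketch (hypothetical helper name, exact arithmetic) averages the \TwoChoice vector for $n = 4$ over a group of tied ranks; the total probability mass is preserved.

```python
from collections import defaultdict
from fractions import Fraction

def tie_averaged(p, y):
    """Return p~(y): each p_i replaced by the mean of p_j over ranks j with y_j == y_i."""
    groups = defaultdict(list)
    for j, yj in enumerate(y):
        groups[yj].append(j)
    return [sum(p[j] for j in groups[yi]) / len(groups[yi]) for yi in y]

n = 4
p = [Fraction(2 * i - 1, n * n) for i in range(1, n + 1)]   # Two-Choice: 1/16, 3/16, 5/16, 7/16
y = [1, 0, 0, -1]                                          # ranks 2 and 3 are tied
q = tie_averaged(p, y)
assert q == [Fraction(1, 16), Fraction(4, 16), Fraction(4, 16), Fraction(7, 16)]
assert sum(q) == 1
```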

\begin{samepage}
\begin{framed}
\vspace{-.45em} \noindent
\underline{Batched Allocation with Weights and Random Tie-Breaking}\\
\textsf{Parameters:} Batch size $b \geq n$, probability vector $p$, weight distribution $\mathcal{W}$.
\\
\textsf{Iteration:} For each $t = 0 \cdot b, 1 \cdot b, 2 \cdot b, \ldots$:
\begin{enumerate}\itemsep0pt
    \item Let $\tilde{p} := \tilde{p}(y^t)$ be the probability vector accounting for random tie-breaking.
    \item Sample $b$ bins $i_1,i_2,\ldots,i_b$ from $[n]$ following $\tilde{p}$.
    \item Sample $b$ weights $w^{t+1},w^{t+2},\ldots,w^{t+b}$ from $\mathcal{W}$.
    \item Update for each bin $i \in [n]$, 
    \[
    z_{i}^{t+b}=y_{i}^{t} + \sum_{j=1}^b w^{t+j} \cdot \mathbf{1}_{i_j=i} - \frac{1}{n} \cdot \sum_{j=1}^b w^{t+j}.
    \]
    \item Let $y^{t+b}$ be the vector $z^{t+b}$, sorted decreasingly.
\end{enumerate}
\end{framed}
\end{samepage}

\section{Warm-up: \texorpdfstring{$\Oh(\sqrt{b/n} \cdot \log n)$}{O(sqrt(b/n) log n)} gap} \label{sec:weak_gap}

In this section, we will refine the analysis of \cite[Section 4]{LS22Batched} to prove an $\Oh(\sqrt{(b/n)} \cdot \log n)$ bound on the gap for a family of processes. This will be used as a starting point for the analysis in \cref{sec:strong_gap}. The main theorem that we prove is the following.

\begin{thm} \label{thm:herd_weak_gap_bound}
Consider any sequential allocation process with probability allocation vector $p^t$ satisfying conditions  $\mathcal{C}_1$ for constant $\delta \in (0, 1)$ and (not necessarily constant) $\eps \in (0,1)$ as well as condition $\mathcal{C}_3$ for some $C > 1$, at every step $t \geq 0$. Further, consider the \Weighted \Batched setting with weights from a $\FiniteMgf(S)$ distribution with $S \geq 1$ and a batch size $b \geq \frac{2CS}{(C-1)^2} \cdot n$.
Then, there exists a constant $k := k(\delta) > 0$, such that for any step $m \geq 0$ being a multiple of $b$,
\[
\Pro{\max_{i \in [n]} |y_i^m| \leq k \cdot \frac{(C-1)^2}{\epsilon} \cdot \frac{b}{n} \cdot \log n } \geq 1 - n^{-2}.
\]
\end{thm}

Recall that by \cref{lem:one_plus_beta_c123}, the \OnePlusBeta process satisfies condition $\mathcal{C}_1$ with $\eps = \frac{\beta}{2}$ and $\delta = \frac{1}{4}$, and conditions $\mathcal{C}_2$ and $\mathcal{C}_3$ with $C = 1 + \beta$.

In particular, by choosing $\beta = \Theta\big(\sqrt{n/b}\big)$, we get a process that is asymptotically better than \TwoChoice and whose gap is within just a $\sqrt{\log n}$ multiplicative factor of the optimal bound for the unweighted case proven in \cref{sec:lower_bounds}.
\begin{cor}\label{cor:weak_bound}
Let $b \geq n \log n$ and consider the \Weighted \Batched setting with weights from a $\FiniteMgf(S)$ distribution with $S \geq 1$. Then, there exists a constant $k > 0$ such that for the \OnePlusBeta process with $\beta = \sqrt{4S \cdot \frac{n}{b}}$ and for any step $m \geq 0$ being a multiple of $b$,
\[
\Pro{\Gap(m) \leq k \cdot \sqrt{\frac{Sb}{n}} \cdot \log n} \geq 1 - n^{-2}.
\]
\end{cor}
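For the reader's convenience, here is one way to see how the parameters of \cref{cor:weak_bound} follow from \cref{thm:herd_weak_gap_bound} (a sketch, assuming $\beta \leq 1$, i.e., $b \geq 4Sn$, which holds for $b \geq n \log n$ once $\log n \geq 4S$). By \cref{lem:one_plus_beta_c123} we may take $\eps = \beta/2$, $\delta = 1/4$ and $C = 1 + \beta$, so with $\beta = \sqrt{4S \cdot n/b}$,
\begin{align*}
k \cdot \frac{(C-1)^2}{\eps} \cdot \frac{b}{n} \cdot \log n
  &= k \cdot \frac{\beta^2}{\beta/2} \cdot \frac{b}{n} \cdot \log n
   = 2k \cdot \sqrt{\frac{4Sn}{b}} \cdot \frac{b}{n} \cdot \log n
   = 4k \cdot \sqrt{\frac{Sb}{n}} \cdot \log n, \\
\frac{2CS}{(C-1)^2} \cdot n
  &= \frac{2(1+\beta) \cdot S n}{\beta^2}
   = \frac{1+\beta}{2} \cdot b \leq b.
\end{align*}
Hence the batch-size requirement of \cref{thm:herd_weak_gap_bound} is met, and since $\Gap(m) = y_1^m \leq \max_{i \in [n]} |y_i^m|$, the bound of the corollary follows with constant $4 \cdot k(1/4)$.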

The analysis is based on the \textit{hyperbolic cosine potential}, which is defined for a smoothing parameter $\alpha > 0$ as
\begin{align}
\Gamma^t := \Gamma^t(\alpha) := \Phi^t + \Psi^t := \sum_{i = 1}^n e^{\alpha y_i^t} + \sum_{i = 1}^n e^{-\alpha y_i^t}. \label{eq:hyperbolic}
\end{align}
We also decompose $\Gamma^t$ by defining
\[
 \Gamma_i^t := \Phi_i^t + \Psi_i^t = e^{\alpha y_i^t} + e^{-\alpha y_i^t}, \quad \text{for any bin $i \in [n]$}.
\]
Further, we use the following shorthands to denote the changes in the potentials over one step $\Delta\Phi_i^{t+1} := \Phi_i^{t+1} - \Phi_i^t$, $\Delta\Psi_i^{t+1} := \Psi_i^{t+1} - \Psi_i^{t}$ and $\Delta\Gamma_i^{t+1} := \Gamma_i^{t+1} - \Gamma_i^{t}$.
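The potential, and a standard way of turning it into a bound on the normalized loads, can be illustrated as follows: since each term satisfies $e^{\alpha |y_i^t|} \leq \Gamma_i^t \leq \Gamma^t$, we get $\max_{i \in [n]} |y_i^t| \leq \frac{1}{\alpha} \log \Gamma^t$, and $\Gamma_i^t = 2\cosh(\alpha y_i^t) \geq 2$ gives $\Gamma^t \geq 2n$. The following Python sketch (illustrative only; the function name is hypothetical) checks both facts on a toy vector.

```python
import math

def hyperbolic_cosine_potential(y, alpha):
    phi = sum(math.exp(alpha * yi) for yi in y)    # Phi: large if some bin is overloaded
    psi = sum(math.exp(-alpha * yi) for yi in y)   # Psi: large if some bin is underloaded
    return phi + psi

y, alpha = [2.0, 1.0, -1.0, -2.0], 0.5
gamma = hyperbolic_cosine_potential(y, alpha)
assert gamma >= 2 * len(y)                                  # each Gamma_i = 2 cosh(alpha y_i) >= 2
assert max(abs(v) for v in y) <= math.log(gamma) / alpha    # |y_i| <= log(Gamma) / alpha
```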

We will make use of the following drift theorem shown in~\cite{LS22Batched}. Note that in the following, a round may consist of multiple single-step allocations (e.g., one batch).

\newcommand{\MainHyperbolicCosineExpectation}{
Consider any allocation process $\mathcal{P}$ and a probability vector $p^t$ satisfying condition $\mathcal{C}_1$ for some constant $\delta \in (0, 1)$ and some $\eps \in (0, 1)$ at every round $t \geq 0$. Further assume that there exist $K > 0$, $\alpha \in \big(0, \min\big\{1, \frac{\eps\delta}{8K}\big\} \big]$ and $\kappa > 0$, such that for any round $t \geq 0$, process $\mathcal{P}$ satisfies for potentials $\Phi := \Phi(\alpha)$ and $\Psi := \Psi(\alpha)$ that,
\[
\sum_{i = 1}^n \Ex{\left. \Delta\Phi_i^{t+1} \,\right|\, \mathfrak{F}^t} \leq \sum_{i = 1}^n \Phi_i^t \cdot \left(\left(p_i^t - \frac{1}{n}\right) \cdot \kappa \cdot \alpha + K \cdot \kappa \cdot \frac{\alpha^2}{n}\right),
\]
and
\[
\sum_{i = 1}^n \Ex{\left.\Delta\Psi_i^{t+1} \,\right|\, \mathfrak{F}^t} \leq  \sum_{i = 1}^n \Psi_i^t \cdot \left(\left(\frac{1}{n} - p_i^t\right) \cdot \kappa \cdot \alpha + K \cdot \kappa \cdot \frac{\alpha^2}{n}\right).
\]
Then, there exists a constant $c := c(\delta) > 0$, such that for $\Gamma := \Gamma(\alpha)$ and any round $t \geq 0$,
\[
\Ex{\left. \Delta\Gamma^{t+1} \,\right|\, \mathfrak{F}^t} \leq - \Gamma^t \cdot \kappa \cdot \frac{\alpha\eps\delta}{8n} + \kappa \cdot c\alpha\eps,
\]
and
\[
\Ex{\Gamma^t} \leq \frac{8c}{\delta} \cdot n.
\]}

\begin{thm}[{cf.~\cite[Theorem 3.1]{LS22Batched}}] \label{thm:hyperbolic_cosine_expectation}
\MainHyperbolicCosineExpectation
\end{thm}

Now we will show that any process satisfying condition $\mathcal{C}_3$ also satisfies the preconditions of \cref{thm:hyperbolic_cosine_expectation} for the expected change of the potential functions $\Phi$ and $\Psi$ over one batch.

\begin{lem} \label{lem:herd_batching_pot_changes}
Consider any sequential allocation process with probability allocation vector $p^t$ satisfying condition $\mathcal{C}_3$ for some $C \in (1, 1.9)$ at every step $t \geq 0$. Further, consider the \Weighted \Batched setting with weights from a $\FiniteMgf(S)$ distribution with constant $S \geq 1$ and a batch size $b \geq \frac{2CS}{(C-1)^2} \cdot n$. Then for $\Phi := \Phi(\alpha)$ and $\Psi := \Psi(\alpha)$ with any smoothing parameter $0 <\alpha \leq \frac{n}{2(C-1) \cdot b}$ and any step $t \geq 0$ being a multiple of $b$,
\[
\Ex{\left. \Phi^{t+b} \,\right|\, \mathfrak{F}^t} \leq \sum_{i = 1}^n \Phi_i^t \cdot \left(1 + \Big(p_i^t -\frac{1}{n}\Big) \cdot b \cdot \alpha + \frac{5(C-1)^2b}{n} \cdot b \cdot \frac{\alpha^2}{n} \right),
\]
and 
\[
\Ex{\left. \Psi^{t+b} \,\right|\, \mathfrak{F}^t} \leq \sum_{i = 1}^n \Psi_i^t \cdot \left(1 + \Big(\frac{1}{n} - p_i^t \Big) \cdot b \cdot \alpha + \frac{5(C-1)^2b}{n} \cdot b \cdot \frac{\alpha^2}{n} \right).
\]
\end{lem}
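Subtracting $\Phi^t = \sum_{i=1}^n \Phi_i^t$ (resp.\ $\Psi^t$) from both sides, the bounds of \cref{lem:herd_batching_pot_changes} take exactly the form required by \cref{thm:hyperbolic_cosine_expectation} when a round consists of the $b$ allocations of one batch; for instance, for $\Phi$,
\[
\Ex{\left. \Phi^{t+b} - \Phi^t \,\right|\, \mathfrak{F}^t} \leq \sum_{i=1}^n \Phi_i^t \cdot \left( \Big(p_i^t - \frac{1}{n}\Big) \cdot \kappa \cdot \alpha + K \cdot \kappa \cdot \frac{\alpha^2}{n} \right) \quad \text{with} \quad \kappa := b, \quad K := \frac{5(C-1)^2 b}{n}.
\]
This is the choice of $K$ and $\kappa$ made in the proofs below.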

The proof proceeds in a similar manner to \cite[Lemma 4.1]{LS22Batched}, but we bound the terms in \cref{eq:u_definition} and \cref{eq:tile_u_definition} more tightly using condition $\mathcal{C}_3$.

\begin{proof}
Consider an arbitrary step $t \geq 0$ being a multiple of $b$ and for convenience let $p = p^t$. First note that the given assumptions $\alpha \leq \frac{n}{2(C-1) \cdot b}$ and $b \geq \frac{2CS}{(C-1)^2} \cdot n$ imply that 
\begin{align} \label{eq:alpha_second_bound}
\alpha \leq \frac{n}{2(C-1) \cdot b} \leq \frac{C-1}{4CS}.
\end{align}

Consider an arbitrary bin $i \in [n]$. Let $Z \in \{0,1 \}^b$ be the indicator vector, where $Z_j$ indicates whether the $j$-th ball of the batch is allocated to bin $i$. The expected value of the overload potential $\Phi_i^{t+b}$ is given by
\begin{align}
\Ex{\left. \Phi_i^{t+b} \,\right|\, \mathfrak{F}^t} 
& = \Phi_i^t \cdot \sum_{z \in \{0,1 \}^b} \Pro{Z = z} \cdot \Ex{\left. e^{\alpha \sum_{j = 1}^b \left(z_j w^{t+j} - \frac{w^{t+j}}{n} \right)} \, \right\vert \, \mathfrak{F}^t, Z = z} \notag .
 \end{align}
In the following, let us upper bound the factor of $\Phi_i^t$:
 \begin{align}
 & \sum_{z \in \{0,1 \}^b} \Pro{Z = z} \cdot \Ex{\left. e^{\alpha \sum_{j = 1}^b \left(z_j w^{t+j} - \frac{w^{t+j}}{n}\right)} \, \right\vert \, \mathfrak{F}^t, Z = z} \notag \\
 & \qquad \stackrel{(a)}{=} \!\!\! \sum_{z \in \{0,1 \}^b} \prod_{j = 1}^b (p_i)^{z_j}  (1 - p_i)^{1 - z_j}  (\ex{e^{\alpha W (1 - \frac{1}{n})}})^{z_j}  (\ex{e^{-\alpha W/n}})^{1- z_j} \notag \\
 & \qquad \stackrel{(b)}{\leq}\!\!\! \sum_{z \in \{0,1 \}^b} \prod_{j = 1}^b \left(p_i \cdot \left(1 + \alpha \cdot \left(1 - \frac{1}{n}\right) + S\alpha^2 \right)\right)^{z_j}  %
 \cdot \left((1 - p_i) \cdot \left(1 - \frac{\alpha}{n} + \frac{S\alpha^2}{n^2}\right) \right)^{1 - z_j} \notag \\
 & \qquad \stackrel{(c)}{=} \left( p_i \cdot \left(1 + \alpha \cdot \left(1 - \frac{1}{n}\right) + S\alpha^2 \right) + (1 - p_i) \cdot \left(1 - \frac{\alpha}{n} + \frac{S\alpha^2}{n^2}\right) \right)^b \notag \\
 & \qquad = \left( 1 + \alpha \cdot \left(p_i - \frac{1}{n}\right) + p_i \cdot S\alpha^2 + (1-p_i) \cdot \frac{S\alpha^2}{n^2} \right)^b \notag \\
 & \qquad \stackrel{(d)}{\leq} \left( 1 + \alpha \cdot \left(p_i - \frac{1}{n}\right) + 2 \cdot p_i \cdot S\alpha^2 \right)^b, \label{eq:phi_batched_i}
\end{align}
using in $(a)$ that the weights are independent given $\mathfrak{F}^t$, in $(b)$ \cref{lem:bounded_weight_moment} twice with $\kappa = 1 - \frac{1}{n}$ and with $\kappa = -\frac{1}{n}$ respectively (and that $(1 - 1/n)^2 \leq 1$), in $(c)$ the binomial theorem and in $(d)$ that $p_i \geq \frac{1}{n^2}$ by condition $\mathcal{C}_3$ for $C \in (1, 1.9)$. 
Let us define 
\begin{align} \label{eq:u_definition}
u := \left(p_i - \frac{1}{n}\right) \cdot \alpha + 2 \cdot p_i \cdot S\alpha^2.    
\end{align}
We will now show that $|u \cdot b| \leq 2(C-1) \cdot b \cdot \frac{\alpha}{n} \leq 1$, which indeed holds since
\begin{align}
|u \cdot b| &= \left\lvert \left(p_i -\frac{1}{n}\right) \cdot b \cdot \alpha + 2 \cdot p_i \cdot b \cdot S\alpha^2 \right\vert \notag \\
    & \leq \left\lvert \left(p_i -\frac{1}{n}\right) \cdot b \cdot \alpha\right\vert + 2 \cdot p_i \cdot b \cdot S\alpha^2 \notag \\
    & \stackrel{(a)}{\leq} \frac{C-1}{n} \cdot  b \cdot \alpha + 2 \cdot \frac{C}{n} \cdot b \cdot S \alpha^2 \notag \\
    & = ( C-1 + 2CS\alpha) \cdot b \cdot \frac{\alpha}{n} \notag \\
    & \stackrel{(b)}{\leq} 2(C-1) \cdot b \cdot \frac{\alpha}{n} \label{eq:yb_bounded_1} \\
    & \stackrel{(c)}{\leq} 1, \label{eq:yb_bounded_2}
\end{align}
using in $(a)$ that $\big|p_i - \frac{1}{n}\big| \leq \frac{C-1}{n}$ and $p_i \leq \frac{C}{n}$ by condition $\mathcal{C}_3$, in $(b)$ that $\alpha \leq \frac{C-1}{2CS}$ (which follows from \cref{eq:alpha_second_bound}) and in $(c)$ that $\alpha \leq \frac{n}{2(C-1) \cdot b}$. 

Then,
\begin{align*}
\Ex{\left. \Phi_i^{t+b} \,\right|\, \mathfrak{F}^t} 
 & \stackrel{(a)}{\leq} \Phi_i^t \cdot e^{u \cdot b} \\
 & \stackrel{(b)}{\leq} \Phi_i^t \cdot \left( 1 + u \cdot b + (u \cdot b)^2 \right) \\
 & \!\! \stackrel{(\ref{eq:u_definition})}{=} \Phi_i^t \cdot \left( 1 + \left(p_i - \frac{1}{n}\right) \cdot b \cdot \alpha + 2 \cdot p_i \cdot b \cdot S\alpha^2 + (u \cdot b)^2 \right) \\
 & \!\! \stackrel{(\ref{eq:yb_bounded_1})}{\leq} \Phi_i^t \cdot \left(1 + \left(p_i -\frac{1}{n}\right) \cdot b \cdot \alpha + 2 \cdot p_i \cdot b \cdot S\alpha^2 + \left(2(C-1) \cdot b \cdot \frac{\alpha}{n} \right)^2 \right) \\
 & \stackrel{(c)}{\leq} \Phi_i^t \cdot \left(1 + \left(p_i -\frac{1}{n}\right) \cdot b \cdot \alpha + \frac{5(C-1)^2b}{n} \cdot b \cdot \frac{\alpha^2}{n} \right),
 \end{align*}
 using in $(a)$ that $1 + v \leq e^v$ for any $v$, in $(b)$ that $e^v \leq 1 + v + v^2$ for $v \leq 1.75$ and \cref{eq:yb_bounded_2}, and in $(c)$ that $\frac{(C-1)^2b}{n} \cdot b \cdot \frac{\alpha^2}{n} \geq 2 \cdot \frac{C}{n} \cdot b \cdot S \alpha^2 \geq 2 \cdot p_i \cdot b \cdot S \alpha^2$, since $b \geq \frac{2CS}{(C-1)^2} \cdot n$.
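For completeness, the elementary inequality $e^v \leq 1 + v + v^2$ for $v \leq 1.75$ used in $(b)$ can be verified as follows. For $v \leq 0$, Taylor's theorem with the Lagrange remainder gives $e^v = 1 + v + \frac{v^2}{2} + \frac{e^{\xi}}{6} \cdot v^3 \leq 1 + v + v^2$ for some $\xi \in [v, 0]$, since $v^3 \leq 0$. For $0 < v \leq 1.75$, the function $g(v) := \frac{e^v - 1 - v}{v^2} = \sum_{k \geq 2} \frac{v^{k-2}}{k!}$ is increasing, so
\[
\frac{e^v - 1 - v}{v^2} \leq g(1.75) = \frac{e^{1.75} - 2.75}{3.0625} \approx \frac{5.7546 - 2.75}{3.0625} \approx 0.981 < 1.
\]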
 
Similarly, for the underloaded potential $\Psi^t$, for any bin $i \in [n]$,
\begin{align*}
\Ex{\left. \Psi_i^{t+b} \,\right|\, \mathfrak{F}^t} = \Psi_i^t \cdot \sum_{z \in \{0,1 \}^b} \Pro{Z = z} \cdot \Ex{\left. e^{-\alpha \sum_{j = 1}^b \left(z_j w^{t+j} - \frac{w^{t+j}}{n} \right)} \, \right\vert \, \mathfrak{F}^t, Z = z}. 
 \end{align*}
 As before, we will upper bound the factor of $\Psi_i^t$:
 \begin{align}
& \sum_{z \in \{0,1 \}^b} \Pro{Z = z} \cdot \Ex{\left. e^{-\alpha \sum_{j = 1}^b \left(z_j w^{t+j} - \frac{w^{t+j}}{n} \right)} \, \right\vert \, \mathfrak{F}^t, Z = z} \notag \\
 & \qquad \stackrel{(a)}{=} \!\!\! \sum_{z \in \{0,1 \}^b} \prod_{j = 1}^b (p_i)^{z_j}  (1 - p_i)^{1 - z_j}  (\ex{e^{-\alpha W \cdot (1 - \frac{1}{n})}})^{z_j}  (\ex{e^{\alpha W/n}})^{1- z_j} \notag \\
& \qquad \stackrel{(b)}{\leq}  \!\!\! \sum_{z \in \{0,1 \}^b} \prod_{j = 1}^b  \left(p_i \cdot \left(1 - \alpha \cdot \left(1 - \frac{1}{n}\right) + S\alpha^2 \right)\right)^{z_j} %
\cdot \left((1 - p_i) \cdot \left(1 + \frac{\alpha}{n} + \frac{S\alpha^2}{n^2}\right) \right)^{1 - z_j} \notag \\
 & \qquad \stackrel{(c)}{=} \left( p_i \cdot \left(1 - \alpha \cdot \left(1 - \frac{1}{n}\right) + S\alpha^2 \right) + (1 - p_i) \cdot \left(1 + \frac{\alpha}{n} + \frac{S\alpha^2}{n^2}\right) \right)^b \notag \\
 & \qquad = \left( 1 + \left(\frac{1}{n}- p_i\right) \cdot \alpha + p_i \cdot S\alpha^2 + (1-p_i) \cdot \frac{S\alpha^2}{n^2} \right)^b \notag \\
 & \qquad \stackrel{(d)}{\leq} \left( 1 + \left( \frac{1}{n} - p_i\right) \cdot \alpha + 2 \cdot p_i \cdot S\alpha^2 \right)^b \label{eq:psi_batched_i},
\end{align}
using in $(a)$ that the weights $W$ are independent given $\mathfrak{F}^t$, in $(b)$ \cref{lem:bounded_weight_moment} twice with $\kappa = -\big(1 - \frac{1}{n}\big)$ and with $\kappa = \frac{1}{n}$ respectively, in $(c)$ the binomial theorem and in $(d)$ that $p_i \geq \frac{1}{n^2}$ by condition $\mathcal{C}_3$ for $C \in (1, 1.9)$.
Let us define
\begin{align} \label{eq:tile_u_definition}
\tilde{u} := \left(\frac{1}{n} - p_i\right) \cdot \alpha + 2 \cdot p_i \cdot S\alpha^2.
\end{align}
Similarly to \cref{eq:yb_bounded_2}, we get that 
\begin{align}
|\tilde{u} b| 
 & \leq \left\lvert \left(\frac{1}{n} - p_i\right) \cdot b \cdot \alpha\right\vert + 2 \cdot p_i \cdot b \cdot S\alpha^2 \leq 2(C-1) \cdot b \cdot \frac{\alpha}{n} \label{eq:tilde_u_b_1} \\
 & \leq 1. \label{eq:tilde_u_b_2}
\end{align}
So,
\begin{align*}
 \Ex{\left. \Psi_i^{t+b} \,\right|\, \mathfrak{F}^t} 
 & \stackrel{(a)}{\leq} \Psi_i^t \cdot e^{\tilde{u} b} \\
 & \stackrel{(b)}{\leq} \Psi_i^t \cdot \left( 1 + \tilde{u} b + (\tilde{u} b)^2 \right) \\
 & \!\!\stackrel{(\ref{eq:tile_u_definition})}{=} \Psi_i^t \cdot \left(1 + \left(\frac{1}{n} - p_i\right) \cdot b \cdot \alpha + 2 \cdot p_i \cdot S \alpha^2 \cdot b + (\tilde{u} \cdot b)^2 \right) \\
 & \!\!\stackrel{(\ref{eq:tilde_u_b_1})}{\leq} \Psi_i^t \cdot \left(1 + \left(\frac{1}{n} - p_i\right) \cdot b \cdot \alpha + 2 \cdot p_i \cdot b \cdot S\alpha^2 + \left(2(C-1) \cdot b \cdot \frac{\alpha}{n} \right)^2 \right) \\
 & \stackrel{(c)}{\leq} \Psi_i^t \cdot \left(1 + \left(\frac{1}{n} - p_i\right) \cdot b \cdot \alpha + \frac{5 (C-1)^2 b}{n} \cdot b \cdot \frac{\alpha^2}{n} \right),
\end{align*}
using in $(a)$ that $1 + v \leq e^v$ for any $v$, in $(b)$ that $e^v \leq 1 + v + v^2$ for $v \leq 1.75$ and \cref{eq:tilde_u_b_2}, and in $(c)$ that $\frac{(C-1)^2b}{n} \cdot b \cdot \frac{\alpha^2}{n} \geq 2 \cdot \frac{C}{n} \cdot b \cdot S \alpha^2 \geq 2 \cdot p_i \cdot b \cdot S \alpha^2$, since $b \geq \frac{2CS}{(C-1)^2} \cdot n$.
\end{proof}

Having verified the preconditions for \cref{thm:hyperbolic_cosine_expectation}, we are now ready to prove the bound on the gap for this family of processes.

\begin{rem}
The same upper bound as in \cref{thm:herd_weak_gap_bound} also holds for processes with tie breaks. The reason is that $(i)$ averaging probabilities in \cref{eq:averaging_pi} can only reduce the maximum entry (and increase the minimum entry) of the allocation vector $\tilde{p}^t$, i.e., $\max_{i \in [n]} \tilde{p}_i^t(x^t) \leq \max_{i \in [n]} p_i$, so $\tilde{p}^t$ still satisfies $\mathcal{C}_3$, and $(ii)$ moving probability between bins $i, j$ with $x_i^t = x_j^t$ (and thus $\Phi_i^t = \Phi_j^t$ and $\Psi_i^t = \Psi_j^t$) leaves the aggregate upper bounds in \cref{lem:herd_batching_pot_changes} unchanged.
\end{rem}

\begin{proof}[Proof of \cref{thm:herd_weak_gap_bound}]
Consider the \Batched setting at steps that are a multiple of $b$ and rounds consisting of $b$ allocations. By \cref{lem:herd_batching_pot_changes}, the preconditions of \cref{thm:hyperbolic_cosine_expectation} are satisfied for $K := 5 \cdot (C-1)^2 \cdot \frac{b}{n}$, $\kappa := b$ and $\alpha := \frac{\eps\delta}{8K} = \frac{\eps\delta}{40 \cdot (C-1)^2 \cdot \frac{b}{n}} \leq \frac{n}{2(C-1) \cdot b}$, since $\eps \leq C - 1$ and also $\alpha \leq 1$ since $b \geq \frac{2CS}{(C-1)^2} \cdot n$, $C > 1$ and $S \geq 1$. Hence, there exists a constant $c := c(\delta) > 0$ such that for any step $m \geq 0$ which is a multiple of $b$,
\[
\Ex{\Gamma^m} \leq \frac{8c}{\delta} \cdot n.
\]
Therefore, by Markov's inequality
\[
\Pro{\Gamma^m \leq \frac{8c}{\delta} \cdot n^3} \geq 1 - n^{-2}.
\]
To prove the claim, note that when $\big\{ \Gamma^m \leq \frac{8c}{\delta} \cdot n^3 \big\}$ holds, then also,
\[
\max_{i \in [n]} |y_i^m| \leq \frac{1}{\alpha} \cdot \left( \log \left( \frac{8c}{\delta}\right) + 3 \cdot \log n \right) \leq 4 \cdot \frac{8 \cdot 5 \cdot (C-1)^2}{\eps \delta} \cdot \frac{b}{n} \cdot \log n. \qedhere
\]
\end{proof}
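For illustration, \cref{cor:weak_bound} can be recovered from the final inequality in the proof above by a back-of-the-envelope calculation, under the assumption (standard for the \OnePlusBeta process, and not re-derived here) that its allocation vector satisfies condition $\mathcal{C}_3$ with $C = 1 + \beta$ and condition $\mathcal{C}_1$ with $\delta = \frac{1}{2}$ and $\eps = \frac{\beta}{2}$. With $\beta = \sqrt{4S \cdot \frac{n}{b}} \leq \sqrt{4S/\log n}$ (as $b \geq n \log n$), we have $C = 1 + \beta < 1.9$ for sufficiently large $n$ and constant $S$, and the batch size requirement of \cref{lem:herd_batching_pot_changes} holds since
\[
\frac{2CS}{(C-1)^2} \cdot n \leq \frac{4S}{\beta^2} \cdot n = b.
\]
Moreover $\eps = \frac{\beta}{2} \leq C - 1$, so the final inequality of the proof yields
\[
\max_{i \in [n]} |y_i^m| \leq 4 \cdot \frac{40 \cdot \beta^2}{\frac{\beta}{2} \cdot \frac{1}{2}} \cdot \frac{b}{n} \cdot \log n = 640 \cdot \beta \cdot \frac{b}{n} \cdot \log n = 1280 \cdot \sqrt{\frac{Sb}{n}} \cdot \log n,
\]
i.e., the claim of \cref{cor:weak_bound} with $k = 1280$, on an event of probability at least $1 - n^{-2}$.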


\section{Tight Bound: \texorpdfstring{$\Oh(\sqrt{(b/n) \cdot \log n})$}{O(sqrt((b/n) log n))} gap} \label{sec:strong_gap}

In this section, we will prove the stronger $\Oh\big(\sqrt{(b/n) \log n}\big)$ bound on the gap for a family of processes in the \Weighted \Batched setting (with $b \in [2n \log n, n^3]$). More specifically, these processes are a subset of the ones analyzed in \cref{sec:weak_gap} and include the \OnePlusBeta process with $\beta = \sqrt{(n/b) \log n}$, as well as $\Quantile(1/2)$ mixed with \OneChoice. As we will show in \cref{sec:lower_bounds}, these processes achieve the asymptotically optimal bound.

\newcommand{\BatchingStrongGapBound}{
Consider the \Weighted \Batched setting with any $b \in [2n \log n, n^3]$ and weights from a $\FiniteMgf(S)$ distribution with constant $S \geq 1$. Further, let $\eps = \sqrt{(n/b) \log n}$.
Consider any process with probability allocation vector $p^t$ satisfying condition $\mathcal{C}_1$ for constant $\delta \in (0, 1)$ and $\eps$ as well as condition $\mathcal{C}_3$ for $C = 1 + \eps$, at every step $t \geq 0$. 
Then, there exists a constant $\kappa := \kappa(\delta, S) > 0$, such that for any step $m \geq 0$ being a multiple of $b$,
\[
\Pro{ \max_{i \in [n]} y_i^m \leq \kappa \cdot \sqrt{\frac{b}{n} \cdot \log n} } \geq 1 - n^{-2}.
\]
}

\begin{thm}\label{thm:batching_strong_gap_bound}
\BatchingStrongGapBound
\end{thm}



There are two key steps in the proof:

\textbf{Step 1:} Similarly to the analysis in \cite{LS22Queries}, we will use two instances of the hyperbolic cosine potential in order to show that the one with the smaller smoothing parameter is concentrated at $\Oh(n)$. More specifically, we will be using $\Gamma_1 := \Gamma_1(\alpha_1)$ (defined in \cref{eq:hyperbolic}) with the smoothing parameter $\alpha_1 := \frac{\delta}{40S} \cdot \sqrt{n/(b \log n)}$ and $\Gamma_2 := \Gamma_2(\alpha_2)$ with $\alpha_2 := \frac{\alpha_1}{8 \cdot 30}$, i.e., with a smoothing parameter which is a large constant factor smaller than $\alpha_1$. In particular, since $\cosh$ is even and increasing on $[0, \infty)$, we have $\Gamma_2^t \leq \Gamma_1^t$ at any step $t \geq 0$. 

In the following lemma, proven in \cref{sec:batching_gamma_linear_whp}, we show that \Whp~$\Gamma_2 = \Oh(n)$ for $\log^3 n$ consecutive batches. 

\newcommand{\BatchingGammaLinearWhp}{
Consider any process satisfying the conditions in \cref{thm:batching_strong_gap_bound}. Let $\tilde{c} := 2 \cdot \frac{8c}{\delta}$ where $c := c(\delta) > 0$ is the constant from \cref{thm:hyperbolic_cosine_expectation}. Then, for any step $t \geq 0$ being a multiple of $b$,
\[
\Pro{ \bigcap_{j \in [0, \log^3 n]} \left\lbrace \Gamma_2^{t + j \cdot b} \leq \tilde{c} \cdot n \right\rbrace } \geq 1 - n^{-3}.
\]
}

\begin{lem} \label{lem:batching_gamma_linear_whp}
\BatchingGammaLinearWhp
\end{lem}


The proof follows the usual interplay between the two hyperbolic cosine potentials: conditioning on $\Gamma_1^t = \poly(n)$ (which holds \Whp~by the analysis in \cref{sec:weak_gap}) implies that $\big|\Delta\Gamma_2^{t+1}\big| \leq n^{1/4} \cdot \sqrt{(n/b) \cdot \log n}$ (\cref{lem:batched_gamma_1_poly_implies}~$(ii)$). This in turn allows us to apply a bounded difference inequality to prove concentration for $\Gamma_2$. In contrast to \cite{LS22Queries} and \cite{LS22Noise}, here we need a slightly different concentration inequality (\cref{lem:kutlin_3_3}, also used in \cite{LS22Batched}), since within a single batch the load of a bin may change by a large amount (with small probability).
The complete proof is given in \cref{sec:batching_gamma_linear_whp}.

\textbf{Step 2:} Consider an arbitrary step $s = t + j \cdot b$ where $\{ \Gamma_2^{s} \leq \tilde{c} \cdot n \}$ holds. Since every bin $i$ with load $y_i^s$ at least $z := \frac{1}{\alpha_2} \cdot \log(\tilde{c} / \delta) = \Theta(\sqrt{(b/n) \cdot \log n})$ contributes at least $e^{\alpha_2 z} = \tilde{c}/\delta$ to $\Gamma_2^s$, the number of such bins is at most $\tilde{c}n \cdot e^{-\alpha_2 z} = \delta n$. With this in mind, we define the following potential function for any step $t \geq 0$, which only takes into account bins that are overloaded by at least $z$:
\[
\Lambda^t := \Lambda^t(\lambda, z) := \sum_{i : y_i^t \geq z} \Lambda_i^t := \sum_{i : y_i^t \geq z} e^{\lambda \cdot (y_i^t - z)},
\]
where $\lambda := \frac{\eps}{4CS} = \Theta(\sqrt{(n/b) \cdot \log n})$, and we define $\Lambda_i^t := 0$ for the remaining bins $i$. Since at most $\delta n$ bins have load at least $z$ when $\{ \Gamma_2^{s} \leq \tilde{c} \cdot n \}$ holds, condition $\mathcal{C}_1$ implies that the probability of allocating to any one of these bins is $p_i^{s} \leq \frac{1-\eps}{n}$. Hence, the potential $\Lambda$ drops in expectation over one batch (\cref{lem:lambda_drops}), which implies that \Whp~$\Lambda^m = \poly(n)$ and therefore $\Gap(m) = \Oh(z + \lambda^{-1} \cdot \log n) = \Oh(\sqrt{(b/n) \cdot \log n})$.
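For concreteness, plugging in $\alpha_2 = \frac{\alpha_1}{8 \cdot 30}$ with $\alpha_1 = \frac{\delta}{40S} \cdot \sqrt{n/(b \log n)}$, and $\lambda = \frac{\eps}{4CS}$ with $\eps = \sqrt{(n/b) \log n}$, both terms in the gap bound are of the same order:
\[
z = \frac{8 \cdot 30 \cdot 40 S}{\delta} \cdot \log\Big(\frac{\tilde{c}}{\delta}\Big) \cdot \sqrt{\frac{b}{n} \cdot \log n}
\quad \text{and} \quad
\frac{\log n}{\lambda} = \frac{4CS}{\eps} \cdot \log n = 4CS \cdot \sqrt{\frac{b}{n} \cdot \log n}.
\]
Since $\delta$, $S$ and $\tilde{c}$ are constants and $C = 1 + \eps \leq 2$, both quantities are $\Theta\big(\sqrt{(b/n) \cdot \log n}\big)$.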

\subsection{Step 1: Concentration of the \texorpdfstring{$\Gamma$}{Gamma} Potential} \label{sec:batching_gamma_linear_whp}




Recall that in \cref{thm:batching_strong_gap_bound}, we considered the \Weighted \Batched setting with any $b \in [2n \log n, n^3]$ and weights sampled independently from a $\FiniteMgf(S)$ distribution with constant $S \geq 1$, for any allocation process with probability allocation vector $p^t$ satisfying condition $\mathcal{C}_1$ for constant $\delta \in (0, 1)$ and $\eps = \sqrt{(n/b) \log n}$ as well as condition $\mathcal{C}_3$ for $C = 1 + \eps$, at every step $t \geq 0$. 

{\renewcommand{\thelem}{\ref{lem:batching_gamma_linear_whp}}
	\begin{lem}[Restated, page~\pageref{lem:batching_gamma_linear_whp}]
\BatchingGammaLinearWhp
	\end{lem} }
	\addtocounter{lem}{-1}

The proof of this lemma is similar to the proofs in \cite[Section 5]{LS22Batched} and \cite[Section 5]{LS22Queries}, in that we use the interplay between two instances of the hyperbolic cosine potential $\Gamma_1 := \Gamma_1 (\alpha_1)$ and $\Gamma_2 := \Gamma_2(\alpha_2)$ with smoothing parameter $\alpha_2$ being a large constant factor smaller than $\alpha_1$. More specifically, we will be working with $\alpha_1 := \frac{\delta}{40S} \cdot \sqrt{n/(b \log n)}$ and $\alpha_2 := \frac{\alpha_1}{8 \cdot 30}$.

The rest of this section is organized as follows. In \cref{sec:step_1_preliminaries}, we establish some basic properties for the potentials $\Gamma_1$ and $\Gamma_2$ and in \cref{sec:gamma_linear_whp_complete} we use these to show that \Whp~$\Gamma_2^t = \Oh(n)$ for at least $\log^3 n$ batches, and complete the proof of \cref{lem:batching_gamma_linear_whp}.


\subsubsection{Preliminaries} \label{sec:step_1_preliminaries}

For any step $t \geq 0$, we define the event
\[
\mathcal{H}^t := \left\lbrace  w^t \leq \frac{15}{\zeta} \cdot \log n \right\rbrace,
\]
which means that the weight of the ball sampled in step $t$ is $\Oh(\log n)$ (since by assumption $\zeta > 0$ is a constant). By a simple Chernoff bound and a union bound, we can deduce that \Whp~this event holds at every step of a $\poly(n)$-long interval.

\begin{lem}[{cf.~\cite[Lemma 5.4]{LS22Batched}}] \label{lem:many_h_i}
Consider any $\FiniteMgf(\zeta)$ distribution $\mathcal{W}$ with constant $\zeta > 0$. Then, for any steps $t_0 \geq 0$ and $t_1 \in [t_0, t_0 + n^3 \log^3 n]$, we have that
\[
\Pro{\bigcap_{s \in [t_0, t_1]} \mathcal{H}^s} \geq 1 - n^{-10}.
\]
\end{lem}

We will now show that when $\Gamma_1^t = \poly(n)$ and $\mathcal{H}^t$ holds, then $\Delta\Gamma_2^{t+1}$ is small.

\begin{lem} \label{lem:batched_gamma_1_poly_implies}
Consider any process satisfying the conditions in \cref{lem:batching_gamma_linear_whp} and any step $t \geq 0$, such that $\Gamma_1^{t} \leq 2\tilde{c} \cdot n^{26}$ and $\mathcal{H}^t$ holds. Then, we have that
\begin{align*}
(i) & \qquad \Gamma_2^t \leq n^{5/4}, \\
(ii) & \qquad |\Gamma_2^{t+1} - \Gamma_2^{t} | \leq n^{1/4} \cdot \sqrt{\frac{n}{b} \cdot \log n}.
\end{align*}
Further, let $\widehat{x}^t$ be the load vector obtained by moving the $t$-th ball of the load vector $x^t$ to some other bin, then 
\begin{align*}
(iii) & \qquad \Gamma_1^t(\widehat{x}^t) \leq 2 \cdot \Gamma_1^t(x^t).
\end{align*}
\end{lem}
\begin{proof}
Recall that $\alpha_1 := \frac{\delta}{40S} \cdot \sqrt{n/(b \log n)}$ and $\alpha_2 := \frac{\alpha_1}{8 \cdot 30}$. Consider any step $t \geq 0$, such that $\Gamma_1^{t} \leq 2\tilde{c} \cdot n^{26}$ and $\mathcal{H}^t$ holds. We start by bounding the load of any bin. For any bin $i \in [n]$,
\begin{align} \label{eq:batching_gamma2_load_bound}
\Gamma_1^{t} \leq 2\tilde{c} \cdot n^{26} 
 & \Rightarrow e^{\alpha_1\cdot y_i^{t}} + e^{-\alpha_1 \cdot y_i^{t}} \leq 2\tilde{c} \cdot n^{26} \notag \\
 & \Rightarrow 
y_i^t \leq \frac{27}{\alpha_1} \log n \, \wedge \,
 -y_i^t \leq \frac{27}{\alpha_1} \log n,
\end{align}
where in the second implication we used $\frac{1}{\alpha_1} \cdot \big( \log (2\tilde{c}) + 26 \log n \big) \leq \frac{27}{\alpha_1} \log n$, for sufficiently large $n$.

\textit{First statement.} Using \cref{eq:batching_gamma2_load_bound}, we can bound the contribution of any bin $i \in [n]$ to $\Gamma_2^t$ as follows,
\begin{equation} \label{eq:gamma_i_bound}
\Gamma_{2i}^t = e^{\alpha_2 y_i^t} + e^{-\alpha_2 y_i^t} \leq 2 \cdot e^{\alpha_2 \cdot \frac{27}{\alpha_1} \log n} \leq 2 \cdot n^{1/8},
\end{equation}
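using that $\alpha_2 := \frac{\alpha_1}{8 \cdot 30}$ and hence $\alpha_2 \cdot \frac{27}{\alpha_1} = \frac{27}{240} \leq \frac{1}{8}$. By aggregating, we get the first claim $\Gamma_2^t = \sum_{i = 1}^n \Gamma_{2i}^t \leq 2 \cdot n \cdot n^{1/8} \leq n^{5/4}$, for sufficiently large $n$.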


\textit{Second statement}. Consider the change for the bin $j \in [n]$ where the ball was allocated. 
Since $\alpha_2 < \frac{1}{40\cdot S \cdot \log n}$ and $S > \frac{1}{\zeta}$, we have $\alpha_2 \cdot \frac{15}{\zeta} \cdot \log n \leq 1$ and so by a Taylor estimate, $e^{\alpha_2 \cdot \frac{15}{\zeta} \cdot \log n} \leq 1 + 2 \cdot \alpha_2 \cdot \frac{15}{\zeta} \cdot \log n$. If $j \in [n]$ is an overloaded bin ($y_j^t \geq 0$), then
\begin{align*}
\left|\Delta\Gamma_{2j}^{t+1} \right| & \leq \Gamma_{2j}^t \cdot e^{\alpha_2 \cdot \frac{15}{\zeta} \cdot \log n}- \Gamma_{2j}^t \leq \Gamma_{2j}^t \cdot \Big( 1 + \alpha_2 \cdot \frac{30}{\zeta} \cdot \log n \Big)- \Gamma_{2j}^t \\
& = \Gamma_{2j}^t \cdot \alpha_2 \cdot \frac{30}{\zeta} \cdot \log n \leq n^{1/8} \cdot \sqrt{\frac{n}{b} \cdot \log n},
\end{align*}
using \cref{eq:gamma_i_bound}, $\alpha_2 = \frac{1}{8 \cdot 30} \cdot \frac{\delta}{40S} \cdot \sqrt{\frac{n}{b\log n}}$ and $S \geq 1/\zeta$.
Similarly, if $j$ is underloaded ($y_j^t < 0$), then
\begin{align*}
\left|\Delta\Gamma_{2j}^{t+1}\right| & \leq \Gamma_{2j}^t - \Gamma_{2j}^t \cdot e^{-\alpha_2 \cdot \frac{15}{\zeta} \cdot \log n} \leq \Gamma_{2j}^t - \Gamma_{2j}^t \cdot \Big( 1 - \alpha_2 \cdot \frac{30}{\zeta} \cdot \log n\Big) \\
 & = \Gamma_{2j}^t \cdot \alpha_2 \cdot \frac{30}{\zeta} \cdot \log n 
 \leq n^{1/8} \cdot \sqrt{\frac{n}{b} \cdot \log n}.
\end{align*}
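In more detail, the last inequality in both cases combines \cref{eq:gamma_i_bound} with the following estimate, which uses $\frac{1}{\zeta} \leq S$ and $\alpha_2 = \frac{\alpha_1}{8 \cdot 30}$:
\[
\alpha_2 \cdot \frac{30}{\zeta} \cdot \log n \leq \frac{1}{8 \cdot 30} \cdot \frac{\delta}{40S} \cdot \sqrt{\frac{n}{b \log n}} \cdot 30 S \cdot \log n = \frac{\delta}{320} \cdot \sqrt{\frac{n}{b} \cdot \log n},
\]
so that $\Gamma_{2j}^t \cdot \alpha_2 \cdot \frac{30}{\zeta} \cdot \log n \leq 2 n^{1/8} \cdot \frac{\delta}{320} \cdot \sqrt{\frac{n}{b} \cdot \log n} \leq n^{1/8} \cdot \sqrt{\frac{n}{b} \cdot \log n}$.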


The contribution of the rest of the bins is due to the change in the average load. More specifically, for any overloaded bin $i \in [n] \setminus \{ j \}$,
\begin{align*}
\left|\Delta\Gamma_{2i}^{t+1}\right| 
 & \leq \Gamma_{2i}^t \cdot e^{ \alpha_2 \cdot \frac{15}{\zeta} \cdot \frac{\log n}{n}}- \Gamma_{2i}^t \leq \Gamma_{2i}^t \cdot \Big( 1 + 2 \cdot \alpha_2 \cdot \frac{15}{\zeta} \cdot \frac{\log n}{n}\Big)- \Gamma_{2i}^t \\
 & = \Gamma_{2i}^t \cdot \alpha_2 \cdot \frac{30}{\zeta} \cdot \frac{\log n}{n} 
 \leq \sqrt{\frac{\log n}{bn}} \cdot n^{1/8}.
\end{align*}
Similarly, for any underloaded bin $i \in [n] \setminus \{ j \}$,
\begin{align*}
\left|\Delta\Gamma_{2i}^{t+1}\right| 
 & \leq \Gamma_{2i}^t - \Gamma_{2i}^t \cdot e^{-\alpha_2 \cdot \frac{15}{\zeta} \cdot \frac{\log n}{n}} \\
 & \leq \Gamma_{2i}^t - \Gamma_{2i}^t \cdot \Big( 1 - 2 \cdot \alpha_2 \cdot \frac{15}{\zeta} \cdot \frac{\log n}{n}\Big) \\
 & = \Gamma_{2i}^t \cdot \alpha_2 \cdot \frac{30}{\zeta} \cdot \frac{\log n}{n} 
 \leq \sqrt{\frac{\log n}{bn}} \cdot n^{1/8}.
\end{align*}
Hence, aggregating over all bins,
\begin{align*}
\left|\Delta\Gamma_2^{t+1}\right| 
 & \leq \left|\Delta\Gamma_{2j}^{t+1} \right| + \sum_{i \in [n] \setminus \{ j \}} \left|\Delta\Gamma_{2i}^{t+1} \right| \\
 & \leq 2 \cdot n^{1/8} \cdot \sqrt{\frac{n}{b} \cdot \log n} + n \cdot \sqrt{\frac{\log n}{bn}} \cdot n^{1/8} \\
 & \leq n^{1/4} \cdot \sqrt{\frac{n}{b} \cdot \log n},
\end{align*}
for sufficiently large $n$.

\textit{Third statement.} Let $i, j \in [n]$ be the differing bins between $x^t$ and $\widehat{x}^t$. Then since $\mathcal{H}^t$ holds, $w^t \leq \frac{15}{\zeta} \cdot \log n$, so for bin $i$,
\[
\Gamma_{1i}^t(\widehat{x}^t) \leq e^{\alpha_1 w^t} \cdot \Gamma_{1i}^t(x^t) \leq 2 \cdot \Gamma_{1i}^t(x^t),
\]
since $\alpha_1 w^t \leq \alpha_1 \cdot \frac{15}{\zeta} \cdot \log n < \frac{15}{40} < \log 2$, using $\alpha_1 < \frac{1}{40 \cdot S \cdot \log n}$ and $S > 1/\zeta$. Similarly, for bin $j$,
\[
\Gamma_{1j}^t(\widehat{x}^t) \leq e^{\alpha_1 w^t} \cdot \Gamma_{1j}^t(x^t) \leq 2 \cdot \Gamma_{1j}^t(x^t).
\]
Hence, 
\[
\Gamma_1^t(\widehat{x}^t) = \sum_{k = 1}^n \Gamma_{1k}^t(\widehat{x}^t) \leq \sum_{k = 1}^n 2 \cdot \Gamma_{1k}^t(x^t) = 2 \cdot \Gamma_1^t(x^t). \qedhere
\]
\end{proof}

Next, we will show that $\ex{\Gamma_2} = \Oh(n)$ and that when $\Gamma_2$ is sufficiently large, it drops in expectation over the next batch.

\begin{lem}
\label{lem:large_gamma_exponential_drop}
Consider any process satisfying the conditions in \cref{lem:batching_gamma_linear_whp}. Then, there exists a constant $\tilde{c} := \tilde{c}(\delta)$ such that for any step $t\geq 0$ being a multiple of $b$, \[
(i) \quad \ex{\Gamma_1^t} \leq \frac{\tilde{c}}{2} \cdot n,
\quad \text{ and } \quad (ii) \quad \ex{\Gamma_2^t} \leq \frac{\tilde{c}}{2} \cdot n.
\]
Further, \[
(iii) \quad \Ex{\Gamma_2^{t+b} \,\, \Big\vert\,\, \mathfrak{F}^{t},\Gamma_2^t \geq \tilde{c} \cdot n} \leq 
\Gamma_2^{t} \cdot \Big(1-\frac{1}{\log n}\Big),
\]
and 
\[
(iv) \quad \Ex{\Gamma_2^{t+b} \,\,\Big\vert\,\, \mathfrak{F}^{t},\Gamma_2^t \leq \tilde{c} \cdot n} \leq 
\tilde{c} \cdot n - \frac{n}{\log^2 n}.
\]
\end{lem}
\begin{proof}
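\textit{First/Second statement.} Since $C - 1 = \eps = \sqrt{(n/b) \log n}$, we have $K := 5 \cdot (C - 1)^2 \cdot \frac{b}{n} = 5 \log n$ and hence $\frac{\eps\delta}{8K} = \frac{\delta}{40} \cdot \sqrt{\frac{n}{b \log n}} \geq \alpha_1 = \frac{\delta}{40S} \cdot \sqrt{\frac{n}{b \log n}}$, as $S \geq 1$. Moreover, the preconditions of \cref{lem:herd_batching_pot_changes} are satisfied, since $C = 1 + \eps \leq 1 + 1/\sqrt{2} < 1.9$ (as $b \geq 2n \log n$), $\alpha_1 \leq \frac{1}{2} \cdot \sqrt{\frac{n}{b \log n}} = \frac{n}{2 \cdot (C-1) \cdot b}$ and $\frac{2CS}{(C-1)^2} \cdot n = 2CS \cdot \frac{b}{\log n} \leq b$ for sufficiently large $n$. Hence, by \cref{lem:herd_batching_pot_changes} and \cref{thm:hyperbolic_cosine_expectation} with $K$ as above and $\kappa := b$, we get the first conclusion by setting $\tilde{c} := 16 c/\delta$, for $c := c(\delta) > 0$ the constant in \cref{thm:hyperbolic_cosine_expectation}.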

The same argument applies to the potential $\Gamma_2$, since $\alpha_2 \leq \alpha_1$.

\textit{Third statement.} Furthermore, by \cref{lem:herd_batching_pot_changes} and \cref{thm:hyperbolic_cosine_expectation}, we also get that for any step $t \geq 0$ being a multiple of $b$,
\begin{equation} \label{eq:tilde_gamma_drop}
\Ex{\left. \Gamma_2^{t+b} \,\,\right|\,\, \mathfrak{F}^t} \leq \Gamma_2^t \cdot \Big(1 - b \cdot \frac{\eps\delta}{8n} \cdot \alpha_2\Big) + b \cdot c\alpha_2\eps.
\end{equation}
We define the constant
\begin{align*}
\tilde{c}_1 
 & := \frac{1}{2} \cdot b \cdot \frac{\eps\delta}{8n} \cdot \alpha_2 
 = b \cdot \frac{\delta^2}{16 \cdot 8 \cdot 30 \cdot 40S \cdot n} \cdot \sqrt{\frac{n}{b \log n}} \cdot \sqrt{\frac{n \log n}{b}} \\
 & = \frac{\delta^2}{16 \cdot 8 \cdot 30 \cdot 40 \cdot S}.
\end{align*}
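With $\tilde{c} = \frac{16c}{\delta}$, the constant $\tilde{c}_1$ is calibrated so that the additive term in \cref{eq:tilde_gamma_drop} is exactly absorbed at the threshold $\tilde{c} \cdot n$:
\[
\tilde{c}_1 \cdot \tilde{c} \cdot n = \frac{1}{2} \cdot b \cdot \frac{\eps\delta}{8n} \cdot \alpha_2 \cdot \frac{16c}{\delta} \cdot n = b \cdot c\alpha_2\eps.
\]
Moreover, since $\tilde{c}_1 > 0$ is a constant, $\tilde{c}_1 \geq \frac{1}{\log n}$ for sufficiently large $n$; both facts are used in the next two statements.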
When $\big\{ \Gamma_2^{t} \geq \tilde{c} \cdot n\big \}$ holds, then \cref{eq:tilde_gamma_drop} yields,
\begin{align*}
\Ex{\Gamma_2^{t+b} \,\Big\vert\, \mathfrak{F}^t, \Gamma_2^{t} \geq \tilde{c} \cdot n} 
 & \leq \Gamma_2^{t} \cdot \Big(1 - 2 \cdot \tilde{c}_1 \Big) +  b \cdot c\alpha_2\eps \\
 & = \Gamma_2^{t} - \tilde{c}_1 \cdot \Gamma_2^{t} + \Big(b \cdot c\alpha_2\eps - \tilde{c}_1 \cdot \Gamma_2^{t}\Big)
 \\  
 & \leq \Gamma_2^{t} - \tilde{c}_1 \cdot \Gamma_2^{t} + \Big(b \cdot c\alpha_2\eps- \frac{1}{2} \cdot b \cdot \frac{\eps\delta}{8n} \cdot \alpha_2 \cdot \frac{16 c}{\delta} \cdot n \Big) \\ 
 & \leq \Big(1-\frac{1}{\log n} \Big) \cdot \Gamma_2^{t}.
\end{align*}
\textit{Fourth statement.} Similarly, when $\Gamma_2^t \leq \tilde{c} \cdot n$, \cref{eq:tilde_gamma_drop} yields,
\begin{align*}
\Ex{\Gamma_2^{t+b} \,\Big\vert\, \mathfrak{F}^t, \Gamma_2^{t} \leq \tilde{c} \cdot n} 
 & \leq \tilde{c} \cdot n \cdot \Big(1 - 2 \cdot \tilde{c}_1\Big) + b \cdot c\alpha_2\eps \\
 & = \tilde{c} \cdot n - \tilde{c} \cdot \tilde{c}_1 \cdot n + \Big(b \cdot c\alpha_2\eps- \tilde{c} \cdot \tilde{c}_1 \cdot n \Big)
 \\ 
 &= \tilde{c} \cdot n - \tilde{c} \cdot \tilde{c}_1 \cdot n \leq \tilde{c} \cdot n - \frac{\tilde{c}}{\log n} \cdot n \leq \tilde{c} \cdot n - \frac{n}{\log^2 n}. \qedhere
\end{align*}
\end{proof}




In the next lemma, we show that \Whp~$\Gamma_1$ is $\poly(n)$ for every step in an interval of length $2 b \log^3 n$.

\begin{lem} \label{lem:gamma_continuous}
Let $\tilde{c} := 2 \cdot \frac{8c}{\delta}$ be the constant defined in \cref{lem:large_gamma_exponential_drop}. For any $2n \log n \leq b \leq n^3$ and for any step $t \geq 0$ being a multiple of $b$,
\[
\Pro{ \bigcap_{s \in [t, t + 2b \log^ 3 n]} \left\{ \Gamma_1^{s} \leq \tilde{c} \cdot n^{26} \right\} } \geq 1 - n^{-10}.
\]
\end{lem}
\begin{proof}
We will start by bounding $\Gamma_1^s$ at steps $s$ being a multiple of $b$. Using \cref{lem:large_gamma_exponential_drop}~$(i)$, Markov's inequality and the union bound, we have for any $t \geq 0$,
\begin{equation} \label{eq:base_union_bound}
\Pro{ \bigcap_{s \in [0, 2 \log^ 3 n]} \left\{ \Gamma_1^{t + s \cdot b} \leq \tilde{c} \cdot n^{12} \right\} } \geq 1 - \frac{2\log^3 n}{n^{11}}.
\end{equation}
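In more detail, for each fixed $s$, \cref{lem:large_gamma_exponential_drop}~$(i)$ and Markov's inequality give
\[
\Pro{\Gamma_1^{t + s \cdot b} > \tilde{c} \cdot n^{12}} \leq \frac{\Ex{\Gamma_1^{t + s \cdot b}}}{\tilde{c} \cdot n^{12}} \leq \frac{\frac{\tilde{c}}{2} \cdot n}{\tilde{c} \cdot n^{12}} = \frac{1}{2} \cdot n^{-11},
\]
and a union bound over the at most $2\log^3 n + 1 \leq 4 \log^3 n$ values of $s$ yields \cref{eq:base_union_bound}.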
Now, given that $\Gamma_1^{t + s \cdot b} \leq \tilde{c} \cdot n^{12}$, we will upper bound $\Gamma_1$ for the steps in between, i.e., for $\Gamma_1^{t + s \cdot b + r}$ for any $r \in [0, b)$. To this end, recalling that $\Gamma_{1i}^{t + s \cdot b + r} := \Phi_{1i}^{t + s \cdot b + r} + \Psi_{1i}^{t + s \cdot b + r}$, we will upper bound for each bin $i \in [n]$ the terms $\Phi_{1i}^{t + s \cdot b + r}$ and $\Psi_{1i}^{t + s \cdot b + r}$ separately. Proceeding using \cref{eq:phi_batched_i} in \cref{lem:herd_batching_pot_changes} (since $\alpha_1 \leq 1$ and $p$ satisfies condition $\mathcal{C}_3$),
\begin{align*}
\Ex{\left. \Phi_{1i}^{t + s \cdot b + r} \,\right|\, \mathfrak{F}^{t + s \cdot b}, \Phi_{1i}^{t + s \cdot b}}
 & \leq  \Phi_{1i}^{t + s \cdot b} \cdot \left( 1 + \left(p_i - \frac{1}{n}\right) \cdot \alpha_1 + 2 \cdot p_i \cdot S\alpha_1^2 \right)^r \\ 
 & \stackrel{(a)}{\leq} \Phi_{1i}^{t + s \cdot b} \cdot \left( 1 + \frac{C-1}{n} \cdot \alpha_1 + 2 \cdot \frac{C}{n} \cdot S\alpha_1^2 \right)^r \\ 
 & \stackrel{(b)}{\leq} \Phi_{1i}^{t + s \cdot b} \cdot \left( 1 + 2 \cdot \frac{C-1}{n} \cdot \alpha_1 \right)^r \\ 
 & \leq \Phi_{1i}^{t + s \cdot b} \cdot e^{2\alpha_1 (C-1) \cdot \frac{r}{n}} \leq \Phi_{1i}^{t + s \cdot b} \cdot e^{2\alpha_1 (C-1) \cdot \frac{b}{n}} \\
 & \stackrel{(c)}{\leq} 2 \cdot \Phi_{1i}^{t + s \cdot b},
\end{align*}
using in $(a)$ that $p_i - \frac{1}{n} \leq \frac{C - 1}{n}$ and $p_i \leq \frac{C}{n}$ by condition $\mathcal{C}_3$, in $(b)$ that $\alpha_1 \leq \frac{C-1}{2CS}$ (as in \cref{eq:alpha_second_bound}) and in $(c)$~that $\alpha_1 \leq \frac{\delta}{40} \cdot \sqrt{\frac{n}{b \log n}}$ and $C - 1 = \sqrt{\frac{n \log n}{b}}$. Similarly, using \cref{eq:psi_batched_i} in \cref{lem:herd_batching_pot_changes},
\begin{align*}
\Ex{\left. \Psi_{1i}^{t + s \cdot b + r} \,\right|\, \mathfrak{F}^{t + s \cdot b}, \Psi_{1i}^{t + s \cdot b}}
 & \leq \Psi_{1i}^{t + s \cdot b} \cdot \left( 1 + \left( \frac{1}{n} - p_i\right) \cdot \alpha_1 + 2 \cdot p_i \cdot S\alpha_1^2 \right)^r \\ 
 & \stackrel{(a)}{\leq} \Psi_{1i}^{t + s \cdot b} \cdot \left( 1 + \frac{C-1}{n} \cdot \alpha_1 + 2 \cdot \frac{C}{n} \cdot S\alpha_1^2 \right)^r \\ 
 & \stackrel{(b)}{\leq} \Psi_{1i}^{t + s \cdot b} \cdot \left( 1 + 2 \cdot \frac{C - 1}{n} \cdot \alpha_1 \right)^r \\ 
 & \leq \Psi_{1i}^{t + s \cdot b} \cdot e^{2\alpha_1 (C-1) \cdot \frac{r}{n}} \leq \Psi_{1i}^{t + s \cdot b} \cdot e^{2\alpha_1 (C-1) \cdot \frac{b}{n}} \\
 & \stackrel{(c)}{\leq} 2 \cdot \Psi_{1i}^{t + s \cdot b},
\end{align*}
using in $(a)$ that $\frac{1}{n} - p_i \leq \frac{C - 1}{n}$ by condition $\mathcal{C}_3$, in $(b)$ that $\alpha_1 \leq \frac{C - 1}{2CS}$, and in $(c)$ that $\alpha_1 \leq \frac{\delta}{40} \cdot \sqrt{\frac{n}{b \log n}}$ and $C - 1 = \sqrt{\frac{n \log n}{b}}$. Hence, combining and aggregating over the bins,
\[
\Ex{\left. \Gamma_1^{t + s \cdot b + r} \, \right| \, \mathfrak{F}^{t + s \cdot b}, \Gamma_1^{t + s \cdot b}} \leq 2 \cdot \Gamma_1^{t + s \cdot b}.
\]
Applying Markov's inequality, for any $r \in [0, b)$,
\[
\Pro{\Gamma_1^{t + s \cdot b + r} \leq n^{14} \cdot \Gamma_1^{t + s \cdot b}} \geq 1 - 2 \cdot n^{-14}.
\]
Hence, by a union bound over the $2b \cdot \log^3 n \leq 2 \cdot n^3 \cdot \log^3 n$ possible steps (since $b \leq n^3$) with $s \in [0, 2\log^3 n]$ and $r \in [0, b)$,%
\begin{align}
\Pro{ \bigcap_{r \in [0, b)}\bigcap_{s\in [0, 2\log^3 n]} \left\{ \Gamma_1^{t + s \cdot b + r} \leq n^{14} \cdot \Gamma_1^{t + s \cdot b} \right\} } 
 \geq 1 - 2 \cdot n^{-14} \cdot 2 b \log^3 n \geq 1 - \frac{1}{2} \cdot n^{-10}. \label{eq:double_intersection_lb}
\end{align}
Finally, taking the union bound of \cref{eq:base_union_bound} and \cref{eq:double_intersection_lb}, we conclude
\begin{align*}
\Pro{ \bigcap_{s \in [t, t + 2b \log^ 3 n]} \left\{ \Gamma_1^{s} \leq \tilde{c} \cdot n^{26} \right\} }
& \geq \Pr\Bigg[ \bigcap_{r \in [0, b)}\bigcap_{s\in [0, 2\log^3 n]} \left\{ \Gamma_1^{t + s \cdot b + r} \leq n^{14} \cdot \Gamma_1^{t + s \cdot b} \right\} \\
& \qquad \qquad \cap \bigcap_{s \in [0, 2 \log^ 3 n]} \left\{ \Gamma_1^{t + s \cdot b} \leq \tilde{c} \cdot n^{12} \right\} \Bigg]\\
 & \geq 1 - \frac{1}{2} \cdot n^{-10} - \frac{2\log^3 n}{n^{11}} \geq 1 - n^{-10}. \qedhere
\end{align*}
\end{proof}
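As a numerical sanity check of steps $(b)$ and $(c)$ in the proof above, the following Python sketch evaluates, in log space, the $b$-th powers of the per-step growth factors of a single bin over a whole batch. It assumes $C - 1 = \sqrt{(n \log n)/b}$ with natural logarithms and takes $\alpha_1$ to be the smaller of the two upper bounds used in the proof; the values of $S$ and $\delta$ and the helper name \texttt{batch\_growth\_ok} are illustrative assumptions, not part of the analysis.

```python
import math


def batch_growth_ok(n: int, b: int, delta: float, S: float) -> bool:
    """Hypothetical check of steps (b) and (c) for one bin over a batch of b steps.

    Assumes C - 1 = sqrt(n log n / b) and alpha_1 equal to the smaller of the
    two upper bounds used in the proof; S and delta are illustrative inputs.
    """
    log_n = math.log(n)
    c_minus_1 = math.sqrt(n * log_n / b)
    C = 1 + c_minus_1
    alpha1 = min(delta / 40 * math.sqrt(n / (b * log_n)),
                 c_minus_1 / (2 * C * S))
    # Per-step increments (the x in 1 + x) after steps (a) and (b).
    inc_a = c_minus_1 / n * alpha1 + 2 * C / n * S * alpha1 ** 2
    inc_b = 2 * c_minus_1 / n * alpha1
    # Logarithms of the b-th powers; log1p keeps precision for tiny increments.
    log_a = b * math.log1p(inc_a)
    log_b = b * math.log1p(inc_b)
    # Exponent after applying 1 + v <= e^v, i.e. 2 * alpha_1 * (C - 1) * b / n.
    log_e = b * inc_b
    tol = 1e-12  # guards against rounding when the two alpha_1 bounds coincide
    return (log_a <= log_b * (1 + tol)
            and log_b <= log_e
            and log_e <= math.log(2))


print(batch_growth_ok(n=1000, b=100_000, delta=1.0, S=1.0))  # True
```

Here the exponent $2\alpha_1 (C-1) \cdot b/n$ evaluates to $\delta/20 = 0.05$, well below $\log 2$.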

We will now show that for any step $t$ being a multiple of $b$, \Whp~there is a step among $t, t + b, \ldots, t + b \log^3 n$ at which the exponential potential $\Gamma_2$ is $\Oh(n)$. We call this the \textit{recovery} phase.

\begin{lem}[Recovery] \label{lem:batched_gamma_1_poly_n_implies_gamma_2_linear_whp}
Consider any process satisfying the conditions in \cref{lem:batching_gamma_linear_whp}. Let $\tilde{c} := 2 \cdot \frac{8c}{\delta}$ be the constant defined in \cref{lem:large_gamma_exponential_drop}. For any step $t \geq 0$ being a multiple of $b$,
\[
\Pro{\bigcup_{s \in [0, \log^3 n]} \left\lbrace \Gamma_2^{t + s \cdot b} \leq \tilde{c} \cdot n \right\rbrace} \geq 1 - 2 \cdot n^{-8}.
\]
\end{lem}

\begin{proof}
By \cref{lem:large_gamma_exponential_drop}~$(ii)$, using Markov's inequality at step $t$ being a multiple of $b$, we have
\begin{equation} \label{eq:basic_markov_tilde_gamma}
\Pro{\Gamma_2^{t} \leq \tilde{c} \cdot n^9} \geq 1 - n^{-8}.
\end{equation}
We will be assuming $\Gamma_2^{t} \leq \tilde{c} \cdot n^9$. By \cref{lem:large_gamma_exponential_drop}~$(iii)$, for any step $r \geq 0$ being a multiple of $b$, \[
\Ex{\left. \Gamma_2^{r+b} \,\right|\, \mathfrak{F}^{r}, \Gamma_2^{r} > \tilde{c} \cdot n} \leq 
\Gamma_2^{r} \cdot \Big(1-\frac{1}{\log n}\Big).
\]
In order to prove that $\Gamma_2^{t + s \cdot b}$ is small for some $s \in [0, \log^3 n]$, we define the ``killed'' potential function for any $r \in [0, \log^3 n]$,
\[
\widehat{\Gamma}_2^{t + r\cdot b} := \Gamma_2^{t + r \cdot b} \cdot \mathbf{1}_{\bigcap_{ s \in [0, r]} \{ \Gamma_2^{t + s \cdot b} > \tilde{c} \cdot n\} }.
\]
Note that $\widehat{\Gamma}_2^{t + r \cdot b} \leq \Gamma_2^{t + r \cdot b}$ and that $\big\{ \widehat{\Gamma}_2^{t + r \cdot b} = 0 \big\}$ implies $\big\{ \widehat{\Gamma}_2^{t + (r+1) \cdot b} = 0 \big\}$. Hence, the potential $\widehat{\Gamma}_2$ unconditionally satisfies the drop inequality of \cref{lem:large_gamma_exponential_drop}~$(iii)$, that is,
\[
\Ex{\left. \widehat{\Gamma}_2^{t + (r+1) \cdot b} \,\right|\, \mathfrak{F}^{t + r \cdot b}, \widehat{\Gamma}_2^{t + r \cdot b}} 
 \leq \widehat{\Gamma}_2^{t + r \cdot b} \cdot \Big(1-\frac{1}{\log n}\Big).
\]
Inductively applying this for $\log^3 n$ batches, %
\[
\Ex{\left. \widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} \,\, \right\vert \,\, \mathfrak{F}^{t}, \Gamma_2^{t} \leq \tilde{c} \cdot n^9} 
 \leq \Gamma_2^{t} \cdot \Big(1-\frac{1}{\log n}\Big)^{\log^3 n} 
  \leq e^{-\log^2 n} \cdot \tilde{c} \cdot n^{9} < n^{-7}. 
\]
So by Markov's inequality, \[
 \Pro{\left. \widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} < n  \,\,\right|\,\, \Gamma_2^{t} \leq \tilde{c} \cdot n^9} \geq 1 - n^{-8}.
\]
By combining with \cref{eq:basic_markov_tilde_gamma},
\begin{align*}
\Pro{\widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} < n}
 & \geq \left( 1 - n^{-8}\right) \cdot \left( 1 - n^{-8}\right) \geq 1 - 2\cdot n^{-8}.
\end{align*}
By the definition of $\Gamma_2$, deterministically $\Gamma_2^{u} \geq 2n$ at every step $u \geq 0$; hence, $\widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} < n$ implies $\widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} = 0$. So,
we conclude that~w.p.~at least $1 - 2 \cdot n^{-8}$, we have that $\widehat{\Gamma}_2^{t + (\log^3 n) \cdot b} = 0$, or equivalently, the event 
\[
\neg\bigcap_{ s \in [0, \log^3 n]} \left\{ \Gamma_2^{t + s \cdot b} > \tilde{c} \cdot n \right\},
\]
holds, which implies the conclusion.
\end{proof}
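The ``killed'' potential used in the proof above can be illustrated on a finite sequence of batch-end values. The following Python sketch (a hypothetical helper operating on an arbitrary list of numbers standing in for $\Gamma_2^{t}, \Gamma_2^{t+b}, \ldots$) multiplies each value by the indicator that all values so far exceed a threshold playing the role of $\tilde{c} \cdot n$. Once the killed sequence hits zero it stays at zero, which is why its drop inequality holds without conditioning.

```python
from typing import List


def killed_potential(values: List[float], threshold: float) -> List[float]:
    """Return values[r] * 1{values[s] > threshold for all s <= r}."""
    killed = []
    alive = True
    for v in values:
        # The indicator is switched off permanently at the first value <= threshold.
        alive = alive and v > threshold
        killed.append(v if alive else 0.0)
    return killed


print(killed_potential([10.0, 8.0, 3.0, 9.0], threshold=5.0))  # [10.0, 8.0, 0.0, 0.0]
```

Note that the value $9.0$ at the last position is still killed, since the indicator looks at the whole history.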

\subsubsection{Completing the Proof of Lemma~\ref{lem:batching_gamma_linear_whp}}\label{sec:gamma_linear_whp_complete}

We are now ready to prove \cref{lem:batching_gamma_linear_whp}, using the method of bounded differences with a bad event (\cref{lem:kutlin_3_3}, see~\cite[Theorem 3.3]{K02}).

\begin{proof}[Proof of \cref{lem:batching_gamma_linear_whp}]

Our starting point is to apply
 \cref{lem:batched_gamma_1_poly_n_implies_gamma_2_linear_whp} at step $t - b\log^3 n$, which shows that with probability at least $1 - 2 \cdot n^{-8}$ there is at least one step $t + \rho \cdot b \in [t - b\log^3 n, t]$ with $\rho \in [-\log^3 n, 0]$ such that the potential $\Gamma_2$ is small,
 \begin{align} \label{eq:starting_point}
\Pro{\bigcup_{\rho \in [- \log^3 n, 0]} \left\lbrace \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \right\rbrace } &\geq 1 - 2 \cdot n^{-8}.
 \end{align}
Note that if $t < b \cdot \log^3 n$, then deterministically $\Gamma_2^0 = 2n \leq \tilde{c} \cdot n$ (which corresponds to $\rho = -t/b$).

We are now going to apply the concentration inequality \cref{lem:kutlin_3_3} to each of the batches starting at $t + \rho \cdot b, \ldots, t + (\log^3 n) \cdot b$ and show that the potential remains $ \leq \tilde{c} \cdot n$ at the last step of each batch. More specifically, we will show that for any $\tilde{r} \in [\rho, \log^3 n]$, for $r = t + b \cdot \tilde{r}$,
\[
\Pro{\left. \Gamma_2^{r+b} > \tilde{c} \cdot n \,\right|\, \mathfrak{F}^r, \Gamma_2^r \leq \tilde{c} \cdot n} \leq 3 \cdot n^{-4}.
\]


We will show this by applying \cref{lem:kutlin_3_3} for all steps of the batch $[r, r + b]$. We define the good event \[
\mathcal{G}_r := \mathcal{G}_r^{r+b} := \bigcap_{s \in [r, r + b]} \left( \left\{ \Gamma_1^s \leq \tilde{c} \cdot n^{26} \right\} \cap \mathcal{H}^s \right),
\] 
and the bad event $\mathcal{B}_r := (\mathcal{G}_r)^c$. Using a union bound over \cref{lem:many_h_i} and \cref{lem:gamma_continuous},
\begin{align} \label{eq:bad_event_union_bound}
\Pro{ \bigcap_{s \in [t-b \log^3 n, t+ b \log^ 3 n]} \left( \left\{ \Gamma_1^{s} \leq \tilde{c} \cdot n^{26} \right\} \cap \mathcal{H}^s \right) } \geq 1 - 2n^{-10}.
\end{align}

For any $u \in [r, r + b]$, we further define the slightly weaker good event $\tilde{\mathcal{G}}_r^u := \bigcap_{s \in [r, u]} \left( \left\{ \Gamma_1^s \leq 2\tilde{c} \cdot n^{26} \right\} \cap \mathcal{H}^s \right)$ and the ``killed'' potential%
\[
\widehat{\Gamma}_r^u := \Gamma_2^u \cdot \mathbf{1}_{\tilde{\mathcal{G}}_r^u}.
\]
We will show that the sequence $\widehat{\Gamma}_r^r, \ldots , \widehat{\Gamma}_r^{r + b}$ is strongly difference-bounded by $(n^{5/4}, n^{1/4} \cdot \sqrt{(n/b) \cdot \log n}, 2 \cdot n^{-10})$ (\cref{def:strongly_dif_bounded}).

Let $\omega \in [n]^{b}$ be an allocation vector encoding the allocations made in $[r, r + b]$, and let $\omega'$ be an allocation vector obtained from $\omega$ by changing one arbitrary allocation. Then,
\begin{align*}
\left| \widehat{\Gamma}_r^{r+b}(\omega) - \widehat{\Gamma}_r^{r+b}(\omega') \right| &\leq \max_{\tilde{\omega}} \widehat{\Gamma}_r^{r+b}(\tilde{\omega}) - \min_{\tilde{\omega}} \widehat{\Gamma}_r^{r+b}(\tilde{\omega}) \\
&\leq \max_{\tilde{\omega} \in \tilde{\mathcal{G}}_{r}^{r+b}} \Gamma_{2}^{r+b}(\tilde{\omega}) -0 
\\ & \leq n^{5/4},
\end{align*}
where in the last inequality we used \cref{lem:batched_gamma_1_poly_implies}~$(i)$ that for any $\tilde{\omega} \in \tilde{\mathcal{G}}_r^{r+b}$, we have $\widehat{\Gamma}_r^{r+b}(\tilde{\omega}) \leq \Gamma_{2}^{r+b}(\tilde{\omega}) \leq n^{5/4}$.

We will now derive a refined bound by additionally assuming that $\omega \in \mathcal{G}_r$. Then, for any $u \in [r, r + b]$,
\[
\Gamma_1^{u}(\omega') \leq 2 \cdot \Gamma_1^{u}(\omega) \leq 2 \tilde{c} \cdot n^{26},
\]
where the first inequality is by \cref{lem:batched_gamma_1_poly_implies}~$(iii)$. 
Hence $\omega' \in \tilde{\mathcal{G}}_r^{r+b}$, so $\mathbf{1}_{\tilde{\mathcal{G}}_r^{r+b}}(\omega') = 1$ and $\widehat{\Gamma}_r^{r+b}(\omega') 
= \Gamma_{2}^{r+b}(\omega')$. Similarly, for $\omega \in \mathcal{G}_r \subseteq \tilde{\mathcal{G}}_r^{r+b}$, we have $\widehat{\Gamma}_r^{r+b}(\omega) 
= \Gamma_{2}^{r+b}(\omega)$ and by \cref{lem:batched_gamma_1_poly_implies}~$(ii)$,
\[ 
\left|\widehat{\Gamma}_r^{r+b}(\omega) - \widehat{\Gamma}_r^{r+b}(\omega')\right| = \left| \Gamma_2^{r+b}(\omega) - \Gamma_2^{r+b}(\omega') \right| \leq n^{1/4} \cdot \sqrt{\frac{n}{b} \cdot \log n}.
\]




Within a single batch all allocations are independent, so we can apply \cref{lem:kutlin_3_3} with $\gamma_k := \frac{1}{b}$ and $N := b$. It states that for any $T > 0$, with $\mu := \Ex{\left. \widehat{\Gamma}_r^{r + b} \,\right|\, \mathfrak{F}^r, \Gamma_2^r \leq \tilde{c} \cdot n}$,
\begin{align*}
& \Pro{\left. \widehat{\Gamma}_r^{r + b} > \mu + T \, \right|\, \mathfrak{F}^r, \Gamma_2^r \leq \tilde{c} \cdot n} \\
 & \quad  \leq \exp\left( - \frac{T^2}{2 \cdot \sum_{k = 1}^{b} (n^{1/4} \cdot \sqrt{\frac{n}{b} \cdot \log n} + n^{5/4} \cdot \frac{1}{b})^2 }\right) + 2 \cdot n^{-10} \cdot \sum_{k = 1}^b b. 
\end{align*}
Since $\widehat{\Gamma}_r^{r+b} \leq \Gamma_2^{r+b}$, by \cref{lem:large_gamma_exponential_drop}~$(iv)$ we have $\mu \leq \ex{\Gamma_{2}^{r+b} \mid \mathfrak{F}^r, \Gamma_2^r \leq \tilde{c} \cdot n} \leq \tilde{c} \cdot n - n/\log^2 n$. Hence, for $T := n / \log^2 n$, since $2n \log n \leq b \leq n^3$, we have
\begin{align*}
\Pro{\left. \widehat{\Gamma}_r^{r+b} > \tilde{c} \cdot n \, \right|\, \mathfrak{F}^r, \Gamma_2^r \leq  \tilde{c} \cdot n}
& \leq \exp\left( - \frac{n^2/\log^4 n}{2 \cdot b \cdot (2 \cdot n^{1/4} \cdot \sqrt{\frac{n}{b} \cdot \log n})^2 }\right) + 2n^{-10} \cdot b^2 \\
& \leq \exp\left( - \frac{n^{1/2}}{8 \cdot \log^5 n} \right) + 2n^{-10} \cdot n^6 \leq 3 \cdot n^{-4}. 
\end{align*}
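The simplification of the exponent in the last display can be double-checked numerically: with $T = n/\log^2 n$ and the per-allocation bound $2 n^{1/4} \sqrt{(n/b) \cdot \log n}$, the ratio $T^2 / \big(2b \cdot (2 n^{1/4} \sqrt{(n/b) \log n})^2\big)$ equals $n^{1/2}/(8 \log^5 n)$, independently of $b$. A minimal Python sketch, assuming natural logarithms; the function names are hypothetical:

```python
import math


def kutlin_exponent(n: float, b: float) -> float:
    """T^2 / (2 * b * c^2) with T = n / log^2 n and c = 2 n^{1/4} sqrt((n/b) log n)."""
    log_n = math.log(n)
    T = n / log_n ** 2
    c = 2 * n ** 0.25 * math.sqrt(n / b * log_n)
    return T ** 2 / (2 * b * c ** 2)


def closed_form(n: float) -> float:
    """The simplified exponent n^{1/2} / (8 log^5 n)."""
    return math.sqrt(n) / (8 * math.log(n) ** 5)


# The identity holds for every batch size b; a few (n, b) pairs as illustration.
for n, b in [(1e4, 1e6), (1e6, 1e9), (1e9, 1e20)]:
    assert math.isclose(kutlin_exponent(n, b), closed_form(n), rel_tol=1e-9)
```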
Let $\mathcal{K}_{\rho}^{\tilde{r}} := \mathcal{G}_{\rho}^{t + \tilde{r} \cdot b} \cap \{ \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \}$ for $\tilde{r} \in [\rho, \log^3 n]$. For any $\tilde{r} \geq \rho$, since $\mathcal{K}_\rho^{\tilde{r}+1} \subseteq \mathcal{K}_\rho^{\tilde{r}}$, we have%
\begin{align} \label{eq:k_killed}
\Pro{\left. \widehat{\Gamma}_r^{t + (\tilde{r}+1) \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}+1}} >  \tilde{c} \cdot n \,\right|\, \mathfrak{F}^r, \widehat{\Gamma}_r^{t + \tilde{r} \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}}} \leq  \tilde{c} \cdot n} \leq 3 \cdot n^{-4}.
\end{align}
By union bound of \cref{eq:starting_point} and \cref{eq:bad_event_union_bound}, 
\begin{align} 
\Pro{\bigcup_{\rho \in [-\log^3 n, 0]} \mathcal{K}_{\rho}^{\log^3 n}} 
 & \geq \Pro{\mathcal{G}_{-\log^3 n}^{\log^3 n} \cap \bigcup_{\rho \in [- \log^3 n, 0]} \left\lbrace \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \right\rbrace} \notag \\
 & \geq 1 - 2 \cdot n^{-8} - 2\cdot n^{-10} \geq 1 - 3 \cdot n^{-8}.\label{eq:exists_k_event}
\end{align}
Let \[
\mathcal{A} := \bigcap_{\tilde{r} \in [0, \log^3 n]} \left\lbrace \Gamma_2^{t + \tilde{r} \cdot b} \leq \tilde{c} \cdot n \right\rbrace\] and \[\mathcal{A}_{\rho} := \bigcap_{\tilde{r} \in [\rho, \log^3 n]} \left\lbrace \widehat{\Gamma}_r^{t + \tilde{r} \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}}} \leq \tilde{c} \cdot n \right\rbrace.\] Then, 
\begin{align*}
\Pro{\mathcal{A}_\rho \,\,\left|\,\, \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \right.} 
  & \geq \prod_{\tilde{r} \in [\rho, \log^3 n - 1]} \mathbf{Pr} \left[ \widehat{\Gamma}_r^{t + (\tilde{r}+1) \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}+1}} \leq \tilde{c} \cdot n \right. \\
  &  \qquad \qquad  \left. \bigg\vert \, \bigcap_{\tilde{s} \in [\rho, \tilde{r}]} \left\lbrace \widehat{\Gamma}_r^{t + \tilde{s} \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{s}}} \leq \tilde{c} \cdot n \right\rbrace \right] \\
 & \geq \prod_{\tilde{r} \in [\rho, \log^3 n - 1]} \left( 1 - \Pro{\left. \widehat{\Gamma}_r^{t + (\tilde{r}+1) \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}+1}} > \tilde{c} \cdot n \,\right|\, \mathfrak{F}^{t + \tilde{r} \cdot b}, \widehat{\Gamma}_r^{t + \tilde{r} \cdot b} \cdot \mathbf{1}_{\mathcal{K}_\rho^{\tilde{r}}} \leq  \tilde{c} \cdot n} \right) \\
 & \geq (1 - 3n^{-4})^{2\log^3 n} \geq 1 - 6 \cdot n^{-4} \cdot \log^3 n,
\end{align*}
where in the last inequality we have used \cref{eq:k_killed} and the fact $\rho \geq -\log^3 n$. So,
\begin{align}
\Pro{\mathcal{A}_\rho} 
 & = \Pro{\mathcal{A}_\rho \,\left|\,\, \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \right.} \cdot \Pro{\Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n} + 1 \cdot \Pro{\neg \left\{ \Gamma_2^{t + \rho \cdot b} \leq \tilde{c} \cdot n \right\}} \notag \\
 & \geq 1 - 6 \cdot n^{-4} \cdot \log^3 n. \label{eq:event_a_rho} 
\end{align}
Note that for any $\rho \in [-\log^3 n, 0]$, we have that $\mathcal{A}_\rho \cap \mathcal{K}_\rho^{\log^3 n} \subseteq \mathcal{A}$. Hence we conclude by the union bound of \cref{eq:exists_k_event} and \cref{eq:event_a_rho}, that
\begin{align*}
\Pro{\mathcal{A}} 
 & \geq \Pro{\bigcup_{\rho \in [-\log^3 n, 0]} \mathcal{K}_{\rho}^{\log^3 n} \cap \bigcap_{\rho \in [-\log^3 n, 0]} \mathcal{A}_\rho}  \geq 1  - 3 \cdot n^{-8} - 6 \cdot n^{-4} \cdot \log^6 n \geq 1- n^{-3}. \qedhere
\end{align*}
\end{proof}

\subsection{Step 2: Completing the proof of Theorem~\ref{thm:batching_strong_gap_bound}}\label{sec:step_two}

We will now show that when $\Gamma_2^t = \Oh(n)$, the stronger potential function $\Lambda^t$ drops in expectation over the next batch. This will allow us to prove that \Whp~$\Lambda^m = \poly(n)$, and to deduce that \Whp~$\Gap(m) = \Oh\big(\sqrt{(b/n) \cdot \log n}\big)$.

\begin{lem} \label{lem:lambda_drops}
Consider any process satisfying the conditions in \cref{thm:batching_strong_gap_bound}. Let $\tilde{c} := 2 \cdot \frac{8c}{\delta}$ where $c := c(\delta) > 0$ is the constant from \cref{thm:hyperbolic_cosine_expectation}. For any step $t \geq 0$ being a multiple of $b$,
\[
\Ex{\Lambda^{t+b} \,\left|\, \mathfrak{F}^t, \Gamma_2^t \leq \tilde{c} \cdot n \right.} \leq \Lambda^t \cdot e^{-\frac{\lambda\eps}{2n} \cdot b} + n^2.
\]
\end{lem}
\begin{proof}
Consider an arbitrary step $t \geq 0$ being a multiple of $b$ and consider a labeling of the bins so that they are sorted by load. Assuming that $\{ \Gamma_2^t \leq \tilde{c} \cdot n \}$ holds, and since every bin with load $y_i^t \geq z$ contributes at least $e^{\alpha_2 \cdot z}$ to $\Gamma_2^t$, the number of bins with load $y_i^t \geq z$ is at most 
\[
\tilde{c} \cdot n \cdot e^{- \alpha_2 \cdot z} = \tilde{c} \cdot n \cdot e^{-\log(\tilde{c}/\delta)} = \delta \cdot n.
\]
For any bin $i \in [n]$ with $y_i^t \geq z$, we get as in \cref{eq:phi_batched_i} (using that $\lambda \leq 1$ and that $p$ satisfies $\mathcal{C}_3$ for $C \in (1, 1.9)$),%
\begin{align*}
\Ex{\left. \Lambda_i^{t + b} \,\,\right|\,\, \mathfrak{F}^t} 
 & \leq \Lambda_i^t \cdot \left( 1 + \Big(p_i^t - \frac{1}{n}\Big) \cdot \lambda + 2 \cdot p_i^t \cdot S\lambda^2 \right)^b.
\end{align*}
Since there are at most $\delta n$ such bins (i.e., $i \leq \delta n$), $p$ satisfies condition $\mathcal{C}_1$ and the normalised vector $y^t$ is sorted, the upper bound on $\Ex{\Lambda^{t+b} \mid \mathfrak{F}^t, \Gamma_2^t \leq \tilde{c} \cdot n}$ is maximized when $p_i^t = \frac{1 - \eps}{n}$, so
\begin{align*}
\sum_{i : y_i^t \geq z} \Ex{\left. \Lambda_i^{t + b} \,\,\right|\,\, \mathfrak{F}^t, \Gamma_2^t \leq \tilde{c} \cdot n} 
 & \stackrel{(a)}{\leq} \sum_{i : y_i^t \geq z} \Lambda_i^t \cdot \left( 1 - \frac{\lambda\eps}{n} + 2CS \cdot \frac{\lambda^2}{n} \right)^b \\
 &  \stackrel{(b)}{\leq} \sum_{i : y_i^t \geq z} \Lambda_i^t \cdot \left( 1 - \frac{\lambda\eps}{2n}\right)^b  
 \stackrel{(c)}{\leq} \sum_{i : y_i^t \geq z} \Lambda_i^t \cdot e^{-\frac{\lambda\eps}{2n} \cdot b},
\end{align*}
using in $(a)$ that $p_i^t \leq \frac{C}{n}$, in $(b)$ that $\lambda = \frac{\eps}{4CS}$ and in $(c)$ that $1 + v \leq e^v$ for any $v$. For the rest of the bins with $i > \delta n$,
\begin{align*}
\Ex{\left. \Lambda_i^{t + b} \,\right|\, \mathfrak{F}^t} 
 & \leq \Lambda_i^t \cdot \left( 1 + \left(p_i^t - \frac{1}{n}\right) \cdot \lambda + 2 \cdot p_i^t \cdot S\lambda^2 \right)^b \\
 & \stackrel{(a)}{\leq} \Lambda_i^t \cdot \left( 1 + \frac{C - 1}{n} \cdot \lambda + 2CS \cdot \frac{\lambda^2}{n} \right)^b \\
 & \stackrel{(b)}{\leq} \Lambda_i^t \cdot \left( 1 + 2 \cdot \frac{C-1}{n} \cdot \lambda\right)^b \\
 & \stackrel{(c)}{\leq} \left( 1 + 2 \cdot \frac{C-1}{n} \cdot \lambda\right)^b \\
 & \stackrel{(d)}{\leq} e^{2 \cdot \frac{C-1}{n} \cdot \lambda b} \stackrel{(e)}{\leq} n,
\end{align*}
using in $(a)$ that $p_i^t \leq \frac{C}{n}$, in $(b)$ that $2CS \cdot \frac{\lambda^2}{n} = \frac{\eps}{2n} \cdot \lambda \leq \frac{C-1}{2n} \cdot \lambda$ since $\lambda = \frac{\eps}{4CS}$ and $\eps = C - 1$, in $(c)$ that $\Lambda_i^t \leq 1$, in $(d)$ that $1 + v \leq e^v$ for any $v$ and in $(e)$ that $2 \cdot \frac{C-1}{n} \cdot \lambda b = \frac{\eps^2}{2CS} \cdot \frac{b}{n} = \frac{\log n}{2CS} \leq \log n$ (since $\eps^2 = \frac{n}{b} \cdot \log n$, $C > 1$ and $S \geq 1$).
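The two parameter identities behind steps $(b)$ and $(e)$, namely $2CS \cdot \lambda^2/n = \lambda\eps/(2n)$ and $2 \cdot \frac{C-1}{n} \cdot \lambda b = \frac{\log n}{2CS}$, follow from $\lambda = \frac{\eps}{4CS}$ and $\eps = C - 1 = \sqrt{(n/b) \cdot \log n}$. A short Python check, assuming natural logarithms, with illustrative values of $n$, $b$ and $S$ (not values fixed by the analysis) and a hypothetical helper name:

```python
import math


def lambda_identities(n: int, b: int, S: float) -> tuple:
    """Return (lhs, rhs) pairs of the identities used in steps (b) and (e)."""
    eps = math.sqrt(n / b * math.log(n))   # eps = C - 1
    C = 1 + eps
    lam = eps / (4 * C * S)
    step_b = (2 * C * S * lam ** 2 / n, lam * eps / (2 * n))
    step_e = (2 * (C - 1) / n * lam * b, math.log(n) / (2 * C * S))
    return step_b, step_e


(lhs_b, rhs_b), (lhs_e, rhs_e) = lambda_identities(n=1000, b=50_000, S=1.0)
assert math.isclose(lhs_b, rhs_b) and math.isclose(lhs_e, rhs_e)
assert lhs_e <= math.log(1000)  # hence e^{2 (C-1) lambda b / n} <= n
```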

Aggregating the contributions over all bins, 
\begin{align*}
\Ex{\left. \Lambda^{t+b} \,\right|\, \mathfrak{F}^t, \Gamma_2^t \leq \tilde{c} \cdot n } 
 & \leq \sum_{ i : y_i^t \geq z} \Lambda_i^t \cdot e^{-\frac{\lambda\eps}{2n} \cdot b} + \sum_{ i : y_i^t < z} n
 \leq \Lambda^t \cdot e^{-\frac{\lambda\eps}{2n} \cdot b} + n^2.\qedhere
\end{align*}
\end{proof}


Now we are ready to complete the proof of \cref{thm:batching_strong_gap_bound}.

\begin{proof}[Proof of \cref{thm:batching_strong_gap_bound}]
First consider the case when $m \geq b \cdot \log^3 n$. Let $t_0 = m - b \cdot \log^3 n$. Let $\mathcal{E}^t := \big\{ \Gamma_2^{t} \leq \tilde{c} \cdot n \big\}$. Then using \cref{lem:batching_gamma_linear_whp},
\begin{equation} \label{eq:eps_interval}
\Pro{ \bigcap_{j \in [0, \log^3 n]} \mathcal{E}^{t_0 + j \cdot b} } \geq 1 - n^{-3}.
\end{equation}
We define the killed potential $\tilde{\Lambda}$, with $\tilde{\Lambda}^{t_0} := \Lambda^{t_0}$ and for $j > 0$,
\[
\tilde{\Lambda}^{t_0 + j \cdot b} := \Lambda^{t_0 + j \cdot b} \cdot \mathbf{1}_{\cap_{s \in [0, j]} \mathcal{E}^{t_0 + s \cdot b}}.
\]
By \cref{lem:lambda_drops} for $t = t_0 + j \cdot b$, we have that
\begin{align*}
\Ex{\Lambda^{t_0+(j+1)\cdot b} \,\,\left|\,\, \mathfrak{F}^{t_0+j \cdot b}, \Gamma_2^{t_0+j \cdot b} \leq \tilde{c} \cdot n \right.} \leq \Lambda^{t_0 + j \cdot b} \cdot e^{-\frac{\lambda\eps}{2n} \cdot b} + n^2.
\end{align*}
When $\mathcal{E}^{t_0 + j \cdot b}$ does not hold, then deterministically $\tilde{\Lambda}^{t_0 + (j+1) \cdot b} = \tilde{\Lambda}^{t_0 + j \cdot b} = 0$. Hence, we have the following unconditional drop inequality
\begin{align} \label{eq:lambda_drop}
\Ex{\tilde{\Lambda}^{t_0+(j+1)\cdot b} \,\,\left|\,\, \mathfrak{F}^{t_0+j \cdot b} \right.} \leq \tilde{\Lambda}^{t_0 + j \cdot b} \cdot e^{-\frac{\lambda\eps}{2n} \cdot b} + n^2.
\end{align}
Assuming $\mathcal{E}^{t_0}$ holds, we have %
\[
\max_{i \in [n]} y_i^{t_0} 
  \leq \frac{1}{\alpha_2} \cdot \left( \log \tilde{c} + \log n \right) 
  \leq \frac{2}{\alpha_2} \cdot \log n,
\]
for sufficiently large $n$. Recalling that $\alpha_2 = \Theta(\lambda \cdot \log n)$, there exists a constant $\kappa_1 > 0$ such that
\[
\tilde{\Lambda}^{t_0} \leq n \cdot e^{\lambda \cdot y_1^{t_0}} \leq e^{\kappa_1 \log^2 n}.
\]
Applying \cref{lem:geometric_arithmetic} to \cref{eq:lambda_drop} with contraction factor $a := e^{-\frac{\lambda\eps}{2n} \cdot b}$ and additive term $n^2$ over $\log^3 n$ batches, %
\begin{align}
\Ex{\tilde{\Lambda}^{m} \,\,\left|\,\, \mathfrak{F}^{t_0}, \tilde{\Lambda}^{t_0} \leq e^{\kappa_1 \log^2 n} \right.} 
& \leq e^{\kappa_1 \log^2 n} \cdot a^{\log^3 n} + \frac{n^2}{1 - a}   \stackrel{(a)}{\leq} 1 + 1.5 \cdot n^2 \leq  2n^2,  \label{eq:poly_n_expectation}
\end{align}
using in $(a)$ that $\frac{\lambda\eps}{2n} \cdot b = \frac{\eps^2}{8CS} \cdot \frac{b}{n} = \frac{\log n}{8CS} = \Omega(\log n)$, since $\lambda = \frac{\eps}{4CS}$ and $\eps = \sqrt{(n/b) \cdot \log n}$; hence, for sufficiently large $n$, we have $a^{\log^3 n} \leq e^{-\kappa_1 \log^2 n}$ and $a \leq \frac{1}{3}$.
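The bound in \cref{eq:poly_n_expectation} is an instance of unrolling a recurrence of the form $x_{j+1} \leq a \cdot x_j + d$ with $a \in (0, 1)$, which gives $x_k \leq a^k \cdot x_0 + d/(1-a)$. The following Python sketch iterates the recurrence with equality and compares it with this closed form. The starting value, the factor $a = 0.3$ (any $a \leq 1/3$ gives $1/(1-a) \leq 1.5$) and the additive term $d = n^2$ with $n = 1000$ are illustrative choices, and the helper names are hypothetical.

```python
def iterate_recurrence(x0: float, a: float, d: float, steps: int) -> float:
    """Iterate x_{j+1} = a * x_j + d for the given number of steps."""
    x = x0
    for _ in range(steps):
        x = a * x + d
    return x


def geometric_bound(x0: float, a: float, d: float, steps: int) -> float:
    """Upper bound a^steps * x0 + d / (1 - a), valid for 0 < a < 1."""
    return a ** steps * x0 + d / (1 - a)


n = 1000
a, d = 0.3, float(n ** 2)  # illustrative contraction factor and additive term n^2
x = iterate_recurrence(1e30, a, d, steps=100)
print(x <= geometric_bound(1e30, a, d, steps=100) * (1 + 1e-12))  # True
print(x <= 2 * n ** 2)                                             # True
```

Even a huge starting value is forgotten after enough batches, leaving roughly $d/(1-a) \leq 1.5\, n^2$.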

By Markov's inequality, we have
\begin{align*}
\Pro{\tilde{\Lambda}^{m} \leq 2n^{5} \,\left|\, \mathfrak{F}^{t_0}, \tilde{\Lambda}^{t_0} \leq e^{\kappa_1 \log^2 n} \right.} \geq 1 - n^{-3}.
\end{align*}
Hence, by \cref{eq:eps_interval},
\begin{align} \label{eq:tilde_lambda_poly_n}
\Pro{\tilde{\Lambda}^{m} \leq 2n^{5}} 
 & \geq \Pro{\left. \tilde{\Lambda}^{m} \leq 2n^{5} \,\right|\, \mathcal{E}^{t_0}} \cdot \Pro{\mathcal{E}^{t_0}} 
 \geq (1 - n^{-3}) \cdot (1 - n^{-3}) \geq 1 - 2n^{-3}.
\end{align}
Combining \cref{eq:eps_interval} and \cref{eq:tilde_lambda_poly_n}, we have
\begin{align*}
\Pro{\Lambda^{m} \leq 2n^{5}} 
 &\geq \Pro{\left\lbrace\tilde{\Lambda}^{m} \leq 2n^{5} \right\rbrace \cap \bigcap_{j \in [0, \log^3 n]} \mathcal{E}^{t_0 + j \cdot b}} \geq 1 - 2n^{-3} - n^{-3} \geq 1 - n^{-2}.
\end{align*}
Finally, $\{ \Lambda^{m} \leq 2 \cdot n^{5} \}$ implies that
\[
\max_{i \in [n]} y_i^{m} \leq z + \frac{\log 2}{\lambda} + \frac{5\log n}{\lambda} = \Oh\left(\sqrt{\frac{b}{n} \cdot \log n} \right),
\]
since $\lambda = \frac{\eps}{4CS} = \Theta(\sqrt{(n \log n)/b})$, so the claim follows.

For the case when $m < b \cdot \log^3 n$, we set $t_0 := 0$; then it deterministically holds that $\tilde{\Lambda}^{t_0} \leq n$, which is a stronger starting point in \cref{eq:poly_n_expectation}, and the same argument shows that $\Lambda^m \leq 2n^{5}$ with probability at least $1 - n^{-2}$, which in turn implies the gap bound.
\end{proof}

\section{Lower Bounds on the Gap} \label{sec:lower_bounds}

In this section, we prove two lower bounds of $\Omega( \sqrt{ (b/n) \cdot \log n})$ on the gap. Both lower bounds even hold in the unweighted setting.
\begin{obs}\label{obs:simple_lower}
Consider the \Batched setting with any $b \geq n \log n$, and assume all balls are unweighted. Then, for any process which uses the same probability vector within each batch with random tie breaking, we have that
\[
\Pro{\Gap(b) \geq \frac{1}{10} \cdot \sqrt{\frac{b}{n} \cdot \log n}} \geq 1 - n^{-2}.
\]
\end{obs}
\begin{proof}
In the first batch, any such process behaves exactly like \OneChoice. Hence the result follows immediately from a known lower bound for \OneChoice for $b$ balls into $n$ bins (cf.~\cite{RS98} and~\cite[Lemma A.2]{LS23RBB}).
\end{proof}
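To illustrate \cref{obs:simple_lower}, the following Python sketch simulates the first batch as \OneChoice (in the first batch all loads are equal, so a process using one probability vector with random tie-breaking allocates uniformly at random) and compares the resulting gap with $\frac{1}{10} \sqrt{(b/n) \cdot \log n}$. The values of $n$ and $b$, the seed and the helper name are illustrative assumptions; the simulation is a sanity check, not a proof.

```python
import math
import random


def one_choice_gap(n: int, b: int, seed: int = 0) -> float:
    """Allocate b unweighted balls uniformly at random; return max load minus b/n."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(b):
        loads[rng.randrange(n)] += 1
    return max(loads) - b / n


n = 200
b = 10 * int(n * math.log(n))  # b >= n log n, as in the observation
gap = one_choice_gap(n, b, seed=1)
threshold = math.sqrt(b / n * math.log(n)) / 10
print(f"gap = {gap:.2f}, threshold = {threshold:.2f}")
```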


The next lower bound is more involved. This bound even applies to processes which are allowed to adjust the probability allocation vector from one batch to another arbitrarily;  e.g., the probability for a heavily underloaded bin might be set close to (or even equal to) $1$, and similarly, the probability for a heavily overloaded bin might be set close to (or equal to) $0$. Additionally, the lower bound below applies to any two consecutive batches, and not only to the end of the first batch as in \cref{obs:simple_lower}.

\begin{thm}\label{thm:lower}
Consider the \Batched setting with any $b = \Omega(n \log n)$, and assume all balls are unweighted. Then there is a constant $\kappa > 0$ such that for any allocation process, including processes which may adaptively change the probability allocation vector for each batch, it holds for every $t \geq 0$ being a multiple of $b$ that
\[
 \Pro{ \max \left\{ \Gap(t), \Gap(t+b) \right\} \geq \kappa \cdot \sqrt{ \frac{b}{n} \cdot \log n}  } \geq 1/2. 
\]
\end{thm}
\begin{proof}
We shall in fact prove a slightly stronger statement:
\[
 \Pro{ \max \left\{ \Gap(t), \Gap(t+b) \right\} \geq \kappa \cdot \sqrt{ \frac{b}{n} \cdot \log n} ~~\Bigg|~~ \mathfrak{F}^t } \geq 1/2. 
\]
That is, there is no load configuration and no probability allocation vector (depending on $\mathfrak{F}^t$) such that the gap is small, both before and at the end of an arbitrary batch. 

For notational convenience, we prove this statement for $t=0$ (in other words, we shift time backwards by $t$ steps). Let $z = (z_i)_{i \in [n]}$ be the arbitrary normalized load vector at the start of the batch, which satisfies $\sum_{i \in [n]} z_i = 0$, let $x_j^b$ denote the number of balls allocated to bin $j$ during the batch, and let $p = p^0$ be the probability allocation vector used by the process.
Consider one arbitrary bin $j \in [n]$. Then,
\begin{align}
 \Ex{ x_j^b - \frac{b}{n} + z_j} = b \cdot p_j - \frac{b}{n} + z_j =: \varphi_j. \label{eq:optimised}
\end{align}

For a sufficiently large constant $C > 0$, let us now assume $\max_{j \in [n]} z_j \leq C/2 \cdot \sqrt{(b/n) \cdot \log n}$; clearly, if this is not the case, we already have a large gap before the next batch. 

Next consider a bin $j \in [n]$ with
\[
 p_j \leq \frac{1}{n} + \frac{1}{b} \cdot \left( -10 C \cdot \sqrt{(b/n) \cdot \log n} \right).
\]
We will now apply a Chernoff bound (\cref{lem:chernoff}) for $x_j^b \sim \mathsf{Bin}(b,p_j)$,
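with $\delta := C \cdot \sqrt{(n/b) \cdot \log n}$, $\mu := b \cdot p_j$ and $\mu_H := \frac{b}{n} - 10 C \cdot \sqrt{(b/n) \cdot \log n} \geq \mu$ to get that
\[
  \Pro{ x_j^b \geq \frac{b}{n} - C \cdot \sqrt{(b/n) \cdot \log n}}
    \leq \Pro{ x_j^b \geq \mu_H \cdot (1 + \delta) }
    \leq e^{-\delta^2\mu_H/3} \leq e^{-\frac{C^2}{6} \cdot \log n} \leq n^{-4},
\]
using that $\mu_H \cdot (1 + \delta) \leq \frac{b}{n} - C \cdot \sqrt{(b/n) \cdot \log n}$, that $\mu_H \geq \frac{b}{2n}$ (which holds as $b = \Omega(n \log n)$ with a sufficiently large constant, e.g., $b \geq (20C)^2 \cdot n \log n$) and that $C \geq 5$.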
If $\left\{ x_j^b \leq \frac{b}{n} - C \cdot \sqrt{(b/n) \cdot \log n} \right\}$ occurs, then
\[
x_j^b - \frac{b}{n} + z_j \leq -C \cdot \sqrt{(b/n) \cdot \log n} + z_j  \leq -C/ 2 \cdot \sqrt{(b/n) \cdot \log n} \leq 0,
\]
and thus bin $j$ will not contribute to the gap at step $b$.

Hence, in the remainder of the proof, we would like to assume that for all bins $j \in [n]$,
\[
 p_j \geq \frac{1}{n} + \frac{1}{b} \cdot \left( -10 C \cdot \sqrt{(b/n) \cdot \log n} \right) =: p_{\operatorname{low}}.
\]
Note that $p_{\operatorname{low}} \leq 1/n$.
Consider now a transformation of the probability vector $(p_i)_{i \in [n]}$ into $(\tilde{p}_i)_{i \in [n]}$, where $\tilde{p}$ satisfies for all $j \in [n]$,
\[
p_{\operatorname{low}} \leq \tilde{p}_j \leq \max \left\{  p_{\operatorname{low}},  p_j \right\}.
\]
In other words, $\tilde{p}$ raises the probability of every bin $j \in [n]$ with $p_j < p_{\operatorname{low}}$ to at least $p_{\operatorname{low}}$, and does not increase the probability of any other bin. Let us define $\mathcal{J} := \left\{ j \in [n] \colon p_j < p_{\operatorname{low}} \right\}$. For $b \geq (20C)^2 \cdot n \log n$, we have $p_{\operatorname{low}} \geq \frac{1}{2n}$, and hence $\tilde{p}_j \geq \frac{1}{2n}$ for all $j \in [n]$.
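The condition on $b$ needed for $p_{\operatorname{low}} \geq \frac{1}{2n}$ can be made explicit: since $p_{\operatorname{low}} = \frac{1}{n} - 10C\sqrt{\log n/(bn)}$, the inequality $p_{\operatorname{low}} \geq \frac{1}{2n}$ is equivalent to $b \geq 400 C^2 \cdot n \log n = (20C)^2 \cdot n \log n$. A short Python check of this equivalence, assuming natural logarithms; the value $C = 4$ and the helper name are illustrative:

```python
import math


def p_low(n: int, b: float, C: float) -> float:
    """p_low = 1/n - (10 C / b) * sqrt((b / n) * log n), as defined in the proof."""
    return 1 / n - 10 * C / b * math.sqrt(b / n * math.log(n))


n, C = 1000, 4.0
b_star = (20 * C) ** 2 * n * math.log(n)  # threshold (20 C)^2 n log n
assert math.isclose(p_low(n, b_star, C), 1 / (2 * n))  # equality at the threshold
assert p_low(n, 2 * b_star, C) > 1 / (2 * n)           # larger batches: above 1/(2n)
assert p_low(n, b_star / 2, C) < 1 / (2 * n)           # smaller batches: below 1/(2n)
```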


Further,
let $(x_i^b)_{i \in [n]}$ be a load vector where the locations of the next $b$ balls are sampled according to $p$, and $(\tilde{x}_i^b)_{i \in [n]}$ be a load vector where these locations are sampled according to $\tilde{p}$. Clearly, there is a coupling so that for every $j \in [n] \setminus \mathcal{J}$, $x_j^b \geq \tilde{x}_j^b$ (since $p_j \geq \tilde{p}_j$). Moreover, by the Chernoff bound above and a union bound over $\mathcal{J}$,
\[
 \Pro{ \max_{j \in \mathcal{J}} \left( x_j^b - \frac{b}{n} + z_j \right) > 0 } \leq | \mathcal{J} | \cdot n^{-4} \leq n^{-3}.
\]
Hence it follows that, for any threshold $T > 0$,
\begin{align*}
 \Pro{ \max_{j \in [n]} x_j^b \geq T } &\geq
 \Pro{ \max_{j \in [n]} \tilde{x}_j^b \geq T} - \Pro{ \max_{j \in \mathcal{J}} x_j^b \geq T}
  \geq  \Pro{ \max_{j \in [n]} \tilde{x}_j^b \geq T} - n^{-3}.
 \end{align*}

Therefore, in the remainder of the proof, we will lower bound $\Pro{ \max_{j \in [n] \setminus \mathcal{J}} \tilde{x}_j^b \geq T}$ for a suitable value of $T=\Omega( \sqrt{(b / n) \cdot \log n})$. We will also use the definition
\[
 \tilde{\varphi}_j := b \cdot \tilde{p}_j - \frac{b}{n} + z_j.
\]
Finally, let $\gamma := 0.1$ be a (sufficiently) small constant.


\textbf{Case 1:} We have at least $n-n^{\gamma}$ bins for which $\tilde{\varphi}_j \leq - C \sqrt{b/n}$. Since $\sum_{i \in [n]} \tilde{\varphi}_i = 0$, this implies that there must be at least one bin $j \in [n]$ with
$
 \tilde{\varphi}_j \geq \frac{n-n^{\gamma}}{n^{\gamma}} \cdot C \cdot \sqrt{b/n} \geq 1/2 \cdot n^{1-\gamma} \cdot \sqrt{b/n}.
$
Further, since the median of a $\mathsf{Bin}(N, q)$ r.v.~is either $\lfloor Nq \rfloor$ or $\lceil Nq \rceil$, we have
\[
\Pro{ \tilde{x}_j^b \geq \Ex{ \tilde{x}_j^b } - \frac{1}{4} n^{1-\gamma} } \geq \Pro{ \tilde{x}_j^b \geq \left\lfloor \Ex{ \tilde{x}_j^b }\right\rfloor }
\geq 1/2, \] so with probability at least $1/2$ we will have a large gap.
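The median fact used in Case 1 implies $\Pro{X \geq \lfloor Nq \rfloor} \geq 1/2$ for $X \sim \mathsf{Bin}(N, q)$. The following Python sketch checks this by summing the exact probability mass function (up to floating-point rounding) for a few illustrative parameter pairs; the helper name is hypothetical and only the standard library is used.

```python
import math


def binom_upper_tail(N: int, q: float, k: int) -> float:
    """Return Pr[X >= k] for X ~ Bin(N, q) by summing the probability mass function."""
    return sum(math.comb(N, i) * q ** i * (1 - q) ** (N - i) for i in range(k, N + 1))


# Illustrative (N, q) pairs; N is kept small enough that math.comb fits in a float.
for N, q in [(10, 0.3), (101, 0.5), (1000, 0.001), (500, 0.77)]:
    tail = binom_upper_tail(N, q, math.floor(N * q))
    assert tail >= 0.5, (N, q, tail)
```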

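\textbf{Case 2:} We have at least $n^{\gamma}$ bins with $\tilde{\varphi}_j \geq - C \sqrt{b/n}$; call this set $\mathcal{B}$. We further know that, due to the definition of $\tilde{p}$, we have for all bins $j \in [n]$ that $\tilde{p}_j \geq \frac{1}{2n}$. Hence, we set $T:= b \cdot \tilde{p}_j + \kappa \cdot \sqrt{ b \cdot \tilde{p}_j \cdot \log n}$, and applying \cref{lem:binomial} yields for any bin $j \in \mathcal{B}$,
$
 \Pro{ \tilde{x}_j^{b} \geq T } \geq n^{-\gamma/2}.
$
Since the bin loads $(\tilde{x}_j^b)_{j \in [n]}$ of a multinomial allocation are negatively associated, the probability that $\tilde{x}_j^b < T$ holds for all $j \in \mathcal{B}$ is at most $(1 - n^{-\gamma/2})^{n^{\gamma}} \leq e^{-n^{\gamma/2}}$. Finally, any bin $j \in \mathcal{B}$ with $\tilde{x}_j^b \geq T$ satisfies $\tilde{x}_j^b - \frac{b}{n} + z_j \geq \tilde{\varphi}_j + \kappa \cdot \sqrt{b \cdot \tilde{p}_j \cdot \log n} \geq - C \sqrt{b/n} + \kappa \cdot \sqrt{(b/(2n)) \cdot \log n}$, so the claim follows.\end{proof}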

\section{Experiments} \label{sec:experiments}

In this section, we complement our theoretical analysis with some empirical results for the unweighted setting. In \cref{fig:beta_vs_batch_size}, we plot the gap of the \OnePlusBeta process for various batch sizes and different values of $\beta \in (0, 1]$ (where $\beta = 1$ corresponds to \TwoChoice). The plot suggests that there is a sweet spot for the choice of $\beta$, and that the optimal $\beta$ decreases as the batch size $b$ increases.
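A simulation of this kind can be sketched in a few lines. The following Python code is an assumption about the setup (it is not the code used for the figures): within each batch every ball runs \TwoChoice with probability $\beta$ and \OneChoice otherwise, where \TwoChoice compares the loads from the start of the batch and breaks ties uniformly at random; the gap is the maximum load minus the average load after the last batch. The function name and parameter values are hypothetical.

```python
import math
import random


def batched_one_plus_beta_gap(n: int, b: int, beta: float,
                              num_batches: int, seed: int = 0) -> float:
    """Simulate the (1+beta) process in the Batched setting; return the final gap."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(num_batches):
        stale = loads[:]  # load information is refreshed only between batches
        for _ in range(b):
            i = rng.randrange(n)
            if rng.random() < beta:
                # Two-Choice step on the outdated loads, random tie-breaking.
                j = rng.randrange(n)
                if stale[j] < stale[i] or (stale[j] == stale[i] and rng.random() < 0.5):
                    i = j
            loads[i] += 1
    return max(loads) - num_batches * b / n


n, b = 100, 20 * 100
beta = 0.7 * math.sqrt(n * math.log(n) / b)  # beta = 0.7 sqrt((n log n)/b), as in the table
print(batched_one_plus_beta_gap(n, b, beta, num_batches=10, seed=1))
```

Taking a snapshot of the loads once per batch is what models the outdated information; with $b = 1$ the sketch reduces to the sequential \OnePlusBeta process.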

\begin{figure}[H]
    \centering
    \vspace{-0.2cm}
    \includegraphics[scale=1]{figs/one_plus_beta_vs_batch.pdf}
    \vspace{-0.4cm}
    \caption{Empirical results for the unweighted \Batched setting for the \OnePlusBeta process for $n = 10^3$ bins and various batch sizes (averaged over $25$ runs).}
    \label{fig:beta_vs_batch_size}
\end{figure}

In \cref{fig:eta_vs_batch_size}, we present the corresponding results of \cref{fig:beta_vs_batch_size} for  
the \Quantile process (mixed with \OneChoice). As with the \OnePlusBeta process, the larger the batch size, the smaller the optimal mixing factor $\eta$. The \Quantile process with the optimized mixing factor seems to perform slightly worse than the optimized \OnePlusBeta process.

\begin{figure}[H]
    \centering
    \vspace{-0.2cm}
    \includegraphics[scale=1]{figs/quantile_vs_batch.pdf}
    \vspace{-0.4cm}
    \caption{Empirical gap for the unweighted \Batched setting for the \Quantile process (mixed with \OneChoice with probability $\eta \in (0, 1]$) for $n = 10^3$ bins and various batch sizes (averaged over $25$ runs).}
    \label{fig:eta_vs_batch_size}
\end{figure}

In \cref{fig:process_comparison}, we plot the gap of \TwoChoice, \ThreeChoice and \OnePlusBeta versus the batch size. As expected, for small values of $b$, the gaps of \TwoChoice and \ThreeChoice are small, but they soon grow rapidly, diverging from the gaps of the asymptotically optimal \OnePlusBeta processes.

\begin{figure}[H]
    \centering
    \includegraphics[scale=1]{figs/process_comparisons.pdf}
    \vspace{-0.4cm}
    \caption{Empirical gap versus batch size $b$ in the unweighted \Batched setting for \ThreeChoice, \TwoChoice and \OnePlusBeta with $\beta = 0.5$, $\beta = 0.7 \cdot \sqrt{(n\log n)/b}$ and $\beta = \sqrt{(n\log n)/b}$, for $n = 10^3$ bins (averaged over $100$ runs).}
    \label{fig:process_comparison}
\end{figure}
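The rapid growth of the \TwoChoice and \ThreeChoice gaps in $b$ can be explored with the following Python sketch of \DChoice with outdated loads; the function name and the tie-breaking rule are assumptions. Since all $b$ balls of a batch see the same snapshot, they all favor the bins that looked lightest at the start of the batch: a bin that is the unique lightest one receives each ball with probability $1 - (1 - 1/n)^d \approx d/n$, i.e., about $d \cdot b/n$ balls instead of $b/n$.

```python
import random


def batched_d_choice_gap(n, b, num_batches, d, seed=0):
    """Gap of d-Choice in the b-Batched setting (d = 2: Two-Choice, d = 3: Three-Choice).

    Hypothetical sketch: each ball samples d bins uniformly with replacement and is placed
    into the sample with the smallest load recorded at the start of the batch; ties are
    resolved by the (random) order of the samples.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(num_batches):
        snapshot = loads[:]  # every ball of the batch sees the same, outdated loads
        for _ in range(b):
            samples = [rng.randrange(n) for _ in range(d)]
            loads[min(samples, key=snapshot.__getitem__)] += 1
    return max(loads) - num_batches * b / n


if __name__ == "__main__":
    n = 200
    for factor in (1, 10, 50):
        b = factor * n
        gaps = [round(batched_d_choice_gap(n, b, num_batches=10, d=d), 2) for d in (2, 3)]
        print(f"b = {factor}n: d=2 -> {gaps[0]}, d=3 -> {gaps[1]}")
```

The \texttt{\_\_main\_\_} block compares $d = 2$ and $d = 3$ for a single seed only, whereas the figure averages over $100$ runs.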

Finally, in \cref{tab:batch_large_values}, we compare the gaps of the \OnePlusBeta and \Quantile processes with those of \TwoChoice and of \OneChoice with $m = b$ balls (which is the theoretically optimal attainable value), for $n \in \{ 10^4, 10^5 \}$. For large $b$, the \OnePlusBeta process has roughly half the gap of \TwoChoice (e.g., $46.80$ versus $100.85$ for $n = 10^4$ and $b = 80n$), and its gap exceeds that of \OneChoice by only about $30\%$.

\begin{table}[H]
    \centering
    \begin{tabular}{cc|c|c|c|c|}
\cline{3-6}
 &  & \TwoChoice & \Quantile & \OnePlusBeta & \OneChoice for $m = b$ \\ \hline
\multirow{3}{*}{\rotatebox{90}{$n=10^4$}} 
& \multicolumn{1}{|c|}{$b = 20n$} & 36.45 & 30.15 & 26.60 & 19.00 \\ \cline{2-6}
& \multicolumn{1}{|c|}{$b = 50n$} &  70.10 & 45.75 & 39.00 & 29.75 \\ \cline{2-6}
& \multicolumn{1}{|c|}{$b = 80n$} & 100.85 & 55.65 & 46.80 & 35.80 \\ \hline
\multirow{3}{*}{\rotatebox{90}{$n=10^5$}} 
& \multicolumn{1}{|c|}{$b = 20n$} &  39.90 & 34.10 & 29.95 & 22.40 \\ \cline{2-6}
& \multicolumn{1}{|c|}{$b = 50n$} &  75.55 & 50.30 & 44.20 & 34.30 \\ \cline{2-6}
& \multicolumn{1}{|c|}{$b = 80n$} & 111.10 & 64.90 & 55.20 & 41.95 \\ \hline
    \end{tabular}
    \caption{Empirical gap for \TwoChoice, \Quantile (with $\eta = \sqrt{(n \log n)/b}$), \OnePlusBeta (with $\beta = 0.7 \sqrt{(n \log n)/b}$) and \OneChoice with $m = b$ balls in the unweighted \Batched setting with $b \in \{ 20n, 50n, 80n \}$ and $n \in \{ 10^4, 10^5 \}$ (averaged over $20$ runs).}
    \label{tab:batch_large_values}
\end{table}
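The comparisons drawn from \cref{tab:batch_large_values} are simple ratios of its entries. The following Python snippet recomputes them, together with the mixing factors given in the caption; using the natural logarithm for $\log n$ is an assumption, and the helper names are hypothetical.

```python
import math

# Average gaps copied from the table:
# (n, b / n) -> (Two-Choice, Quantile, (1+beta), One-Choice with m = b)
TABLE = {
    (10**4, 20): (36.45, 30.15, 26.60, 19.00),
    (10**4, 50): (70.10, 45.75, 39.00, 29.75),
    (10**4, 80): (100.85, 55.65, 46.80, 35.80),
    (10**5, 20): (39.90, 34.10, 29.95, 22.40),
    (10**5, 50): (75.55, 50.30, 44.20, 34.30),
    (10**5, 80): (111.10, 64.90, 55.20, 41.95),
}


def mixing_factor(n, b, c):
    """c * sqrt(n log n / b); the natural logarithm is an assumption of this sketch."""
    return c * math.sqrt(n * math.log(n) / b)


def summarize(table):
    """Return (n, b/n, beta, eta, (1+beta)/Two-Choice, (1+beta)/One-Choice) per row."""
    rows = []
    for (n, factor), (two, _quantile, opb, one) in sorted(table.items()):
        b = factor * n
        beta = mixing_factor(n, b, 0.7)  # parameter of (1+beta) in the caption
        eta = mixing_factor(n, b, 1.0)   # parameter of Quantile in the caption
        rows.append((n, factor, beta, eta, opb / two, opb / one))
    return rows


if __name__ == "__main__":
    for n, factor, beta, eta, vs_two, vs_one in summarize(TABLE):
        print(f"n={n:>6}  b={factor}n  beta={beta:.3f}  eta={eta:.3f}  "
              f"(1+beta)/Two-Choice={vs_two:.2f}  (1+beta)/One-Choice={vs_one:.2f}")
```

All mixing factors lie in $(0, 1]$ for these parameters, as required for them to be valid probabilities.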


\section{Conclusions} \label{sec:conclusions}


In this work, we revisited the outdated information setting of~\cite{BCEFN12}, where balls are allocated to bins in batches of size $b$ and the load information is updated only after each batch. We established that, by choosing the mixing factor $\beta$ carefully as a function of the batch size $b$, the \OnePlusBeta process achieves the asymptotically optimal gap for any $b \geq n \log n$. Intuitively, choosing $\beta$ not too large allows \OnePlusBeta to avoid the ``herd behavior'' (as it is called in~\cite{M00}), in which bins that were underloaded at the start of a batch are chosen too frequently and therefore become heavily overloaded by the next batch. Conversely, $\beta$ should not be too small either, as otherwise the process would behave too much like \OneChoice.
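The trade-off in $\beta$ can be seen in a back-of-the-envelope calculation. It is only a heuristic in expectation, for a simplified configuration that we assume for illustration, and not part of the analysis above. Consider a single batch that starts with $n/2$ \emph{light} bins of load $\ell$ and $n/2$ \emph{heavy} bins of load $\ell + 1$, and suppose every ball of the batch sees these loads. With probability $1 - \beta$ a ball runs \OneChoice and lands in a light bin with probability $1/2$; with probability $\beta$ it runs \TwoChoice and misses the light bins only if both samples are heavy, i.e., with probability $1/4$. Hence
\begin{align*}
\Pr[\text{a ball lands in a light bin}] &= (1 - \beta) \cdot \tfrac{1}{2} + \beta \cdot \tfrac{3}{4} = \tfrac{1}{2} + \tfrac{\beta}{4}, \\
\mathbf{E}[\text{balls received by a light bin}] &= \frac{b}{n/2} \cdot \Big( \tfrac{1}{2} + \tfrac{\beta}{4} \Big) = \frac{b}{n} \Big( 1 + \tfrac{\beta}{2} \Big), \\
\mathbf{E}[\text{balls received by a heavy bin}] &= \frac{b}{n} \Big( 1 - \tfrac{\beta}{2} \Big).
\end{align*}
So, in expectation, the previously light bins end the batch ahead of the heavy ones by $\beta \cdot b/n - 1$. For \TwoChoice ($\beta = 1$), this overshoot grows linearly in $b$, whereas for $\beta = \sqrt{(n \log n)/b}$ it equals $\sqrt{(b/n) \log n} - 1$, which is of the same order as the gap of \OneChoice with $m = b \geq n \log n$ balls.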

There are several directions for future work. First, recall that our lower bounds apply to a large class of processes that allocate all balls within the \emph{same batch} independently. However, some processes allocate multiple balls in a coordinated way. For example, the process of Park~\cite{P11} draws $d$ samples and then places one ball into each of the $k$ least loaded of the sampled bins. It would be interesting to explore the gap of such processes in the batched setting. A second avenue is to analyze \TwoThinning processes (in particular, processes that use a fixed load threshold relative to the average load) in outdated information settings. An experimental study of threshold processes with outdated information was already conducted in 1989~\cite[Figure 8]{MTS89}, but no rigorous bounds were proven. Finally, one could study models where the load information of each bin is updated at its own rate. In such a setting, a ball deciding between sampled bins should take into account both their reported load estimates and their update rates.
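To make the coordinated allocation of Park~\cite{P11} concrete, the following Python sketch runs such a step inside the \Batched setting. The names \texttt{k\_d\_choice\_step} and \texttt{batched\_k\_d\_choice\_gap}, sampling the $d$ bins without replacement, and the requirement that $k$ divides $b$ are assumptions of this illustration rather than details taken from~\cite{P11}; we make no claim about the gap it attains.

```python
import random


def k_d_choice_step(loads, snapshot, k, d, rng):
    """One coordinated step allocating k balls (hypothetical sketch).

    Samples d distinct bins (sampling without replacement is an assumption here) and
    places one ball into each of the k samples with the smallest snapshot load.
    """
    samples = rng.sample(range(len(loads)), d)
    samples.sort(key=snapshot.__getitem__)  # lightest first; stable sort keeps random tie order
    for i in samples[:k]:
        loads[i] += 1


def batched_k_d_choice_gap(n, b, num_batches, k, d, seed=0):
    """Gap after num_batches batches; each batch of b balls uses the loads from its start."""
    if b % k != 0 or not 1 <= k <= d <= n:
        raise ValueError("need k | b and 1 <= k <= d <= n")
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(num_batches):
        snapshot = loads[:]  # outdated load information for the whole batch
        for _ in range(b // k):
            k_d_choice_step(loads, snapshot, k, d, rng)
    return max(loads) - num_batches * b / n


if __name__ == "__main__":
    print(batched_k_d_choice_gap(n=1000, b=20_000, num_batches=10, k=2, d=4))
```

Because the $k$ balls of one step go to distinct bins, a single step can never pile several balls onto the same apparently light bin, which is the kind of coordination the independent-allocation lower bounds do not cover.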


\clearpage

\bibliographystyle{ACM-Reference-Format-CAM}
