\section{Derivation of Lagrange Multiplier Method for Adversarial Step}
\label{sec:dual-details}
The optimization problem in the adversarial step is
\begin{align*}
& \inf_{\bm{r}} \sum_{i=1}^n r_i \cdot l_i \\
\text{s.t.} \quad & \frac{1}{n} \sum_{i=1}^n r_i \log r_i \le \delta, \quad \frac{1}{n} \sum_{i=1}^n r_i = 1, \quad r_i \ge 0,
\end{align*}
where $l_i \equiv \log P_{\theta} (\bm{z_i})$ can be treated as a constant because the parameter $\theta$ is fixed. We also drop the constant factor \nicefrac{1}{n} from the objective function because it does not change the optimal weights $\bm{r}$.

We begin by constructing the Lagrangian,
$$
L(\bm{r}, \alpha, \beta) =  \sum_{i} r_i l_i + \alpha \left ( \sum_i r_i \log r_i - n\delta  \right ) + \beta \left(  \sum_i r_i - n \right ),
$$
where $\alpha>0$ and $\beta$ are Lagrange multipliers. Note that we omit the third constraint $r_i \ge 0$ from the Lagrangian because the term $r_i \log r_i$ is only defined for non-negative $r_i$, and we show later in the derivation that the stationary point satisfies $r_i > 0$, so dropping the constraint is safe.
Taking the derivative\footnote{The $\log$ function has base $e$, i.e., it is the natural logarithm.} of the Lagrangian with respect to $r_i$ and setting it to zero, we have
$$
\frac{\partial L}{\partial r_i} = l_i + \alpha (\log r_i + 1) + \beta = 0,
$$
and this gives us 
$$
r_i = \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ),
$$
which is always strictly positive; this is why we can safely drop the third constraint $r_i \ge 0$. Plugging the above expression back into the Lagrangian, we obtain the dual objective function
$$
L'(\alpha,\beta) = - \sum_i \alpha \cdot \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right )  - \alpha n \delta - \beta n.
$$
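One way to verify this substitution is the following short calculation. The stationarity condition gives $\log r_i = \frac{-\beta - l_i}{\alpha} - 1$, so that
$$
\alpha \, r_i \log r_i = -(\beta + l_i + \alpha) \, r_i .
$$
Substituting this into the Lagrangian, the terms in $l_i$ and $\beta$ that multiply $r_i$ cancel:
$$
\sum_i r_i l_i + \alpha \sum_i r_i \log r_i + \beta \sum_i r_i = \sum_i r_i \left( l_i - \beta - l_i - \alpha + \beta \right) = -\alpha \sum_i r_i ,
$$
and the remaining constant terms $-\alpha n \delta - \beta n$ carry over unchanged, which yields the expression for $L'(\alpha, \beta)$.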
Taking the derivatives with respect to $\alpha$ and $\beta$ and setting them to zero, we have
\begin{equation}
\begin{aligned}
\frac{\partial L'}{\partial \alpha} = & -n\delta - \sum_i \left[ \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) + \alpha \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \frac{\beta + l_i}{\alpha^2} \right] \\
= & -n\delta - \sum_i \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \left ( \frac{\beta + l_i}{\alpha} + 1 \right ) = 0
\end{aligned}    
\end{equation}
and
\begin{equation}
\begin{aligned}
\frac{\partial L'}{\partial \beta} = & -n - \sum_i \alpha \cdot \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \cdot \left( -\frac{1}{\alpha} \right) \\
= & -n + \sum_i \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) = 0.
\end{aligned}
\end{equation}
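
One way to read these two stationarity conditions, sketched here using only the expression for $r_i$ derived above and assuming $\alpha > 0$, is that they recover the primal constraints. The condition $\partial L' / \partial \beta = 0$ is exactly $\sum_i r_i = n$. Since $\log r_i = -\left( \frac{\beta + l_i}{\alpha} + 1 \right)$, the condition $\partial L' / \partial \alpha = 0$ can be rewritten as
$$
\sum_i r_i \log r_i = n \delta,
$$
that is, the first constraint holds with equality. In addition, the normalization $\sum_i r_i = n$ can be used to eliminate $\beta$, which gives
$$
r_i = \frac{n \exp \left( -l_i / \alpha \right)}{\sum_j \exp \left( -l_j / \alpha \right)},
$$
so that only the scalar $\alpha$ remains to be determined.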

For a detailed analysis and an efficient algorithm, please refer to Section~\ref{sec:adv-step}.



\section{Proof of Strict Concavity}
\label{sec:strict-concav}
To prove that the dual objective function $L'$ is strictly concave, we need to show that its Hessian matrix (a $2 \times 2$ matrix in our case) is always negative definite. We first compute the entries of the Hessian matrix as follows.


\begin{equation*}
\begin{aligned}
A = \frac{\partial^2 L'}{\partial \alpha^2} = & \sum_i \left[ \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \left ( \frac{\beta + l_i}{\alpha^2}  \right ) +  \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \left ( \frac{\beta + l_i}{\alpha^2}  \right ) \right] \\
= & \sum_i \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \left ( \frac{\beta + l_i}{\alpha^2}  \right ) \cdot \left ( \frac{-\beta - l_i}{\alpha} -1 +1\right ) \\
= & - \sum_i r_i \frac{(\beta + l_i)^2}{\alpha^3}
\end{aligned}    
\end{equation*}

Here, we use the fact that $r_i = \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right )$. The other two second derivatives are computed as follows.

\begin{equation*}
\begin{aligned}
C = \frac{\partial^2 L'}{\partial \beta^2} = &   \sum_i \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \cdot \left( - \frac{1}{\alpha} \right) \\
= &  - \frac{1}{\alpha} \sum_i r_i
\end{aligned}    
\end{equation*}

and 

\begin{equation*}
\begin{aligned}
B = \frac{\partial^2 L'}{\partial \alpha \partial  \beta} = \frac{\partial^2 L'}{\partial  \beta \partial \alpha} =  &  \sum_i \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) \cdot \left( \frac{\beta +l_i}{\alpha ^ 2} \right) \\
= &   \sum_i r_i \left( \frac{\beta +l_i}{\alpha ^ 2} \right)
\end{aligned}    
\end{equation*}

Therefore, the Hessian matrix is 
$$
M = 
\begin{bmatrix}
    A & B \\
    B & C
\end{bmatrix}.
$$
To prove that the matrix $M$ is negative definite, we show the following two facts:
\begin{enumerate}
    \item the trace $\text{trace}(M) = \lambda_1 + \lambda_2 < 0$, where $\lambda_1$ and $\lambda_2$  are the eigenvalues. \\
    \textbf{Proof:} The trace of the matrix $M$ is
    $$
    A+C = -  \sum_i r_i \frac{(\beta + l_i)^2}{\alpha^3} - \frac{1}{\alpha} \sum_i r_i.
    $$
    Because $r_i = \exp \left ( \frac{-\beta - l_i}{\alpha} - 1 \right ) > 0$ and $\alpha > 0$, the above expression is always negative. The case $\alpha = 0$ is excluded, consistent with the assumption $\alpha > 0$ made when constructing the Lagrangian.
    \item the determinant $\det(M) = \lambda_1 \cdot \lambda_2 > 0$. \\
    \textbf{Proof:} The determinant of the matrix $M$ is
    \begin{equation*}
    \begin{aligned}
    A\cdot C - B\cdot B = & \left(  \sum_i r_i \frac{(\beta + l_i)^2}{\alpha^3}  \right) \left(  \frac{1}{\alpha} \sum_i r_i \right) - \left ( \sum_i r_i \left( \frac{\beta +l_i}{\alpha ^ 2} \right) \right ) \cdot \left ( \sum_i r_i \left( \frac{\beta +l_i}{\alpha ^ 2} \right) \right ) \\
    = & \frac{1}{\alpha^4} \left ( \left(  \sum_i r_i (\beta + l_i)^2  \right) \cdot \left( \sum_i r_i \right) - \left ( \sum_i r_i \left( \beta +l_i \right) \right ) \cdot \left ( \sum_i r_i \left( \beta +l_i\right) \right )  \right).
    \end{aligned}
    \end{equation*}
    Denote $$X = \left(  \sum_i r_i (\beta + l_i)^2  \right) \cdot \left( \sum_i r_i \right)$$ and $$Y = \left ( \sum_i r_i \left( \beta +l_i \right) \right ) \cdot \left ( \sum_i r_i \left( \beta +l_i\right) \right ).$$ To simplify $X - Y$, we compare the coefficients of each product $r_i r_j$ in $X$ and $Y$. For $i = j$, the coefficient of $r_i^2$ is $(\beta + l_i)^2$ in both $X$ and $Y$, so these terms cancel.
    For $i < j$, the coefficient of $r_i r_j$ in $X$ is
    $$
    (\beta + l_i) ^ 2 + (\beta + l_j) ^ 2,
    $$ and the coefficient of $r_i r_j$ in $Y$ is 
    $$
    2 \cdot (\beta + l_i) \cdot (\beta + l_j).
    $$ Their difference is
    $$
    (\beta + l_i) ^ 2 + (\beta + l_j) ^ 2 - 2 \cdot (\beta + l_i) \cdot (\beta + l_j) = ( (\beta + l_i) - (\beta + l_j)) ^2 = (l_i - l_j) ^2.
    $$
    Therefore, we can simplify the determinant as 
    $$
    A\cdot C - B\cdot B = \sum_i \sum_{j > i} \frac{(l_i -l_j)^2}{\alpha^4} r_i r_j \ge 0.
    $$ Since every $r_i$ is strictly positive, the determinant is non-negative and equals zero only when the log-likelihoods of all training instances are equal, which is unlikely given that $l_i$ is real-valued and there are usually many training instances.

\end{enumerate}
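
As an illustrative sanity check of the two facts above, consider a hypothetical instance with $n = 2$, $l_1 = 0$, $l_2 = 1$, $\alpha = 1$, and $\beta = -1$. These values are chosen only for illustration and need not satisfy the stationarity conditions, since the Hessian formulas hold for any $\alpha > 0$ and any $\beta$. Then $r_1 = \exp(0) = 1$ and $r_2 = \exp(-1)$, and
$$
A = -\left( 1 \cdot (-1)^2 + e^{-1} \cdot 0^2 \right) = -1, \quad B = 1 \cdot (-1) + e^{-1} \cdot 0 = -1, \quad C = -\left( 1 + e^{-1} \right).
$$
Hence $\text{trace}(M) = A + C = -2 - e^{-1} < 0$ and
$$
\det(M) = A \cdot C - B \cdot B = \left( 1 + e^{-1} \right) - 1 = e^{-1} > 0,
$$
which agrees with the closed form $(l_1 - l_2)^2 \, r_1 r_2 / \alpha^4 = e^{-1}$.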


From the above proof, we conclude that, except in the case where all log-likelihoods are equal (almost impossible in practice), the trace of $M$ is negative and its determinant is positive, so both eigenvalues $\lambda_1$ and $\lambda_2$ are strictly negative. Hence the matrix $M$ is negative definite and the dual objective function $L'$ is strictly concave.





