

Consider a generative model $\mathcal{G}$ that generates samples from a probability distribution $P_X$. To conduct a reference-free evaluation of the model, we suppose the evaluator has access to $n$ independently generated samples from $P_X$, denoted by $x_1,\ldots ,x_n\in\mathcal{X}$. %We use $\mathcal{X}$ to denote the support set of the generative model. 
The assessment task is to estimate the diversity of the generative model $\mathcal{G}$ by measuring the variety of the observed generated data, $x_1,\ldots, x_n$. In the following subsections, we discuss kernel functions and their application in defining the Vendi and RKE diversity scores.

\subsection{Kernel Functions and Matrices}
Following the standard definition, $k:\mathcal{X}\times \mathcal{X} \rightarrow \mathbb{R}$ is called a kernel function if for every integer $n\in\mathbb{N}$ and inputs $x_1, \ldots, x_n \in \mathcal{X}$, the following kernel similarity matrix $K\in\mathbb{R}^{n\times n}$ %$K = \bigl[k(x_i,x_j) \bigr]_{1\le i,j\le n}$ 
is positive semi-definite (PSD):
\begin{equation}
    K = \begin{bmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n) \\ \vdots & \ddots & \vdots \\ k(x_n,x_1) & \cdots & k(x_n,x_n)
    \end{bmatrix}
\end{equation}
Aronszajn’s Theorem \citep{aronszajn1950reproducing} shows that this definition is equivalent to the existence of a feature map $\phi :\mathcal{X} \rightarrow \mathbb{R}^d$ 
such that, for every $x, x' \in\mathcal{X}$, the following identity holds, where $\langle \cdot ,\cdot \rangle $ denotes the standard inner product in $\mathbb{R}^d$:
\begin{equation}\label{Eq: Kernel Equivalent Definition}
    k(x,x') \, =\,  \bigl\langle \phi(x), \phi(x') \bigr\rangle 
\end{equation}
In this work, we study the evaluation using two types of kernel functions: 1) finite-dimension kernels where dimension $d$ is finite, 2) infinite-dimension kernels where there is no feature map satisfying \eqref{Eq: Kernel Equivalent Definition} with a finite $d$ value. A standard example of a finite-dimension kernel is the cosine similarity function where $\phi_{\text{cosine}}(x)= x/ \Vert x\Vert_2$. Also, a widely-used infinite-dimension kernel is the Gaussian (RBF) kernel with bandwidth parameter $\sigma >0$ defined as
\begin{equation}
    k_{\text{\rm Gaussian}(\sigma)} \bigl(x , x'\bigr) \, :=\, \exp\Bigl(-\frac{\bigl\Vert x - x'\bigr\Vert^2_2}{2\sigma^2}\Bigr)
\end{equation}
Both of these examples are normalized kernels, i.e., they satisfy $k(x, x)=1$ for every $x$, which means that the feature map $\phi(x)$ has unit Euclidean norm for every $x$. Given a normalized kernel function, the eigenvalues of the normalized kernel matrix $\frac{1}{n}K$ for $n$ points $x_1,\ldots, x_n$ are non-negative and sum to $\mathrm{Tr}\bigl(\frac{1}{n}K\bigr)=1$, i.e., they form a probability model.
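As an illustrative check (the two-sample setting and the shorthand $c$ are assumptions made only for this example), take $n=2$ and a normalized kernel with $c := k(x_1,x_2)$. Then
\begin{equation*}
    \frac{1}{2}K = \frac{1}{2}\begin{bmatrix} 1 & c \\ c & 1 \end{bmatrix}, \qquad \lambda_1 = \frac{1+c}{2}, \quad \lambda_2 = \frac{1-c}{2},
\end{equation*}
where positive semi-definiteness of $K$ forces $|c|\le 1$, so both eigenvalues are non-negative and $\lambda_1+\lambda_2 = 1$.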

\subsection{Matrix-based Entropy Functions and Vendi Score}
For a PSD matrix $A \in\mathbb{R}^{d\times d}$ with unit trace $\mathrm{Tr}(A)=1$, the eigenvalues $\lambda_1,\ldots,\lambda_d$ of $A$ are non-negative and sum to one, and hence form a probability model. The order-$\alpha$ R\'enyi entropy of matrix $A$ is defined using the order-$\alpha$ entropy of its eigenvalues as 
\begin{equation}\label{Eq: order-alpha entropy}
    H_{\alpha}(A) \, :=\, \frac{1}{1-\alpha}\log\Bigl(\sum_{i=1}^d \lambda^\alpha_i\Bigr)
\end{equation}
For the special case $\alpha=2$, one can consider the Frobenius norm $\Vert\cdot \Vert_F$ and apply the identity $\bigl\Vert A\bigr\Vert_F^2 = \sum_{i=1}^d \lambda^2_i$ to show $H_{2}(A)=\log\bigl(1/\bigl\Vert A\bigr\Vert_F^2\bigr)$. Moreover, in the limit $\alpha \to 1$, the above definition reduces to the Shannon entropy of the eigenvalues,
$    H_{1}(A) \, := \, \sum_{i=1}^d \lambda_i \log ({1}/{\lambda_i})$ \citep{renyi1961measures}. 
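Both special cases can be verified directly; the following is a short sketch, assuming that $A$ is symmetric (as is the case for kernel and covariance matrices) and using the convention $0\log(1/0)=0$. For $\alpha=2$,
\begin{equation*}
    \bigl\Vert A\bigr\Vert_F^2 = \mathrm{Tr}\bigl(A^\top A\bigr) = \mathrm{Tr}\bigl(A^2\bigr) = \sum_{i=1}^d \lambda_i^2
    \quad\Longrightarrow\quad
    H_2(A) = \frac{1}{1-2}\log\Bigl(\sum_{i=1}^d \lambda_i^2\Bigr) = \log\Bigl(\frac{1}{\Vert A\Vert_F^2}\Bigr).
\end{equation*}
For $\alpha\to 1$, both $\log\bigl(\sum_{i}\lambda_i^\alpha\bigr)$ and $1-\alpha$ tend to zero, so L'H\^opital's rule together with $\sum_i \lambda_i = 1$ gives
\begin{equation*}
    \lim_{\alpha\to 1} \frac{\log\bigl(\sum_{i=1}^d \lambda_i^\alpha\bigr)}{1-\alpha}
    = -\frac{\sum_{i=1}^d \lambda_i \log \lambda_i}{\sum_{i=1}^d \lambda_i}
    = \sum_{i=1}^d \lambda_i \log \frac{1}{\lambda_i}.
\end{equation*}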

\cite{jalali_information-theoretic_2023} applies the above definition for order $\alpha=2$ to the normalized kernel similarity matrix $\frac{1}{n}K$ to define the RKE diversity score (called RKE mode count), which reduces to
\begin{equation}
   \mathrm{RKE}(x_1,\ldots , x_n) \, :=\, \exp\Bigl(H_2\bigl(\frac{1}{n}K\bigr)\Bigr) =\,  \Bigl\Vert \frac{1}{n}K\Bigr\Vert^{-2}_F 
\end{equation}
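Since $\bigl\Vert \frac{1}{n}K\bigr\Vert_F^2 = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n k(x_i,x_j)^2$, the RKE score can be computed from the kernel entries alone, without an eigendecomposition. As a sketch under the illustrative assumption of $n=2$ samples and a normalized kernel, with the shorthand $c := k(x_1,x_2)$ introduced only for this example,
\begin{equation*}
    \mathrm{RKE}(x_1,x_2) = \Bigl(\frac{1}{4}\bigl(1 + 1 + c^2 + c^2\bigr)\Bigr)^{-1} = \frac{2}{1+c^2},
\end{equation*}
which equals $2$ when the two samples are orthogonal in the feature space ($c=0$) and decreases to $1$ as the samples become indistinguishable under the kernel ($|c|\to 1$).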

For a general entropy order $\alpha$,
\citet{friedman_vendi_2023,pasarkar2023cousins} apply the matrix-based entropy definition to the normalized kernel matrix $\frac{1}{n}K$ and define the order-$\alpha$ Vendi score for samples $x_1,\ldots , x_n$ as
\begin{equation}
    \mathrm{Vendi}_\alpha \bigl(x_1,\ldots ,x_n\bigr) \, :=\, \exp\Bigl(H_\alpha\bigl(\frac{1}{n} K \bigr)\Bigr)
\end{equation}
Specifically, for order $\alpha=1$, the above definition results in the standard (order-1) Vendi score in Equation~\eqref{Eq: Intro-Vendi Score}.
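Two extreme cases illustrate the range of the score; both are hypothetical configurations chosen only to make the computation explicit, and a normalized kernel is assumed. If all samples are identical, then $K$ is the all-ones matrix, $\frac{1}{n}K$ has one eigenvalue equal to $1$ and $n-1$ eigenvalues equal to $0$, and hence
\begin{equation*}
    H_\alpha\Bigl(\frac{1}{n}K\Bigr) = \frac{1}{1-\alpha}\log(1) = 0
    \quad\Longrightarrow\quad
    \mathrm{Vendi}_\alpha(x_1,\ldots,x_n) = 1 .
\end{equation*}
If instead $k(x_i,x_j)=0$ for all $i\neq j$, then $K$ is the identity matrix, every eigenvalue of $\frac{1}{n}K$ equals $1/n$, and for $\alpha \neq 1$
\begin{equation*}
    H_\alpha\Bigl(\frac{1}{n}K\Bigr) = \frac{1}{1-\alpha}\log\bigl(n\cdot n^{-\alpha}\bigr) = \log n
    \quad\Longrightarrow\quad
    \mathrm{Vendi}_\alpha(x_1,\ldots,x_n) = n ,
\end{equation*}
with the same value $\log n$ obtained from the Shannon entropy at order $1$.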



\subsection{Statistical Analysis of Vendi Score}
To derive the population limits of Vendi and RKE scores under infinite sampling, which we call \emph{population Vendi} and \emph{population RKE}, respectively, we review the following discussion from \citep{bach_information_2022,jalali_information-theoretic_2023}. First, note that the normalized kernel matrix $\frac{1}{n}K$, whose eigenvalues are used in the definition of Vendi score, can be written as:
\begin{equation}
    \frac{1}{n}K = \frac{1}{n}\Phi \Phi^\top
\end{equation}
where $\Phi\in\mathbb{R}^{n\times d}$ is the matrix whose rows are the feature representations of the samples, i.e., $\phi(x_1),\ldots ,\phi(x_n)$. Therefore, the normalized kernel matrix $\frac{1}{n}K$ shares the same non-zero eigenvalues with $\frac{1}{n} \Phi^\top \Phi$, where the multiplication order is flipped. Note that $\frac{1}{n} \Phi^\top \Phi$ is equal to the empirical kernel covariance matrix $ \widehat{C}_X$ defined as:
\begin{equation*}
    \widehat{C}_X := \frac{1}{n}\sum_{i=1}^n \phi(x_i)\phi(x_i)^\top  = \frac{1}{n} \Phi^\top \Phi.
\end{equation*}
As a result, the empirical covariance matrix $\widehat{C}_X=\frac{1}{n} \Phi^\top \Phi$ and kernel matrix $\frac{1}{n}K=\frac{1}{n} \Phi \Phi^\top$ share the same non-zero eigenvalues and therefore have the same matrix-based entropy value for every order $\alpha$: $H_{\alpha}(\frac{1}{n}K) = H_{\alpha}(\widehat{C}_X)$.
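The shared-spectrum claim follows from a standard argument, sketched here for finite $d$ (the vectors $v$ and $u$ are notation introduced only for this sketch). Suppose $\frac{1}{n}\Phi\Phi^\top v = \lambda v$ with $\lambda \neq 0$ and $v\neq 0$, and set $u := \Phi^\top v$. Then $u \neq 0$, since otherwise $\lambda v = \frac{1}{n}\Phi u = 0$, and
\begin{equation*}
    \frac{1}{n}\Phi^\top\Phi\, u = \Phi^\top\Bigl(\frac{1}{n}\Phi\Phi^\top v\Bigr) = \lambda\, \Phi^\top v = \lambda\, u ,
\end{equation*}
so $\lambda$ is also an eigenvalue of $\frac{1}{n}\Phi^\top\Phi$. The map $v\mapsto \Phi^\top v$ is injective on each such eigenspace, and the symmetric argument gives the converse, so the non-zero eigenvalues agree with multiplicities. Zero eigenvalues contribute nothing to $\sum_i \lambda_i^\alpha$ for $\alpha>0$ (with the convention $0\log(1/0)=0$ at order $1$), which is why the two entropy values coincide.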
Accordingly, replacing the empirical kernel covariance matrix $\widehat{C}_X$ with its population counterpart $\widetilde{C}_X = \mathbb{E}_{x\sim P_X}\bigl[\phi(x)\phi(x)^\top\bigr]$, we define the population Vendi score as follows.
\begin{definition}
    Given data distribution $P_X$, we define the order-$\alpha$ population Vendi, ${\mathrm{Vendi}}_\alpha(P_X)$, using the matrix-based entropy of the population kernel covariance matrix $\widetilde{C}_X = \mathbb{E}_{x\sim P_X}\bigl[\phi(x)\phi(x)^\top\bigr]$ as
    \begin{equation}
        {\mathrm{Vendi}}_\alpha(P_X) \, :=\, \exp\Bigl( H_{\alpha}(\widetilde{C}_X)\Bigr)
    \end{equation}
\end{definition}
Note that the population RKE score is identical to the population $\mathrm{Vendi}_2$, since the RKE score coincides with the order-2 Vendi score by definition. %In the next sections, we study the complexity of estimating the above population Vendi from a limited number of samples. 
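As an illustrative example (the distribution below is hypothetical and chosen only to make the computation explicit), suppose $P_X$ is uniform over $m\le d$ points $z_1,\ldots,z_m$ whose feature vectors are orthonormal, i.e., $\langle \phi(z_i),\phi(z_j)\rangle$ equals $1$ if $i=j$ and $0$ otherwise. Then
\begin{equation*}
    \widetilde{C}_X = \frac{1}{m}\sum_{j=1}^m \phi(z_j)\phi(z_j)^\top
\end{equation*}
has $m$ eigenvalues equal to $1/m$, with eigenvectors $\phi(z_1),\ldots,\phi(z_m)$, and all remaining eigenvalues equal to zero. Consequently, $H_\alpha(\widetilde{C}_X) = \log m$ and $\mathrm{Vendi}_\alpha(P_X) = m$ for every order $\alpha>0$, so in this idealized case the population score counts the $m$ equally likely, mutually dissimilar modes.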