\begin{figure*}[h]
    \centering
    \includegraphics[width=0.89\linewidth]{figures_final/kernel_convergence/imagenet_convergence.pdf}
    \caption{Statistical convergence of the Vendi and RKE scores for different sample sizes on ImageNet data: (left plots) finite-dimension cosine similarity kernel; (right plots) infinite-dimension Gaussian kernel with bandwidth $\sigma=30$. %DINOv2($d=768$) is used as the backbone embedding. 
    The RKE and truncated Vendi scores converge with fewer than 20000 samples, but the Vendi score with the Gaussian kernel does not converge.}
    \vspace{-5mm}
  \label{fig:kernel_convergence}
\end{figure*}


The increasing use of generative artificial intelligence has underscored the need for accurate evaluation of generative models.  In practice, users often have access to multiple generative models trained with different training datasets and algorithms, requiring evaluation methods to identify the most suitable model. The feasibility of a model evaluation approach depends on factors such as the required generated sample size, computational cost, and the availability of reference data. Recent studies on evaluating generative models have introduced assessment methods that relax the requirements on data and computational resources. 

Specifically, to evaluate generative models when no reference data are available, the recent literature has focused on reference-free evaluation scores. The Vendi score \citep{friedman_vendi_2023} is one such reference-free metric; it quantifies the diversity of generated data using the entropy of a kernel similarity matrix computed on the generated samples. Given the sorted eigenvalues $\lambda_1\ge\cdots \ge\lambda_n$ of the normalized matrix $\frac{1}{n}K$\footnote{In general, we consider the trace-normalized kernel matrix $\frac{1}{\mathrm{Tr}(K)}K$, which, given $k(x,x)=1$ for all $x$, reduces to $\frac{1}{n}K$.} for the kernel similarity matrix $K=\bigl[k(x_i,x_j)\bigr]_{1\le i,j\le n}$ of $n$ generated samples $x_1,\ldots , x_n$, the (order-1) Vendi score is defined as
\begin{equation}\label{Eq: Intro-Vendi Score}
    \mathrm{Vendi}(x_1,\ldots ,x_n) \, := \, \exp\Bigl(\,\sum_{i=1}^n \lambda_i \log\frac{1}{\lambda_i}\,\Bigr)
\end{equation}
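As an illustrative sanity check on this definition (using the convention $0\log\frac{1}{0}=0$), consider two extreme cases. If all $n$ samples are identical, so that $K=\mathbf{1}\mathbf{1}^\top$ is the all-ones matrix, then $\frac{1}{n}K$ has eigenvalues $(1,0,\ldots,0)$; if the samples are mutually dissimilar, so that $K=I_n$, then every eigenvalue equals $\frac{1}{n}$:
\begin{equation*}
    \mathrm{Vendi}\big|_{K=\mathbf{1}\mathbf{1}^\top} \, = \, \exp\Bigl(1\cdot\log 1\Bigr) \, = \, 1,
    \qquad
    \mathrm{Vendi}\big|_{K=I_n} \, = \, \exp\Bigl(\,\sum_{i=1}^n \frac{1}{n}\log n\,\Bigr) \, = \, n
\end{equation*}
The score therefore ranges from $1$ to $n$ and can be read as an effective number of distinct modes among the samples.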
Following conventional definitions in information theory, the Vendi score corresponds to the exponential of the \emph{von Neumann entropy} of the normalized kernel matrix $\frac{1}{n}K$. Similarly, \cite{jalali_information-theoretic_2023} define the Rényi Kernel Entropy (RKE) score by applying the order-2 Rényi entropy to this matrix, which reduces to the inverse of the squared Frobenius norm of the normalized kernel matrix:
\begin{equation}\label{Eq: Intro-RKE Score}
    \mathrm{RKE}(x_1,\ldots ,x_n) \, := \frac{1}{\Bigl\Vert \frac{1}{n}K \Bigr\Vert^2_F}
\end{equation}
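Since $K$ is symmetric, its squared Frobenius norm equals the sum of its squared eigenvalues, so the RKE score can also be written directly in terms of the kernel entries. The following identity is a standard linear-algebra fact, stated here only for illustration:
\begin{equation*}
    \mathrm{RKE}(x_1,\ldots ,x_n) \, = \, \frac{1}{\sum_{i=1}^n \lambda_i^2} \, = \, \frac{n^2}{\sum_{i=1}^n\sum_{j=1}^n k(x_i,x_j)^2}
\end{equation*}
In particular, the RKE score can be evaluated with $O(n^2)$ kernel evaluations and no eigendecomposition. For example, identical samples ($k(x_i,x_j)=1$ for all $i,j$) give $\mathrm{RKE}=1$, while mutually orthogonal samples ($K=I_n$) give $\mathrm{RKE}=n$.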
%As demonstrated in \citep{jalali_information-theoretic_2023,ospanov_fkea_2024}, the diversity evaluation of the Vendi and RKE scores can be interpreted as an unsupervised identification of clusters within the generated data, followed by the calculation of the entropy of the detected cluster variable. Due to their flexibility and adaptability, these entropy-based scores can be applied to measure the diversity of samples across different domains, including image, text, and video data.


Although the Vendi and RKE scores do not require reference samples, their computational cost increases rapidly with the number of generated samples $n$. Specifically, calculating the Vendi score for the $n \times n$ kernel matrix $K$ generally involves an eigendecomposition of $K$, requiring $O(n^3)$ computations. Therefore, the computational load of the Vendi score becomes substantial for a large sample size $n$, and the score is typically evaluated on at most 20,000 samples. In other words, the Vendi score, as defined in Equation~\eqref{Eq: Intro-Vendi Score}, is \emph{computationally infeasible} to compute on standard processors for sample sizes greater than a few tens of thousands. 
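As a rough order-of-magnitude illustration, ignoring constant factors and assuming a dense kernel matrix stored in double precision ($8$ bytes per entry), the cost at $n=2\times 10^4$ is
\begin{equation*}
    n^2 = 4\times 10^{8}\ \text{entries} \;\approx\; 3.2\ \text{GB of memory},
    \qquad
    n^3 = 8\times 10^{12}\ \text{operations},
\end{equation*}
and doubling the sample size to $n=4\times 10^4$ multiplies the memory by $4$ (about $12.8$ GB) and the operation count by $8$ (about $6.4\times 10^{13}$).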

Following the above discussion, a key question is whether the Vendi score estimated from restricted sample sizes (i.e., $n\le 20000$) has converged to its asymptotic value with infinite samples, which we call the \emph{population Vendi}. This statistical convergence, however, has not been thoroughly investigated in the literature. % for models trained on large-scale datasets, e.g. ImageNet \citep{deng2009imagenet} and MS~COCO \citep{fleet_microsoft_2014}, which contain many sample categories and could require a large sample size for diversity evaluation. 
In this work, we study the statistical convergence of the Vendi and RKE diversity scores and analyze how tightly the scores estimated from a limited number of generated samples $n\lessapprox 20000$ concentrate around their population values. %We emphasize that the restricted sample sizes $n\lessapprox 20000$ is an inevitable constraint due to the unaffordable computational costs of computing the Vendi score for larger sample sets. 

%First, we numerically analyze the convergence of the empirically evaluated score $\mathrm{Vendi}(x_1,\ldots ,x_n)$  estimated from $n$ samples $x_1,\ldots ,x_n$ to the population Vendi, which, based on the discussion in \cite{bach_information_2022}, can be defined using the matrix-based entropy of the underlying kernel-based covariance matrix \footnote{As implied by \citep{bach_information_2022}'s results, the Vendi score asymptotically converges to the kernel covariance-based population Vendi  $n\rightarrow \infty$}. Then, a key question is whether the Vendi score for the bounded sample size $n=O(10^4)$ has converged to the population Vendi, i.e. the limit value as $n\rightarrow \infty$.
\vspace{-2mm}
\subsection{Our Results on Vendi's Convergence}\vspace{-2mm}
We address the Vendi convergence question for two types of kernel functions: 1) kernel functions with a finite feature dimension, e.g., the cosine similarity and polynomial kernels, and 2) kernel functions with an infinite feature map, such as Gaussian (RBF) kernels. For kernel functions with a finite feature dimension $d$, we theoretically and numerically show that a sample size $n=O(d)$ is sufficient to guarantee convergence to the population Vendi (the asymptotic value as $n\rightarrow\infty$). For example, the left plot in Figure~\ref{fig:kernel_convergence} shows that in the case of the cosine similarity kernel, the Vendi score on $n$ randomly selected ImageNet \citep{deng2009imagenet} samples has almost converged once the sample size reaches 5000, where the dimension $d$ (using the standard DINOv2 embedding \citep{oquab_dinov2_2023}) is 768. 
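The role of the feature dimension can be seen from a standard identity, sketched here under the assumption (in our own illustrative notation) that the kernel admits an explicit $d$-dimensional real feature map $\phi$ with $k(x,y)=\phi(x)^\top\phi(y)$. Writing $\Phi$ for the $n\times d$ matrix whose $i$-th row is $\phi(x_i)^\top$, the two matrices
\begin{equation*}
    \frac{1}{n}K \, = \, \frac{1}{n}\Phi\Phi^\top
    \qquad\text{and}\qquad
    \widehat{C} \, := \, \frac{1}{n}\Phi^\top\Phi \, = \, \frac{1}{n}\sum_{i=1}^n \phi(x_i)\phi(x_i)^\top
\end{equation*}
share the same nonzero eigenvalues. Hence $\frac{1}{n}K$ has at most $d$ nonzero eigenvalues, the Vendi score is at most $d$ for every $n$, and estimating it amounts to estimating the spectrum of the fixed $d\times d$ matrix $\mathrm{E}\bigl[\phi(X)\phi(X)^\top\bigr]$. For the cosine similarity kernel, $\phi(x)=x/\Vert x\Vert_2$, so the bound is $768$ in the DINOv2 example.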

In contrast, our numerical results for kernel functions with an infinite feature map demonstrate that, for standard datasets, a sample size bounded by 20,000 could be insufficient for convergence of the Vendi score. For example, the right plot of Figure~\ref{fig:kernel_convergence} shows the evolution of the Vendi score with the Gaussian kernel on ImageNet data: the score continues to grow at a significant rate at 20,000 samples\footnote{The heavy computational cost prohibits an empirical evaluation of the sample size required for Vendi's convergence.}.

Observing the difference between the Vendi score's convergence for finite- and infinite-dimension kernel functions, a natural question is how to extend the definition of the Vendi score from the finite- to the infinite-dimension case such that the diversity score statistically converges in both scenarios. We attempt to address this question by introducing an alternative Vendi statistic, which we call the \emph{$t$-truncated Vendi score}.
The $t$-truncated Vendi score is defined using only the top-$t$ eigenvalues $\lambda_1\ge \cdots \ge \lambda_t$ of the normalized kernel matrix, where $t$ is an integer hyperparameter. This modified score is defined as  
\begin{equation*}\label{Eq: Intro-TruncatedVendi Score}    \mathrm{Truncated}\text{-}\mathrm{Vendi}^{(t)}(x_1,\ldots ,x_n)  = \exp\Bigl(\sum_{i=1}^t {\lambda^{\scriptscriptstyle \text{trunc}}_i} \log\frac{1}{\lambda^{\scriptscriptstyle \text{trunc}}_i}\Bigr)
\end{equation*}
where we shift each of the top-$t$ eigenvalues, $\lambda^{\scriptscriptstyle \text{trunc}}_i = \lambda_i +c $, by the same constant $c=\bigl(1-\sum_{i=1}^t\lambda_i\bigr)/t$ to ensure that they add up to $1$ and provide a valid probability model. Observe that for a finite kernel dimension $d$ satisfying $d\le t$, the truncated and original Vendi scores take the same value, because the truncation has no impact on the eigenvalues. On the other hand, under an infinite kernel dimension, the two scores may take different values.
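For a concrete illustration with a hypothetical spectrum (chosen only for arithmetic convenience, not taken from any experiment), let $(\lambda_1,\lambda_2,\lambda_3,\lambda_4)=(0.5,\,0.3,\,0.1,\,0.1)$ and $t=2$. Then $c=(1-0.8)/2=0.1$ and $(\lambda^{\scriptscriptstyle \text{trunc}}_1,\lambda^{\scriptscriptstyle \text{trunc}}_2)=(0.6,\,0.4)$, so that
\begin{align*}
    \mathrm{Truncated}\text{-}\mathrm{Vendi}^{(2)} &= \exp\Bigl(0.6\log\frac{1}{0.6}+0.4\log\frac{1}{0.4}\Bigr) \approx 1.96, \\
    \mathrm{Vendi} &= \exp\Bigl(0.5\log\frac{1}{0.5}+0.3\log\frac{1}{0.3}+2\cdot 0.1\log\frac{1}{0.1}\Bigr) \approx 3.22 .
\end{align*}
Since the shifted eigenvalues form a probability vector of length $t$, the truncated score always lies between $1$ and $t$.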



As a main theoretical result, we prove that a sample size $n=O(t)$ is always enough to estimate the \emph{$t$-truncated population Vendi} from $n$ empirical samples, regardless of the finiteness of the kernel feature dimension. This result shows that the \emph{$t$-truncated} Vendi score provides a statistically converging extension of the Vendi score from the finite kernel dimension to the infinite-dimension case. To connect the defined $t$-truncated Vendi score to existing computation methods for the original Vendi score, we show that the existing computationally efficient methods for computing the Vendi score can be viewed as approximations of our defined \emph{$t$-truncated} Vendi. Specifically, we show that the Nyström method in \citep{friedman_vendi_2023} and the FKEA method proposed by \cite{ospanov_fkea_2024} provide an estimate of the $t$-truncated Vendi. %Therefore, our theoretical results suggest that the population limit of the truncated Vendi is indeed estimated by the computationally efficient Vendi computations proposed by \cite{friedman_vendi_2023} and \cite{ospanov_fkea_2024}.

\begin{figure*}[t]
    \centering
    \includegraphics[width=\linewidth]{figures_final/vendi_t_diagram.pdf}
    \caption{Computation of the proposed $t$-truncated Vendi score. The kernel similarity matrix eigenspectrum is truncated, and the mass of the truncated tail (excluding the top-$t$ eigenvalues) is uniformly redistributed among the top-$t$ eigenvalues.}
    \label{fig:truncated vendi diagram}
\end{figure*}
\vspace{-2mm}
\subsection{Our Results on RKE's Convergence}
For the RKE score, we prove a universal convergence guarantee that holds for every kernel function. The theoretical guarantee shows that the RKE score, and more generally every order-$\alpha$ entropy score with $\alpha\ge 2$, converges to its population value within $O\bigl(\frac{1}{\sqrt{n}}\bigr)$ error for $n$ samples. Our theoretical guarantee also transfers to the truncated version of the RKE score. However, note that truncating the eigenspectrum is unnecessary in the RKE case, since the score already enjoys universal convergence guarantees. Figure~\ref{fig:kernel_convergence} shows that with both the cosine similarity and Gaussian kernel functions, the RKE score nearly converges to its limit value with fewer than 10000 samples.  
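An informal way to see why no assumption on the kernel is needed (an intuition only, not the formal argument) is that the quantity inside the RKE score is an average over sample pairs:
\begin{equation*}
    \Bigl\Vert \frac{1}{n}K \Bigr\Vert^2_F \, = \, \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n k(x_i,x_j)^2
\end{equation*}
For a normalized kernel with $k(x,x)=1$, each term lies in $[0,1]$, so this is a bounded empirical average that approaches $\mathrm{E}\bigl[k(X,X')^2\bigr]$ for independent draws $X,X'$ from the generative model, with fluctuations of the usual order $\frac{1}{\sqrt{n}}$.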

Finally, we present the findings of several numerical experiments to validate our theoretical results on the convergence of the Vendi, truncated Vendi, and RKE scores. Our numerical results on standard image, text, and video datasets and generative models indicate that in the case of a finite-dimension kernel map, the Vendi score can converge to its asymptotic limit, in which case, as we explained earlier, the Vendi score is identical to the truncated Vendi. On the other hand, in the case of infinite-dimension Gaussian kernel functions, we numerically observe that the score keeps growing beyond $n=10{,}000$. Our numerical results further confirm that the scores computed by the Nyström method in \citep{friedman_vendi_2023} and the FKEA method \citep{ospanov_fkea_2024} provide tight estimates of the population truncated Vendi. The following summarizes this work's contributions:
\begin{itemize}[leftmargin=*]
    \item Analyzing the statistical convergence of Vendi and RKE diversity scores under restricted sample sizes $n\lessapprox 2\times 10^4$,
    \item Providing numerical evidence on the Vendi score's lack of convergence for infinite-dimensional kernel functions, e.g. the Gaussian (RBF) kernel,
    \item Introducing the truncated Vendi score as a statistically converging extension of the Vendi score from finite- to infinite-dimension kernel functions,
    \item Demonstrating the universal convergence of the RKE diversity score across all kernel functions.
\end{itemize}











\iffalse
The increasing popularity of generative models has highlighted the need for comprehensive and effective diversity evaluation metrics. As these models become more prevalent across various applications, the assessment of the diversity of generated outputs has emerged as a pivotal aspect of their evaluation. In response to this demand, previous literature has introduced several reference-free metrics aimed at quantifying diversity.

For instance, \citeauthor{friedman_vendi_2023} proposed the Vendi metric, which leverages the eigenspectrum of similarity matrices to provide a reference-free assessment of diversity within the generated outputs. Vendi operates by analyzing the eigenvalues of the similarity matrix, which encapsulates relationships among data points, quantifying the diversity of underlying modes present in the dataset. Vendi score relies on computing the Von-Neumann entropy of the similarity matrix. \citeauthor{pasarkar2023cousins} further expanded the metric to Rényi entropy of varying parameter $\alpha$, called Vendi family of diversity metrics.

%Similarly, \citeauthor{jalali_information-theoretic_2023} introduced the RKE metric, which also relies on the eigenspectrum of similarity matrices to assess diversity. RKE is a special case of Rényi entropy with $\alpha=2$. Like Vendi, RKE is particularly advantageous in contexts where reference datasets are unavailable, making it a suitable choice for diverse applications in generative modeling.

Previous research has focused on evaluating the effectiveness of assigned diversity scores and comparing them with existing reference-based metrics, such as \hl{FID, KID, IS, Recall, and Coverage}. However, there has been insufficient investigation into the specific statistics these metrics aim to estimate. Furthermore, the computational complexity of Vendi is at least $O(n^3)$, making it inefficient for scaling with currently available hardware.

Most prior studies have limited sample sizes to 10,000 samples, which may be inadequate for accurately assessing the diversity of the underlying dataset. This raises significant challenges in score computation: in many cases, the scores may not converge, while the sample complexity approaches the upper limits of available hardware. This situation motivates an in-depth exploration of what exactly current reference-free metrics estimate.

This work aims to investigate the underlying statistics of Vendi, distinguishing them from the previously proposed Vendi score. Through both theoretical and experimental analysis, we demonstrate that restricting the sample size $n$ while computing the Vendi score can yield a meaningful statistic that scales with the underlying diversity factors. Additionally, we propose a truncated Vendi (Vendi-t) approach that utilizes the top eigenvalues of the similarity matrix. This method produces a meaningful score that converges to a population Vendi at a limited sample size $n$.

Following is a summary of our work's main contributions:
\begin{itemize}[leftmargin=*]
    \item Studying the population Vendi at fixed sample size $n$,
    \item Proposing and studying the convergence properties of the population Vendi with top $t$ truncated eigenvalues.
\end{itemize}
\fi
