
\textbf{Diversity evaluation for generative models.} Diversity evaluation in generative models can be categorized into two primary types: reference-based and reference-free methods. Reference-based approaches rely on a predefined dataset to assess the diversity of generated data. Metrics such as FID \citep{heusel_gans_2018}, KID and distributed KID \citep{binkowski2018demystifying,wang2023distributed} measure the distance between the generated data and the reference, while Recall \citep{sajjadi_assessing_2018, kynkaanniemi_improved_2019} and Coverage~\citep{naeem_reliable_2020} evaluate the extent to which the generative model captures existing modes in the reference dataset. \cite{pillutla2021mauve,pillutla-etal:mauve:jmlr2023} propose the MAUVE metric, which uses information divergences in a quantized embedding space to measure the gap between the generated data and the reference distribution. In contrast, the reference-free metrics, Vendi \citep{friedman_vendi_2023} and RKE \citep{jalali_information-theoretic_2023}, assign diversity scores based on the eigenvalues of a kernel similarity matrix of the generated data. \cite{jalali_information-theoretic_2023} interpret this approach as identifying modes and their frequencies within the generated data, followed by an entropy calculation on the frequency parameters. The Vendi and RKE scores have been further extended to quantify the diversity of conditional prompt-based generative AI models \citep{ospanov2024dissecting,jalali2024conditional} and to select generative models in online settings \citep{rezaei2024more,hu2024online,hu2025multi}. Also, \cite{zhang2024interpretable,zhang2025unveiling,jalali2025towards,gong2025kernel,wu2025fusingcrossmodalunimodalrepresentations} extend the entropic kernel-based scores to measure novelty and embedding dissimilarity. In our work, we specifically focus on the statistical convergence of the vanilla Vendi and RKE scores.
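
For concreteness, the two reference-free scores can be sketched as follows. The notation is ours and assumes a normalized kernel with $k(\mathbf{x},\mathbf{x})=1$. Let $K\in\mathbb{R}^{n\times n}$ with $K_{ij}=k(\mathbf{x}_i,\mathbf{x}_j)$ be the kernel matrix of $n$ generated samples, and let $\hat{\lambda}_1,\dots,\hat{\lambda}_n$ denote the eigenvalues of $\frac{1}{n}K$, which are non-negative and sum to one. With the convention $0\log 0=0$,
\[
\mathrm{Vendi}(\mathbf{x}_1,\dots,\mathbf{x}_n)=\exp\Bigl(-\sum_{i=1}^{n}\hat{\lambda}_i\log\hat{\lambda}_i\Bigr),
\qquad
\mathrm{RKE}(\mathbf{x}_1,\dots,\mathbf{x}_n)=\Bigl(\sum_{i=1}^{n}\hat{\lambda}_i^{2}\Bigr)^{-1}=\Bigl\|\tfrac{1}{n}K\Bigr\|_F^{-2}.
\]
As a sanity check, if all $n$ samples are identical, then $\frac{1}{n}K$ has a single nonzero eigenvalue equal to $1$ and both scores equal $1$. If instead $k(\mathbf{x}_i,\mathbf{x}_j)=0$ for all $i\neq j$, then every $\hat{\lambda}_i=1/n$ and both scores equal $n$.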

\textbf{Statistical convergence analysis of kernel matrices' eigenvalues.} The convergence of the eigenvalues of kernel matrices has been studied in several related works. \cite{shawe2005eigenspectrum} provide a concentration bound for the eigenvalues of a kernel matrix. We note that the bounds in \cite{shawe2005eigenspectrum} use the expectation of eigenvalues $\mathbb{E}_m[\hat{\boldsymbol{\lambda}}(S)]$ for a random dataset $S=(\mathbf{x}_1,\dots,\mathbf{x}_m)$ of fixed size $m$ as the center vector in the concentration analysis. However, since eigenvalues are non-linear functions of a matrix, this concentration center vector $\mathbb{E}_m[\hat{\boldsymbol{\lambda}}(S)]$ does not match the eigenvalues of the asymptotic kernel matrix as the sample size approaches infinity. In contrast, our convergence analysis focuses on the asymptotic eigenvalues with an infinite sample size, which determine the limit value of Vendi scores.
In another related work, \cite{bach_information_2022} discusses a convergence result for the von Neumann entropy of the kernel matrix. While this result proves a non-asymptotic guarantee on the convergence of the entropy function, the bound may not guarantee convergence at the sample sizes typically used for computing Vendi scores (fewer than $10000$ in practice). In our work, we aim to provide convergence guarantees for the finite-dimensional and generally truncated Vendi scores with restricted sample sizes.
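
To illustrate why the expected finite-sample eigenvalues need not coincide with the asymptotic ones, consider the following toy example, which is our own construction and not taken from the cited works. Suppose the data distribution is uniform on two points $\mathbf{a},\mathbf{b}$ with $k(\mathbf{a},\mathbf{a})=k(\mathbf{b},\mathbf{b})=1$ and $k(\mathbf{a},\mathbf{b})=0$. The asymptotic eigenvalues are then $(\tfrac{1}{2},\tfrac{1}{2})$. For $m=2$, the two samples coincide with probability $\tfrac{1}{2}$, in which case the normalized kernel matrix has sorted eigenvalues $(1,0)$, and differ with probability $\tfrac{1}{2}$, in which case the sorted eigenvalues are $(\tfrac{1}{2},\tfrac{1}{2})$. Hence
\[
\mathbb{E}_2[\hat{\boldsymbol{\lambda}}(S)]
=\tfrac{1}{2}\,(1,0)+\tfrac{1}{2}\,\bigl(\tfrac{1}{2},\tfrac{1}{2}\bigr)
=\bigl(\tfrac{3}{4},\tfrac{1}{4}\bigr)
\neq\bigl(\tfrac{1}{2},\tfrac{1}{2}\bigr),
\]
so the concentration center at a fixed sample size is biased relative to the limiting spectrum.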

\textbf{Efficient computation of matrix-based entropy.} Several strategies have been proposed in the literature to reduce the computational complexity of matrix-based entropy calculations, which require computing matrix eigenvalues, a process that scales cubically with the size of the dataset. \cite{dong2023optimal} propose an efficient algorithm for approximating the matrix-based R\'enyi entropy of arbitrary order $\alpha$, which reduces the computational complexity to $O(n^2sm)$ with $s,m\ll n$. Additionally, kernel matrices can be approximated using low-rank techniques such as incomplete Cholesky decomposition \citep{fine2001efficient, bach2002kernel} or CUR matrix decompositions \citep{curmatrix_michael}, which provide substantial computational savings. \cite{pasarkar2023cousins} suggest leveraging the Nystr\"om method \citep{williams2000nystrom} with $m$ components, which results in $O(nm^2)$ computational complexity. A further reduction in complexity is possible using random Fourier features, as suggested by \cite{ospanov_fkea_2024}, which allows the computation to scale linearly, as $O(n)$, with the dataset size. This work focuses on the latter two methods and the population quantities estimated by them.
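
As a rough sketch of why random Fourier features yield linear scaling in $n$, consider a shift-invariant kernel $k(\mathbf{x},\mathbf{y})=\kappa(\mathbf{x}-\mathbf{y})$ with $\kappa(\mathbf{0})=1$. The symbols $r$, $\boldsymbol{\omega}_j$, $\varphi_r$, and $\Phi$ are our illustrative notation and not necessarily that of \cite{ospanov_fkea_2024}. Drawing $\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_r$ independently from the Fourier transform of $\kappa$ gives the feature map
\[
\varphi_r(\mathbf{x})=\frac{1}{\sqrt{r}}\bigl[\cos(\boldsymbol{\omega}_1^{\top}\mathbf{x}),\,\sin(\boldsymbol{\omega}_1^{\top}\mathbf{x}),\,\dots,\,\cos(\boldsymbol{\omega}_r^{\top}\mathbf{x}),\,\sin(\boldsymbol{\omega}_r^{\top}\mathbf{x})\bigr]^{\top}\in\mathbb{R}^{2r},
\qquad
\mathbb{E}\bigl[\varphi_r(\mathbf{x})^{\top}\varphi_r(\mathbf{y})\bigr]=k(\mathbf{x},\mathbf{y}).
\]
Stacking the features of the $n$ samples as the rows of $\Phi\in\mathbb{R}^{n\times 2r}$, the $n\times n$ proxy kernel matrix $\frac{1}{n}\Phi\Phi^{\top}$ and the $2r\times 2r$ matrix $\frac{1}{n}\Phi^{\top}\Phi$ share the same nonzero eigenvalues. The entropy can therefore be computed from the smaller matrix at a cost of $O(nr^2+r^3)$, which is linear in $n$ for a fixed number of features $r$.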

\textbf{Impact of embedding spaces on diversity evaluation.} In our image-related experiments, we used the DinoV2 embedding \citep{oquab_dinov2_2023}, as \cite{stein_exposing_2023} demonstrate the alignment of this embedding with human evaluations. We note that the kernel function in the Vendi score can be similarly applied to other embeddings, including the standard InceptionV3 \citep{szegedy_rethinking_2016} and CLIP embeddings \citep{radford_learning_2021} as suggested by \cite{kynkaanniemi_role_2022}. %Also, in our experiments on text data, we utilized the text-embedding-3-large \citep{openai2023textembedding} model, and for the video experiments, we employed the I3D embedding \citep{Carreira_i3d}. We use the mentioned embeddings in our experiments, while our theoretical results suggest that the convergence behavior would be similar for other embeddings.

