\begin{figure*}
    \includegraphics[width=0.95\linewidth]{figures_final/kernel_convergence/ffhq_convergence.pdf}
    \caption{Statistical convergence of the Vendi score for different sample sizes on FFHQ~\citep{karras2019style} data: (left) finite-dimension cosine similarity kernel; (right) infinite-dimension Gaussian kernel with bandwidth $\sigma=35$. \emph{DINOv2} embeddings (dimension 768) are used to compute the scores.}
    \label{VENDI_ffhq_convergence}
\end{figure*}

\begin{figure*}
    \includegraphics[width=0.95\linewidth]{figures_final/kernel_convergence/countries_convergence.pdf}
    \caption{Statistical convergence of the Vendi score for different sample sizes on the synthetic Countries data: (left) finite-dimension cosine similarity kernel; (right) infinite-dimension Gaussian kernel with bandwidth $\sigma=0.6$. \emph{text-embedding-3-large} embeddings (dimension 3072) are used to compute the scores.}
    \label{VENDI_countries_convergence}
\end{figure*}


\begin{figure*}
    \includegraphics[width=0.95\linewidth]{figures_final/kernel_convergence/k400_convergence.pdf}
    \caption{Statistical convergence of the Vendi score for different sample sizes on Kinetics400~\citep{kay2017kinetics} data: (left) finite-dimension cosine similarity kernel; (right) infinite-dimension Gaussian kernel with bandwidth $\sigma=4.0$. \emph{I3D} embeddings (dimension 1024) are used to compute the scores.}
    \label{VENDI_k400_convergence}
\end{figure*}


\begin{figure*}
    \centering
    \includegraphics[width=0.83\linewidth]{figures_final/image_truncation/ffhq.pdf}
    \caption{Diversity evaluation with Vendi scores on FFHQ samples generated by truncated StyleGAN3 with varying truncation coefficient $\psi$. A fixed sample size of $n=20$k is used to estimate the scores.
    }  \vspace{-4mm}\label{VENDI_ffhq_truncation}
\end{figure*}


\begin{figure*}
    \centering
    \includegraphics[width=0.83\linewidth]{figures_final/image_truncation/truncation_by_alphas.png}
    \caption{Diversity evaluation with Vendi scores on ImageNet samples generated by truncated StyleGAN-XL with varying truncation coefficient $\psi$. A fixed sample size of $n=20$k is used to estimate the scores.
    }
    \vspace{-5mm}
    \label{VENDI_imagenet_truncation}
\end{figure*}


We evaluated the convergence of the Vendi score, the truncated Vendi score, and the proxy Vendi scores computed with the Nyström method and FKEA in our numerical experiments. We provide a comparative analysis of these scores across different data types and models, covering image, text, and video data. We considered the cosine similarity kernel as a standard kernel function with a finite-dimension feature map and the Gaussian (RBF) kernel as a kernel function with an infinite-dimension feature map. In the experiments with the Gaussian kernel, we matched the kernel bandwidth parameter to the values chosen in prior work~\citep{jalali_information-theoretic_2023,ospanov_fkea_2024} for the same datasets. We used 20,000 samples per score computation, consistent with standard practice in the literature. %The experimental results were computed using four RTX 3090 GPUs. 
To compare the computation-cutting methods with one another, we set the truncation parameter $t$ of our $t$-truncated Vendi score equal to the Nyström method's number of randomly selected rows of the kernel matrix and to FKEA's number of random Fourier features. The Vendi and FKEA implementations were adopted from the GitHub repositories of the corresponding references, while the Nyström method was adopted from the \texttt{scikit-learn} Python package.
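
For concreteness, the two kernel families above can be written in the standard forms below. This is a reference sketch only: the symbols $x, y$ (embedding vectors) and the $2\sigma^2$ scaling convention of the Gaussian kernel are assumptions made here and should be checked against the cited implementations.
\[
k_{\mathrm{cos}}(x,y) = \frac{\langle x, y\rangle}{\Vert x\Vert_2 \, \Vert y\Vert_2},
\qquad
k_{\mathrm{Gauss}(\sigma)}(x,y) = \exp\Bigl(-\frac{\Vert x-y\Vert_2^2}{2\sigma^2}\Bigr).
\]
Both kernels are normalized, i.e., $k(x,x)=1$, so the kernel matrix $K$ on $n$ samples satisfies $\mathrm{Tr}\bigl(\frac{1}{n}K\bigr)=1$ and the eigenvalues of $\frac{1}{n}K$ form a probability vector.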



%truncated Vendi score across various data types and models, including image, text, and video. In this section, we numerically test that the Nyström, FKEA, and truncated Vendi scores take on similar values and correlate with the diversity of data. %We provide a comparative analysis across multiple test cases. 

%Unless stated otherwise, 
\subsection{Convergence Analysis of Vendi Scores}

To assess the convergence of the discussed Vendi scores, we conducted experiments on four datasets: the ImageNet and FFHQ~\citep{karras2019style} image datasets, a synthetic text dataset with 400k paragraphs generated by GPT-4 about 100 randomly selected countries, and the Kinetics video dataset~\citep{kay2017kinetics}. Our results, presented in Figures~\ref{VENDI_ffhq_convergence}, \ref{VENDI_countries_convergence}, and~\ref{VENDI_k400_convergence}, show that for the finite-dimension cosine similarity kernel the Vendi score converged rapidly to its underlying value, and the proxy versions, including the truncated and Nyström Vendi scores, were almost identical to the original Vendi score. This observation is consistent with our theoretical results on the convergence of Vendi scores under finite-dimension kernel maps. In contrast, for the infinite-dimension Gaussian kernel, the $\mathrm{Vendi}_1$ score did not converge within 20k samples, and its value kept growing at a considerable rate. However, the $t$-truncated Vendi score with $t=10000$ converged to its underlying statistic shortly after 10000 samples were used. Consistent with our theoretical result, the Nyström and FKEA proxy scores, with their rank hyperparameter matched to $t$, also converged to the limit of the truncated Vendi score. These numerical results illustrate the connection between the truncated Vendi score and the existing kernel methods for approximating the Vendi score.
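
The rapid convergence under the cosine similarity kernel can be traced to a simple rank argument, sketched here with notation introduced only for this illustration: let $\widetilde{X}$ denote the $n\times d$ matrix whose rows are the $\ell_2$-normalized embeddings, so that the cosine kernel matrix factorizes as $K=\widetilde{X}\widetilde{X}^\top$. Then
\[
\mathrm{rank}\Bigl(\frac{1}{n}K\Bigr) = \mathrm{rank}\bigl(\widetilde{X}\widetilde{X}^\top\bigr) = \mathrm{rank}\bigl(\widetilde{X}\bigr) \le \min\{n, d\}.
\]
Hence $\frac{1}{n}K$ has at most $d$ nonzero eigenvalues regardless of the sample size $n$ (for instance, $d=768$ for the DINOv2 embedding), whereas the Gaussian kernel matrix on $n$ distinct points is strictly positive definite and therefore has full rank $n$.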

\subsection{Correlation between the Truncated Vendi Score and Diversity of Data}

We performed experiments to test the correlation between the truncated Vendi score and the ground-truth diversity of data. To do this, we applied the truncation technique to the FFHQ-based StyleGAN3~\citep{karras2021aliasfree} model and the ImageNet-based StyleGAN-XL~\citep{Sauer2021ARXIV} model, and simulated generative models with different underlying diversity by varying the truncation coefficient $\psi$. Using the Gaussian kernel, we estimated the $t$-truncated Vendi score with $t=10000$ by averaging the estimates over $5$ independent datasets of size 20k, a sample size at which the score appeared to have converged to its underlying value. Figures~\ref{VENDI_ffhq_truncation} and~\ref{VENDI_imagenet_truncation} show how the estimated statistic correlates with the truncation coefficient for order-$\alpha$ Vendi scores with $\alpha = 1,\, 1.5,\, 2$. In all these experiments, the estimated truncated Vendi score correlated with the underlying diversity of the models. In addition, we plot the Nyström and FKEA proxy Vendi values computed using 20000 samples, which remain close to the estimated $t$-truncated statistic. These empirical results suggest that the estimated $t$-truncated Vendi score with the Gaussian kernel can be used to evaluate the diversity of generated data. Moreover, the Nyström and FKEA methods were both computationally efficient in estimating the truncated Vendi score from limited generated data. We defer additional numerical results on the convergence of Vendi scores with different orders, kernel functions, and embedding spaces to the Appendix.
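
For reference, the order-$\alpha$ family plotted in these figures can be written in the standard R\'enyi-entropy form below. This is a sketch under the assumption that $\lambda_1,\ldots,\lambda_n$ denote the eigenvalues of the normalized kernel matrix $\frac{1}{n}K$; the notation is introduced here for illustration only.
\[
\mathrm{Vendi}_\alpha = \exp\Bigl(\frac{1}{1-\alpha}\log\sum_{i=1}^{n}\lambda_i^{\alpha}\Bigr) \quad (\alpha \neq 1),
\qquad
\mathrm{Vendi}_1 = \exp\Bigl(-\sum_{i=1}^{n}\lambda_i\log\lambda_i\Bigr),
\]
with the convention $0\log 0 = 0$. As a quick check, $\alpha=2$ gives $\mathrm{Vendi}_2 = 1/\sum_{i}\lambda_i^2$, which equals $m$ when the eigenvalue mass is spread uniformly over $m$ eigenvalues.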





















