\begin{figure}
    \centering
    \includegraphics[width=\textwidth]{figures_final/diversity_plots/imagenet_diverity_full.pdf}
    \caption{Diversity evaluation of Vendi scores on the StyleGAN-XL generated ImageNet dataset with varying truncation parameter $\psi$. The setting uses the \textit{DinoV2} embedding and bandwidth $\sigma=30$.}
  \label{VENDI_imagenet_diversity}
\end{figure}

In this section, we present supplementary results on diversity evaluation and on the convergence behavior of different variants of the Vendi score. We extend the convergence experiments of the main text to the truncated StyleGAN3-t FFHQ dataset (Figure \ref{VENDI_stylegan3_convergence}) and the StyleGAN-XL ImageNet dataset (Figure \ref{VENDI_styleganxl_convergence}). Furthermore, we demonstrate that the truncated Vendi statistic captures diversity characteristics across data modalities. Specifically, we run experiments similar to those in Figures \ref{VENDI_imagenet_truncation} and \ref{VENDI_ffhq_truncation} on text data (Figure \ref{VENDI_countries_diversity}) and video data (Figure \ref{VENDI_video_diversity}), which shows that the metric applies across different domains.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figures_final/diversity_plots/text_truncation_by_alphas.pdf}
    \caption{Diversity evaluation of Vendi scores on a synthetic text dataset about 100 countries, generated by GPT-4, with a varying number of countries. The setting uses the \textit{text-embedding-3-large} embedding and bandwidth $\sigma=0.5$.}
  \label{VENDI_countries_diversity}
\end{figure}

\begin{figure}
    \includegraphics[width=0.99\linewidth]{figures_final/illustrations/bandwidth_diagram.pdf}
    \caption{Diagram illustrating the intuition behind selecting the kernel bandwidth $\sigma$ in diversity evaluation.}
    \label{bandwidth_illustration}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figures_final/diversity_plots/video_truncation_by_alphas.pdf}
    \caption{Diversity evaluation of Vendi scores on the Kinetics400 dataset with a varying number of classes. The setting uses the \textit{I3D} embedding and bandwidth $\sigma=4.0$.}
  \label{VENDI_video_diversity}
\end{figure}

Figure \ref{VENDI_stylegan3_convergence} illustrates the convergence behavior across various values of the truncation factor $\psi$. The results indicate that, for a fixed bandwidth $\sigma$, the truncated, Nyström, and FKEA variants of the Vendi score converge to the truncated Vendi statistic. As demonstrated in Figure \ref{VENDI_ffhq_truncation} of the main text, this truncated Vendi statistic captures the diversity characteristics of the underlying dataset.

We note that, in the presence of incremental changes to the diversity of the dataset, Vendi scores computed with finite-dimensional kernels, such as the cosine similarity kernel, remain relatively constant. This effect is illustrated in Figure \ref{VENDI_styleganxl_convergence}, where an increase in the truncation factor $\psi$ results in an incremental change in diversity. This is one of the cases where infinite-dimensional kernel maps with a sensitivity (bandwidth) parameter $\sigma$ are useful for controlling how responsive the method is to a change in diversity.
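
To make this contrast concrete, the following is a short sketch under stated assumptions. We write $\lambda_1,\dots,\lambda_n$ for the eigenvalues of the normalized kernel matrix $\frac{1}{n}K$ with $K_{ij}=k(x_i,x_j)$ and $k(x,x)=1$, and we assume the Gaussian kernel is parameterized as $k_\sigma(x,y)=\exp\bigl(-\|x-y\|_2^2/(2\sigma^2)\bigr)$; this notation is introduced here for illustration and may differ from that of the main text. The Vendi score is the exponential of the Shannon entropy of these eigenvalues:
\[
    \mathrm{Vendi}(x_1,\dots,x_n) \;=\; \exp\Bigl(-\sum_{i=1}^{n}\lambda_i\log\lambda_i\Bigr),
    \qquad \lambda_i\ge 0,\quad \sum_{i=1}^{n}\lambda_i = 1 .
\]
For the cosine similarity kernel on $d$-dimensional embeddings, $K=\Phi\Phi^{\top}$, where the rows of the $n\times d$ matrix $\Phi$ are the normalized embeddings. Hence
\[
    \mathrm{rank}(K)\le \min(n,d)
    \quad\Longrightarrow\quad
    \mathrm{Vendi}(x_1,\dots,x_n)\le \min(n,d),
\]
because the entropy of at most $d$ nonzero eigenvalues is at most $\log d$. With the $768$-dimensional \emph{DinoV2} embedding, the cosine-kernel score is therefore capped at $768$ regardless of $n$. For the Gaussian kernel, the kernel matrix of $n$ distinct points has full rank, so only the bound $\mathrm{Vendi}\le n$ applies. In the limits, $\sigma\to 0$ gives $K\to I$ and $\mathrm{Vendi}\to n$, while $\sigma\to\infty$ gives $K\to \mathbf{1}\mathbf{1}^{\top}$ and $\mathrm{Vendi}\to 1$. The bandwidth thus interpolates between these extremes, which is one way to read the sensitivity role of $\sigma$; the rank cap alone should not be taken as a full explanation of the flat response of finite-dimensional kernels.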




\begin{figure}
    \centering
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/ffhq_trunc_0.2.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/ffhq_trunc_0.4.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/ffhq_trunc_0.6.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/ffhq_trunc_0.8.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/ffhq_trunc_1.0.pdf}}
    \caption{Statistical convergence of the Vendi score for different sample sizes on StyleGAN3 generated FFHQ data at various truncation factors $\psi$: (left) finite-dimensional cosine similarity kernel; (right) infinite-dimensional Gaussian kernel with bandwidth $\sigma=35$. The \emph{DinoV2} embedding (dimension 768) is used in computing the scores.}
  \label{VENDI_stylegan3_convergence}
\end{figure}

\begin{figure}
    \centering
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/imagenet_trunc_0.2.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/imagenet_trunc_0.4.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/imagenet_trunc_0.6.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/imagenet_trunc_0.8.pdf}}
    \subfigure{\includegraphics[width=0.7\textwidth]{figures_final/truncation_convergence/imagenet_trunc_1.0.pdf}}
    \caption{Statistical convergence of the Vendi score for different sample sizes on StyleGAN-XL generated ImageNet data: (left) finite-dimensional cosine similarity kernel; (right) infinite-dimensional Gaussian kernel with bandwidth $\sigma=40$. The \emph{DinoV2} embedding (dimension 768) is used in computing the scores.}
  \label{VENDI_styleganxl_convergence}
\end{figure}



\begin{table}
    % \setlength{\tabcolsep}{0.2in}
    \centering
    % \scriptsize
    \caption{Statistical convergence of diversity scores for different sample sizes on DALL-E 3 generated MSCOCO data.}
    % \vspace{3mm}
    \begin{tabular}{lccccccc}
    \toprule
    $n$ & VENDI-1.0 & RKE & Vendi-t & FKEA-Vendi & Nystrom-Vendi & Recall & Coverage \\
    \midrule
    2000 & 239.91 & 13.47 & 239.91 & 228.69 & 239.91 & 0.76 & 0.86 \\
    4000 & 315.35 & 13.51 & 315.35 & 280.68 & 315.35 & 0.81 & 0.87 \\
    6000 & 357.15 & 13.56 & 346.27 & 310.90 & 345.49 & 0.83 & 0.91 \\
    8000 & 392.36 & 13.56 & 354.80 & 329.56 & 357.41 & 0.87 & 0.91 \\
    \bottomrule
    \end{tabular}
    \label{tab:dalle3-mscoco}
\end{table}

\begin{table}
    % \setlength{\tabcolsep}{0.2in}
    \centering
    % \scriptsize
    \caption{Statistical convergence of diversity scores for different sample sizes on SDXL generated MSCOCO data.}
    % \vspace{3mm}
    \begin{tabular}{lccccccc}
    \toprule
    $n$ & VENDI-1.0 & RKE & Vendi-t & FKEA-Vendi & Nystrom-Vendi & Recall & Coverage \\
    \midrule
    2000 & 187.17 & 10.65 & 187.17 & 173.06 & 187.18 & 0.78 & 0.85\\
    4000 & 236.49 & 10.70 & 236.49 & 222.78 & 236.08 & 0.82 & 0.87\\
    6000 & 264.82 & 10.70 & 258.21 & 236.37 & 257.34 & 0.86 & 0.87\\
    8000 & 289.08 & 10.71 & 265.84 & 251.59 & 266.23 & 0.86 & 0.86\\
    10000 & 304.44 & 10.72 & 267.39 & 256.24 & 268.34 & 0.86 & 0.87\\
    \bottomrule
    \end{tabular}
    \label{tab:sdxl-mscoco}
\end{table}


\begin{table}
    % \setlength{\tabcolsep}{0.2in}
    \centering
    % \scriptsize
    \caption{Computation time (in seconds) of different Vendi scores with increasing sample size.}
    % \vspace{3mm}
    \begin{tabular}{lccccccc}
    \toprule

    \multirow{2}{*}{Metric} & \multicolumn{7}{c}{samples $n$} \\	
     & 10000 & 20000 & 30000 & 40000 & 50000 & 60000 & 70000 \\
    \midrule
    Vendi & 97s & 631s & 1868s & - & - & - & - \\
    FKEA-Vendi & 19s & 36s & 53s & 71s & 88s & 105s & 124s \\
    Nystrom-Vendi & 31s & 44s & 78s & 91s & 112s & 136s & 164s \\

    \bottomrule
    \end{tabular}
    \label{tab:time complexity}
\end{table}
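
As a rough reading of Table \ref{tab:time complexity} (a back-of-the-envelope sketch, not an additional experiment), one can estimate an empirical scaling exponent $\hat{p}$ from two timings $T(n_1)$ and $T(n_2)$ under the assumed power-law model $T(n)\propto n^{p}$; the symbols $T$ and $p$ are introduced here only for illustration:
\[
    \hat{p} \;=\; \frac{\log\bigl(T(n_2)/T(n_1)\bigr)}{\log\bigl(n_2/n_1\bigr)} .
\]
For the exact Vendi score, the reported timings give
\[
    \frac{\log(631/97)}{\log 2}\approx 2.70
    \qquad\text{and}\qquad
    \frac{\log(1868/631)}{\log 1.5}\approx 2.68 ,
\]
which is in line with the roughly cubic cost of a full eigendecomposition of an $n\times n$ kernel matrix. Between $n=10000$ and $n=70000$, the same estimate gives
\[
    \frac{\log(124/19)}{\log 7}\approx 0.96 \ \ \text{(FKEA-Vendi)}
    \qquad\text{and}\qquad
    \frac{\log(164/31)}{\log 7}\approx 0.86 \ \ \text{(Nystrom-Vendi)},
\]
that is, close to linear growth in $n$ over this range. These exponents depend on the hardware and implementation behind the reported timings and should be read as indicative only.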


\iffalse
\section{Convergence of Vendi Score under fixed sample size $n$}

We performed an experiment to observe the stability of the Vendi score by fixing the sample size $n$ and computing the metric over a dataset. Figures~\ref{fig:exp vendi ffhq} and \ref{fig:exp vendi imagenet} show that, for a given $n$ on FFHQ or ImageNet, the standard deviation of the Vendi score is below 1\%. This indicates that the score converges to its underlying expectation at a fixed $n$ extremely quickly, a fact that can be further verified using McDiarmid’s inequality. In other words, once $n$ is fixed, the Vendi score concentrates rapidly, even under infinite-dimensional kernel settings. However, this concentration does not imply convergence to the true population value, which is observable in both cases, since the score keeps increasing even at $n=10k$.

\begin{figure}
    \centering
    \subfigure{\includegraphics[width=0.5\textwidth]{figures_final/e_vendi/ffhq_diversity_plot.png}}
    \subfigure{\includegraphics[width=\textwidth]{figures_final/e_vendi/ffhq_violin_plots.png}}
    \caption{Violin plots and error bars showing the distribution and standard deviation of Vendi scores (50 resamples per point) across different sample sizes $n$ on the FFHQ dataset with $\sigma=45$.}
  \label{fig:exp vendi ffhq}
\end{figure}

\begin{figure}
    \centering
    \subfigure{\includegraphics[width=0.5\textwidth]{figures_final/e_vendi/imagenet_diversity_plot.png}}
    \subfigure{\includegraphics[width=\textwidth]{figures_final/e_vendi/imagenet_violin_plots.png}}
    \caption{Violin plots and error bars showing the distribution and standard deviation of Vendi scores (50 resamples per point) across different sample sizes $n$ on the ImageNet dataset with $\sigma=45$.}
  \label{fig:exp vendi imagenet}
\end{figure}
\fi