In our experiments, we select the Gaussian kernel bandwidth $\sigma$ so that the Vendi metric distinguishes the inherent modes of the dataset. The bandwidth directly controls how sensitive the metric is to the underlying data clusters. As illustrated in Figure~\ref{bandwidth_illustration}, the choice of $\sigma$ substantially changes the diversity computed on ImageNet. A small bandwidth (e.g., $\sigma = 20$ or $\sigma = 30$) causes the metric to treat redundant samples as distinct modes. This artificially inflates the number of clusters and, in turn, slows the convergence of the metric as the sample size $n$ grows. Conversely, a large bandwidth makes the metric converge almost immediately: with $\sigma = 60$, the diversity values for $n = 100$ and $n = 1000$ are nearly identical.
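To make the two regimes concrete, consider the standard Vendi score with a Gaussian kernel. The squared-distance form, the $2\sigma^2$ normalization, and the symbol $\mathrm{VS}$ below are assumptions made for illustration; the exact feature space and kernel normalization behind Figure~\ref{bandwidth_illustration} may differ. For samples $x_1,\dots,x_n$, let
\[
K_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|_2^2}{2\sigma^2}\right), \qquad
\mathrm{VS}(x_1,\dots,x_n) = \exp\!\left(-\sum_{i=1}^{n} \lambda_i \log \lambda_i\right),
\]
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $K/n$ and $0 \log 0 := 0$. For distinct samples, $K \to I_n$ as $\sigma \to 0$, so every $\lambda_i \to 1/n$. As $\sigma \to \infty$, $K \to \mathbf{1}\mathbf{1}^\top$, whose normalized spectrum is $(1, 0, \dots, 0)$. Hence
\[
\lim_{\sigma \to 0} \mathrm{VS}(x_1,\dots,x_n) = n, \qquad
\lim_{\sigma \to \infty} \mathrm{VS}(x_1,\dots,x_n) = 1 .
\]
A small $\sigma$ therefore pushes the score toward the sample count, so it keeps growing with $n$. A large $\sigma$ pushes it toward $1$ regardless of $n$.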



