Our experiments use two types of datasets: synthetic datasets from the Fundamental Clustering Problems Suite (FCPS) and real datasets from the UCI repository. Their characteristics are summarised in Table~\ref{tab:benchmark_datasets}.

\input{figures/table_datasets}
\input{figures/table_lexicon}

As baselines, we used several metrics for evaluating clustering models. Most are taken from the permetrics Python package~\citep{thieu_permetrics_2024}; we supplemented them with additional metrics described in R packages~\citep{desgraupes_clustering_2013, charrad_nbclust_2014}, which we implemented ourselves. The metric names therefore mainly follow the permetrics naming, and we provide a lexicon in Table~\ref{tab:benchmark_lexicon}.

