\section{Compute and Hyperparameter Details}
\label{adx:hyper}
We trained our ensemble of diffusion models with the same hyperparameters as reported by \citet{rombach2022high}. We used their codebase (\href{https://github.com/CompVis/latent-diffusion}{https://github.com/CompVis/latent-diffusion}) and modified it to incorporate DECU. Specifically, we adopted the LDM-VQ-8 version of latent diffusion together with its corresponding autoencoder, which maps images from $256\times256\times3$ to $64\times64\times3$. Training was performed on an AMD EPYC 7413 (Milan) CPU clocked at 2.65\,GHz with 128\,MB of L3 cache and an NVIDIA A100 GPU with 40\,GB of memory. The ensemble components were trained in parallel, and each required 7 days of training on these resources. Our code is available at \href{https://github.com/nwaftp23/DECU}{https://github.com/nwaftp23/DECU}.



\section{Data}
\label{adx:data}
In the \emph{binned classes} dataset, the classes in each bin were selected at random, and the images for each ensemble component were likewise sampled at random from the corresponding classes. In contrast, the \emph{masked classes} dataset was built by clustering class labels that share the same WordNet hypernym, so that image classes with similar structure fall into the same cluster; for instance, all dog-related classes form a single cluster. Each ensemble component was then assigned randomly selected hypernym clusters until it covered at least 595 classes. Note that each class was seen by at least two components.
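To make the \emph{masked classes} construction more concrete, the Python sketch below assigns whole hypernym clusters to each ensemble component in random order until the component covers at least a minimum number of classes. It is a minimal illustration under stated assumptions, not the code used for our experiments: the helpers \texttt{assign\_hypernym\_clusters} and \texttt{components\_per\_class} are hypothetical, the clusters are assumed to be precomputed (the WordNet lookup is omitted), and the toy clusters are made up. The sketch does not enforce that every class is seen by at least two components; \texttt{components\_per\_class} only counts how many components see each class, so that property can be checked after sampling.

```python
import random
from collections import Counter


def assign_hypernym_clusters(clusters, num_components, min_classes, seed=0):
    """Hypothetical helper: give each ensemble component randomly ordered
    hypernym clusters until it covers at least `min_classes` classes.

    `clusters` maps a hypernym name to the list of class labels under it
    (assumed precomputed from WordNet). Clusters are never split.
    """
    rng = random.Random(seed)
    assignments = []
    for _ in range(num_components):
        order = sorted(clusters)  # deterministic base order before shuffling
        rng.shuffle(order)        # independent random cluster order per component
        covered = set()
        for name in order:
            if len(covered) >= min_classes:
                break
            covered.update(clusters[name])  # add the whole cluster
        assignments.append(covered)
    return assignments


def components_per_class(assignments):
    """Count how many components see each class label."""
    return Counter(cls for covered in assignments for cls in covered)


# Toy usage with made-up clusters (not the real WordNet hierarchy).
toy_clusters = {
    "dog": ["beagle", "pug", "husky"],
    "bird": ["robin", "jay"],
    "fish": ["carp", "eel"],
    "tool": ["hammer"],
}
parts = assign_hypernym_clusters(toy_clusters, num_components=5, min_classes=5)
print([len(p) for p in parts])
print(components_per_class(parts))
```

Assigning whole clusters rather than individual classes is what keeps structurally similar classes (e.g., all dog breeds) together within a component.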



\section{Image Generation and Branch Point}
\label{adx:img_progression}
In addition to the summary statistics on image diversity as a function of the branching point, we visualize these effects in \autoref{fig:bp_100} and \autoref{fig:bp_10}. These visualizations show that bins with higher values tend to produce more consistent images that closely match their class label at every branching point. The difference is most pronounced between bin 1300 and bin 1 (see also \autoref{fig:certain_vs_uncertain2}). Moreover, as the branching point increases, the generated images become more varied in every bin.
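The role of the branching point can be illustrated with a small Python sketch. It assumes, consistent with the notation $I(z_0,\theta|z_5,x,b=5)$ used in the pixel-uncertainty figures, that a single trajectory is denoised from step $T$ down to $z_b$ and that each ensemble component then runs the remaining steps $b,\dots,1$ on its own copy of $z_b$. The function \texttt{sample\_with\_branch\_point} and the toy scalar denoisers are hypothetical, and the shared segment is passed in as a generic callable because this appendix does not state which network produces it.

```python
def sample_with_branch_point(shared_step, component_steps, z_T, T, b):
    """Hypothetical sketch of branched reverse diffusion.

    shared_step(z, t) and each component_steps[i](z, t) map z_t to z_{t-1}.
    Steps t = T, ..., b+1 run once on a shared trajectory and produce z_b;
    steps t = b, ..., 1 then run separately for every ensemble component,
    giving one z_0 per component.
    """
    if not 0 <= b <= T:
        raise ValueError("branch point b must satisfy 0 <= b <= T")
    z = z_T
    for t in range(T, b, -1):      # shared segment: t = T, ..., b+1
        z = shared_step(z, t)
    samples = []
    for step in component_steps:
        z_i = z                     # every component starts from the same z_b
        for t in range(b, 0, -1):   # component segment: t = b, ..., 1
            z_i = step(z_i, t)
        samples.append(z_i)
    return samples


# Toy scalar "denoisers": the shared step halves z, component k adds k.
def halve(z, t):
    return 0.5 * z


components = [lambda z, t, k=k: z + k for k in range(3)]
print(sample_with_branch_point(halve, components, z_T=1.0, T=10, b=2))
# → [0.00390625, 2.00390625, 4.00390625]
```

In this deliberately trivial setting, the spread between component outputs is 0 for $b=0$, 4 for $b=2$, and 20 for $b=T=10$: the more reverse steps the components run independently, the more their outputs diverge, in line with the greater image variety observed at larger branching points.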



\section{Limitations}
\label{adx:limitations}
DECU could potentially be generalized to other large generative models. However, applying PaiDEs for uncertainty estimation requires the conditional distribution of the output to be a probability distribution whose pairwise distances have a known closed form. This requirement is not overly restrictive: some generative models, such as normalizing flows, use a known distribution as their base distribution \citep{tabak2010density, tabak2013family, nf-rezende15}.
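As an example of a distribution family with a closed-form pairwise distance, the Python sketch below computes the Kullback--Leibler divergence between two Gaussians with diagonal covariance; the Gaussian reverse transitions of DDPM-style diffusion models are one such case. Choosing diagonal Gaussians and the KL divergence as $D$ is an assumption made for illustration, the function name \texttt{kl\_diag\_gaussians} is hypothetical, and other closed-form distances (e.g., the Bhattacharyya distance) could play the same role.

```python
import math


def kl_diag_gaussians(mu1, var1, mu2, var2):
    """Closed-form KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ).

    Sums the per-dimension terms
        0.5 * (ln(var2 / var1) + (var1 + (mu1 - mu2)**2) / var2 - 1).
    Variances must be strictly positive.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all inputs must have the same dimension")
    return sum(
        0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )


print(kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]))  # → 0.5
```

The divergence is zero for identical distributions and grows with the gap between means and variances, which is exactly the quantity a PaiDE needs for every ordered pair of ensemble components.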

Furthermore, our ensemble-building approach is tailored to the latent diffusion pipeline, but it can serve as a template for constructing ensembles over the conditional component of other generative models. Low-rank adaptation (LoRA) could also be leveraged to build ensembles more efficiently \citep{hu2021lora}. However, using LoRA for ensemble construction raises open research questions, since LoRA was designed for efficient fine-tuning of large pretrained models rather than for ensemble creation.


\section{Uncertainty \& Branch Point}
\label{adx:unc_bp}
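Assume that the distributional distances between ensemble components grow as the reverse process progresses, mirroring other models with similar dynamics \citep{chua2018deep}, and that the $M$ components are equally weighted, i.e., $\pi_i=\frac{1}{M}$. We can then show the following: if $D(p_i||p_j) \to \infty$ for all $i \neq j$, then $\hat{I}_{\rho}(y, \theta|x) \to -\ln\frac{1}{M} = \ln M$. This limit is the largest value the estimator can take: since $D(p_i||p_i)=0$, we have $\sum_{j=1}^M \pi_j\exp(-D(p_i||p_j)) \geq \pi_i$ for every $i$, and hence $\hat{I}_{\rho}(y, \theta|x) \leq -\sum_{i=1}^M \pi_i\ln\pi_i = \ln M$. Consequently, the earlier the ensemble branches in the reverse process (i.e., the larger the branching point $b$), the more steps the distances have to grow, and the closer the estimate is expected to come to $\ln M$.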
\begin{proof}
Separating the $j=i$ term, using $D(p_i||p_i)=0$ and $\pi_i=\frac{1}{M}$, we obtain
\begin{align*}
\hat{I}_{\rho}(y, \theta|x)&=-\sum_{i=1}^M \pi_i\ln\left[\sum_{j=1}^M \pi_j\exp(-D(p_i||p_j))\right]\\
&=-\sum_{i=1}^M \pi_i\ln\left[\pi_i\exp(-D(p_i||p_i))+\sum_{j\neq i} \pi_j\exp(-D(p_i||p_j))\right]\\
&\to-\sum_{i=1}^M \pi_i\ln\left[\pi_i\exp(0)+\sum_{j\neq i} \pi_j\cdot 0\right]\\
&=-\sum_{i=1}^M \frac{1}{M}\ln\left[\frac{1}{M}\right]\\
&=-\ln\left[\frac{1}{M}\right]=\ln M,
\end{align*}
where the limit can be taken inside the logarithm because $\exp(-D(p_i||p_j))\to 0$ for every $j\neq i$, while the argument of each logarithm stays bounded below by $\pi_i>0$.
\end{proof}
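The estimator in the proof can be evaluated directly from a matrix of pairwise distances. The Python sketch below implements $\hat{I}_{\rho}(y,\theta|x)=-\sum_i \pi_i\ln\sum_j \pi_j\exp(-D(p_i||p_j))$ with uniform weights by default. The helper \texttt{paide\_estimate} is hypothetical, and the distances are arbitrary toy values rather than outputs of our ensemble; the example shows the two extremes, zero when all components coincide and $\ln M$ when they are far apart.

```python
import math


def paide_estimate(dist, weights=None):
    """Pairwise-distance estimate

        -sum_i pi_i * ln( sum_j pi_j * exp(-D(p_i || p_j)) ).

    dist[i][j] holds D(p_i || p_j); it need not be symmetric (e.g., KL).
    weights default to the uniform mixture pi_i = 1/M.
    """
    M = len(dist)
    pi = list(weights) if weights is not None else [1.0 / M] * M
    return sum(
        -pi[i] * math.log(sum(pi[j] * math.exp(-dist[i][j]) for j in range(M)))
        for i in range(M)
    )


M = 4
identical = [[0.0] * M for _ in range(M)]
far_apart = [[0.0 if i == j else 50.0 for j in range(M)] for i in range(M)]
print(paide_estimate(identical))               # zero: all components agree
print(paide_estimate(far_apart), math.log(M))  # essentially ln M
```

With a distance of 50 between distinct components, $\exp(-50)\approx 2\times10^{-22}$ is negligible next to $\pi_i=0.25$, so the estimate agrees with $\ln 4$ to within floating-point rounding, as the limit in the proof predicts. For any nonnegative distances the estimate lies between $0$ and $\ln M$.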



\clearpage
\begin{figure}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/certain_vs_uncertaint2.png}}
\caption{Left: image generation for five class labels with low epistemic uncertainty (bin 1300), from left to right: water buffalo, harvester, sulphur-crested cockatoo, European fire salamander, tow truck. Right: image generation for five class labels with high epistemic uncertainty (bin 1), from left to right: pedestal, slide rule, modem, space heater, gong. Each row corresponds to an ensemble component, and the branching point is $b=1000$.}
\label{fig:certain_vs_uncertain2}
\end{center}
\vskip -0.2in
\end{figure}
\clearpage

\begin{figure}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/pixel_unc1.png}}
\caption{Pixel-wise uncertainty (high uncertainty in yellow, low uncertainty in blue) for one category from each bin; from left to right: cocktail shaker, howler monkey, Dungeness crab, bullet train. The number below the images is the mean estimated $I(z_0,\theta|z_5,x,b=5)$ $\pm$ one standard deviation.}
\label{fig:pixel_unc1}
\end{center}
\vskip -0.2in
\end{figure}

\begin{figure}[ht]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/pixel_unc2.png}}
\caption{Pixel-wise uncertainty (high uncertainty in yellow, low uncertainty in blue) for one category from each bin; from left to right: grey whale, knot, terrapin, agaric. The number below the images is the mean estimated $I(z_0,\theta|z_5,x,b=5)$ $\pm$ one standard deviation.}
\label{fig:pixel_unc2}
\end{center}
\vskip -0.2in
\end{figure}



%%%%%%%%%%%%

%%%%%%%%%%
\begin{figure}[t]
\vskip 0.2in
\centering
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_100_1000_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_100_750_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_100_500_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_100_250_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\vspace*{5mm}
\caption{Progression of image generation through the reverse diffusion process for the class label marmoset (bin 100) at each branching point: (a) 1000, (b) 750, (c) 500, (d) 250.}
\label{fig:bp_100}
\vskip -0.2in
\end{figure}

%%%%%%%%%%%%

\begin{figure}[t]
\vskip 0.2in
\centering
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_10_1000_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_10_750_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_10_500_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_10_250_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\vspace*{5mm}
\caption{Progression of image generation through the reverse diffusion process for the class label steel arch bridge (bin 10) at each branching point: (a) 1000, (b) 750, (c) 500, (d) 250.}
\label{fig:bp_10}
\vskip -0.2in
\end{figure}
%%%%%%%%%%%%
