%\documentclass{uai2024} % for initial submission
\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}
\addto\extrasamerican{
  \def\subsectionautorefname{Section}
}
\addto\extrasamerican{
  \def\sectionautorefname{Section}
}

\usepackage{hyperref}
\usepackage{url}
\usepackage{floatrow}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{mathtools}
\usepackage{caption}
\usetikzlibrary{arrows, decorations.markings,shapes,arrows.meta,fit}
%\usepackage[dvipsnames]{xcolor}
% \colorlet{LightBlue}{CornflowerBlue!40!}
% \colorlet{LightOrange}{BurntOrange!40!}
% \colorlet{LightGreen}{YellowGreen!40!}
% \newcommand{\bluehighlight}[1]{\tcbox[on line, boxsep=2pt, colframe=white, left=0pt, right=0pt, top=0pt, bottom=0pt, colback=LightBlue]{#1}}
% \newcommand{\greenhighlight}[1]{\tcbox[on line, boxsep=2pt, colframe=white, left=0pt, right=0pt, top=0pt, bottom=0pt, colback=LightGreen]{#1}}
% \newcommand{\orangehighlight}[1]{\tcbox[on line, boxsep=2pt, colframe=white, left=0pt, right=0pt, top=0pt, bottom=0pt, colback=LightOrange]{#1}}
\usepackage{tcolorbox}
%\usepackage{svg}
%\usepackage[inkscapeformat=png]{svg}
%\usepackage{inkscape}
\usepackage{wrapfig}
\usepackage{subcaption}
\usepackage{graphicx}

% Table float box with bottom caption, box width adjusted to content
\newfloatcommand{capbtabbox}{table}[][\FBwidth]

\usepackage{blindtext}
\usepackage[capitalize,noabbrev]{cleveref}

\usepackage{xr}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{bm}
\usepackage{amsthm}
\definecolor{blue_mc}{RGB}{0 107 164}
\definecolor{dark_grey_random}{RGB}{89 89 89}
\definecolor{orange_bhatt}{RGB}{255 128 14}
\definecolor{light_grey_kl}{RGB}{171 171 171}
\usepackage{multirow}


%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{comment}
\DeclarePairedDelimiterX{\infdivx}[2]{(}{)}{%
  #1\;\delimsize\|\;#2%
}

%\pdfcompresslevel=10

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Shedding Light on Large Generative Networks: \\Estimating Epistemic Uncertainty in Diffusion Models}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1,2]{\href{mailto:<lucas.berry@mail.mcgill.ca>?Subject=Your UAI 2024 paper}{Lucas Berry}{}}
\author[3]{Axel Brando}
\author[1,2]{David Meger}
% Add affiliations after the authors
\affil[1]{%
    School of Computer Science\\
    McGill University\\
    Montreal, Quebec, Canada
}
\affil[2]{%
    Centre for Intelligent Machines\\
    McGill University\\
    Montreal, Quebec, Canada
}
\affil[3]{%
    Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS)\\
    Spain
}

\pgfplotsset{compat=1.18}
\begin{document}
\maketitle

\begin{abstract}
 Generative diffusion models, notable for their large parameter counts (exceeding 100 million) and their operation in high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to their computational demands. In this work, we introduce a framework, Diffusion Ensembles for Capturing Uncertainty (DECU), for estimating epistemic uncertainty in diffusion models. DECU efficiently trains ensembles of conditional diffusion models by sharing a static set of pre-trained parameters across components, drastically reducing the computational burden and the number of parameters that require training. In addition, DECU employs Pairwise-Distance Estimators (PaiDEs) to estimate epistemic uncertainty as the mutual information between model outputs and weights in high-dimensional spaces. Experiments on the ImageNet dataset demonstrate the effectiveness of this framework, showing that it captures epistemic uncertainty, particularly for under-sampled image classes. 

\end{abstract}

\section{Introduction}

In this paper, we introduce Diffusion Ensembles for Capturing Uncertainty (DECU), a novel approach designed to quantify epistemic uncertainty in conditional diffusion models that generate high-dimensional images ($256\times256\times3$). To the best of our knowledge, our method is the first to address the challenge of capturing epistemic uncertainty in conditional diffusion models for image generation. \autoref{fig:sales_pitch} illustrates an example of DECU generating images. In sub-figure (a), a class label with low epistemic uncertainty results in images that closely resemble their class, while in sub-figure (b), a class label with high epistemic uncertainty leads to images that do not resemble their respective class.

DECU employs two key strategies. Firstly, it efficiently trains an ensemble of diffusion models within a subset of the network by reusing pre-trained networks from \citet{rombach2022high}. Naively training an ensemble of diffusion models would demand substantial computational resources, since each model has hundreds of millions of parameters and requires weeks to train \citep{dhariwal2021diffusion}.

\begin{figure}[t]
\vskip 0.2in
\centering
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/paper/certain_rollout.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/paper/uncertain_rollout.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\vspace*{5mm}
\caption{Image generation progression through DECU for the class label Bernese mountain dog, which has low epistemic uncertainty (a), and for the class label moving van, which has high epistemic uncertainty (b). Each row corresponds to an ensemble component.}
\label{fig:sales_pitch}
\vskip -0.2in
\end{figure}

Secondly, DECU incorporates Pairwise-Distance Estimators (PaiDEs), a non-sample-based method shown to be effective at estimating the mutual information between a model's output and its weights in high-dimensional regression tasks \citep{kolchinsky2017estimating, berry2023escaping}. The mutual information between model weights and output is a well-established measure of epistemic uncertainty \citep{houlsby2011bayesian}. PaiDEs capture this mutual information by assessing consensus among ensemble components through the distributional distance between each pair of components, where a distributional distance quantifies how dissimilar two probability distributions are. 

Epistemic uncertainty stems from a model's ignorance and can be reduced with more data, whereas aleatoric uncertainty arises from inherent randomness in the data (e.g., when some crucial variables are hidden) and is thus irreducible \citep{hora1996aleatory,der2009aleatory,hullermeier2021aleatoric}. With the increasing integration of large diffusion models into automated systems \citep{rombach2022high, dhariwal2021diffusion}, understanding the images generated by these black-box models becomes paramount. Generative image models play a crucial role in diverse applications, notably medical image generation and self-driving systems \citep{guibas2017synthetic, kazerouni2022diffusion, hu2023gaia}. Both domains are riddled with uncertainty, and errors in them can have catastrophic consequences for human life. Our approach illuminates the black box of diffusion models by estimating their epistemic uncertainty, which can help identify situations in which the predictions of automated systems are more uncertain. In addition, our framework can help build solutions that satisfy international safety standards for automated systems, such as ISO 26262:2011 for road vehicles \citep{salay2018analysis} or ISO/IEC 23053:2022 for AI systems based on machine learning.


By combining our efficient ensemble technique for diffusion models with PaiDEs, we address the challenge of capturing epistemic uncertainty in conditional diffusion models for image generation. We evaluate DECU on the ImageNet dataset \citep{russakovsky2015imagenet}, and our contributions can be summarized as follows:
\begin{itemize}
    \item We establish the framework of DECU for class-conditioned diffusion models (\autoref{sec:methods}).
    \item We assess the effectiveness of DECU for image generation on ImageNet, a commonly used but challenging benchmark within the community (\autoref{sec:undersamp}).
    \item We provide an evaluation of image diversity within DECU (\autoref{sec:img_diversity}). %and demonstrate its utility in selecting ensemble components (Section \ref{sec:comp_select}).  
\end{itemize}
These advancements illuminate the previously opaque area of epistemic uncertainty in conditional diffusion models, offering significant implications for decision-making processes and risk evaluation.

\section{Background}

Diffusion models generate images by traversing a Markov chain in which each transition samples from a Gaussian distribution. This makes them particularly suitable for uncertainty estimation, as Gaussian probability distributions provide a natural framework for reasoning about uncertainty \citep{hullermeier2021aleatoric}. PaiDEs provide an efficient way to estimate epistemic uncertainty by using closed-form pairwise distances between the Gaussian components of the ensemble.


\subsection{Problem Statement \& Diffusion Models} \label{sec:probstate}
In the context of supervised learning, we define a dataset $\mathcal{D}=\{(x_i, y_{i,0})\}_{i=1}^N$, where $x_i$ represents a class label and each $y_{i,0}$ corresponds to an image with dimensions $256\times256\times3$. Our primary goal is to estimate the conditional distribution $p(y|x)$, which is complex, high-dimensional, continuous, and multi-modal.



To model $p(y|x)$, we use diffusion models, which have gained significant recognition for their ability to generate high-quality images \citep{rombach2022high, saharia2022photorealistic}. These models generate realistic images through two processes, referred to as the forward and reverse processes. For notational simplicity, we omit the subscript $i$ from $y_{i,0}$ and $x_i$. In the forward process, an initial image $y_0$ is gradually corrupted by adding Gaussian noise over $T$ steps, resulting in a sequence of noisy samples $y_1, y_2, \ldots, y_T$:
\begin{align*}
    & q(y_t|y_{t-1}) = \mathcal{N}(y_t;\sqrt{1-\beta_t}y_{t-1}, \beta_t{\bf I}) \\ & q(y_{1:T}|y_{0})= \prod_{t=1}^Tq(y_{t}|y_{t-1}),
\end{align*}
where $\beta_t\in(0,1)$ and $\beta_1<\beta_2<\cdots<\beta_T$. The forward process draws inspiration from non-equilibrium statistical physics \citep{sohl2015deep}.
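
For reference, the forward process admits a standard closed form for sampling $y_t$ directly from $y_0$ \citep{ho2020denoising}. Writing $\alpha_t=1-\beta_t$ and $\bar{\alpha}_t=\prod_{s=1}^{t}\alpha_s$ (notation introduced here only for this illustration),
\begin{align*}
    q(y_t|y_0) = \mathcal{N}\left(y_t;\sqrt{\bar{\alpha}_t}\,y_0,\,(1-\bar{\alpha}_t){\bf I}\right),
\end{align*}
so that $y_t=\sqrt{\bar{\alpha}_t}\,y_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon\sim\mathcal{N}({\bf 0},{\bf I})$. Because $\beta_t\geq\beta_1>0$, $\bar{\alpha}_t\leq(1-\beta_1)^t$ vanishes as $t$ grows, so for large $T$ the final sample $y_T$ is approximately standard Gaussian noise.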

The reverse process aims to remove noise from the corrupted images and reconstruct the original image, conditioned on the class label. This is accomplished by estimating the conditional distribution $q(y_{t-1}|y_{t},x)$ using the model $p_{\theta}$. The reverse diffusion process can be represented as follows:
\begin{align}
\begin{split}
    &p_{\theta}(y_{0:T}|x)=p(y_T)\prod_{t=1}^Tp_{\theta}(y_{t-1}|y_t,x)\quad\\ &p_{\theta}(y_{t-1}|y_t,x)=\mathcal{N}(y_{t-1};\mu_{\theta}(y_t,t,x),\Sigma_{\theta}(y_t,t,x)). \label{eq:diff_model_dist_out}
\end{split}
\end{align}
In this formulation, $p_{\theta}(y_{t-1}|y_t,x)$ represents the denoising distribution parameterized by $\theta$, which follows a Gaussian distribution with mean $\mu_{\theta}(y_t,t,x)$ and covariance matrix $\Sigma_{\theta}(y_t,t,x)$. Note that $\mu_{\theta}(y_t,t,x)$ and $\Sigma_{\theta}(y_t,t,x)$ are learned models. The forward and reverse diffusion processes each create a Markov chain to generate images.

When fitting the reverse process $p_{\theta}$, computing the exact log-likelihood $\log(p_{\theta}(y_0|x))$ is typically intractable. This necessitates the use of the evidence lower bound (ELBO), a technique reminiscent of variational autoencoders (VAEs) \citep{kingma2013auto}. The negative ELBO provides the following upper bound on the negative log-likelihood:
{\footnotesize
\begin{align}
\begin{split}
    -\log(p_{\theta}(y_0|x))\leq& -\log(p_{\theta}(y_0|x))\\&+D_{KL}\infdivx{q(y_{1:T}|y_0)}{p_{\theta}(y_{1:T}|y_0,x)} \label{eq:diffusion_model_loss}.
\end{split}
\end{align}
}

Minimizing the bound in \autoref{eq:diffusion_model_loss} amounts to maximizing the log-likelihood of the initial image while minimizing the KL divergence between the forward process $q(y_{1:T}|y_0)$, which acts as the approximate posterior, and the model posterior $p_{\theta}(y_{1:T}|y_0,x)$. Using the Gaussian structure of the forward and reverse processes, \autoref{eq:diffusion_model_loss} can be simplified to a tractable objective. For a more comprehensive introduction to diffusion models, please refer to \citet{ho2020denoising}. 
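
As a sketch of this simplification, \citet{ho2020denoising} parameterize $\mu_{\theta}$ through a noise predictor $\epsilon_{\theta}$ (a symbol not used elsewhere in this paper) and drop the time-dependent weights of the bound, which gives the objective
{\footnotesize
\begin{align*}
    \mathcal{L}_{\text{simple}} = \mathbb{E}_{t,y_0,\epsilon}\left[\left\|\epsilon-\epsilon_{\theta}(y_t,t,x)\right\|_2^2\right],\qquad y_t=\sqrt{\bar{\alpha}_t}\,y_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon,
\end{align*}
}
where $t$ is drawn uniformly from $\{1,\ldots,T\}$, $\epsilon\sim\mathcal{N}({\bf 0},{\bf I})$, and $\bar{\alpha}_t=\prod_{s=1}^{t}(1-\beta_s)$. Here we add the class label $x$ as an input to match our conditional setting. The latent diffusion objective of \citet{rombach2022high} has the same form, with $y_0$ replaced by its latent encoding.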

\subsection{Epistemic Uncertainty and PaiDEs}\label{sec:epi_paides}
Probability theory provides a natural framework to reason about uncertainty \citep{thomas2006elements, hullermeier2021aleatoric}. In the context of capturing uncertainty from a conditional distribution, a widely used metric is that of conditional differential entropy,
\begin{align*}
    H(y_{t-1}|y_t,x) = -\int p(y_{t-1}|y_t,x)\ln{p(y_{t-1}|y_t,x)}\,dy_{t-1}.
\end{align*}
Using conditional differential entropy, \citet{houlsby2011bayesian} defined epistemic uncertainty as follows:
\begin{align}
\begin{split}
    I(y_{t-1}, \theta|y_{t},x) &= H(y_{t-1}|y_{t},x)\\ &-E_{p(\theta)}\left[H(y_{t-1}|y_{t},x,\theta)\right] \label{eq:epi},
\end{split}
\end{align}
where $I(\cdot)$ denotes mutual information and $\theta \sim p(\theta)$. Mutual information measures the information gained about one variable by observing the other. When every $\theta$ produces the same $p_{\theta}(y_{t-1}|y_t,x)$, $I(y_{t-1}, \theta|y_{t},x)$ is zero, indicating no epistemic uncertainty: all models agree about the output distribution. Conversely, when these distributions have non-overlapping supports, epistemic uncertainty is high and the models disagree strongly about the output distribution.
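
Because each $p_{\theta}(y_{t-1}|y_t,x)$ is Gaussian (\autoref{eq:diff_model_dist_out}), the entropy inside the expectation in \autoref{eq:epi} has a standard closed form. Writing $d$ for the dimension of $y_{t-1}$ (notation introduced here for illustration),
\begin{align*}
    H(y_{t-1}|y_t,x,\theta) &= \frac{1}{2}\ln\det\left(2\pi e\,\Sigma_{\theta}(y_t,t,x)\right)\\
    &= \frac{d}{2}\ln(2\pi e)+\frac{1}{2}\ln\det\Sigma_{\theta}(y_t,t,x).
\end{align*}
This term is therefore available in closed form for each model. Note that it diverges to $-\infty$ as $\Sigma_{\theta}$ approaches a singular matrix.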


A distribution over weights is essential for estimating $I(y_{t-1}, \theta|y_{t},x)$. One effective approach is to use ensembles. Ensembles combine multiple models, estimating the conditional distribution as a weighted mixture of the output distributions of the individual ensemble components:
{\footnotesize
\begin{align}
    p_{\theta}(y_{t-1}|y_t,x)=\sum_{j=1}^M\pi_jp_{\theta_j}(y_{t-1}|y_t,x)\label{eq:ensemble_likelihhood} \qquad \sum_{j=1}^M\pi_j=1,
\end{align}
}

where $M$, $\pi_j$, and $\theta_j$ denote the number of ensemble components, the component weights, and the component parameters, respectively. Note that the component weights are assumed to be uniform, $\pi_j = \frac{1}{M}$, as this approach has been demonstrated to be effective for estimating epistemic uncertainty \citep{chua2018deep, berry2023escaping}. When creating an ensemble, two common approaches are typically considered: randomization \citep{breiman2001random} and boosting \citep{freund1997decision}. While boosting has paved the way for widely adopted machine learning methods \citep{chen2016xgboost}, randomization is the preferred choice in deep learning due to its tractability and straightforward implementation \citep{lakshminarayanan2017simple}.
 
\begin{figure*}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=.8\textwidth]{figures/paper/flow_chart.png}}
\caption{The ensemble pipeline for DECU, shown here with two components. During the reverse process, the previous latent vector $z^j_{t}$ passes through a UNet to yield $z^j_{t-1}$. Dashed lines signify the random selection of one ensemble component for rollout until the branching point. Our ensembles are constructed within the embedding layer, which accepts the class label as input. We create diversity through random initialization and by training each component on different subsets of the data. The encoders, decoders, and UNets for each component are shared, and we leverage pre-trained networks from \citet{rombach2022high}. Notably, this reduces the number of parameters that require training from 456 million to 512 thousand.}
\label{fig:flow_chart}
\end{center}
\vskip -0.2in
\end{figure*}

For continuous outputs and ensemble models, \autoref{eq:epi} generally has no closed-form solution because of the first term on the right-hand side, the entropy of the mixture:
{\footnotesize
\begin{align*}
    H(y_{t-1}|y_{t},x) &=-\int\sum_{j=1}^M\pi_jp_{\theta_j}(y_{t-1}|y_{t},x)\\&\times\ln\sum_{k=1}^M\pi_kp_{\theta_k}(y_{t-1}|y_{t},x)\,dy_{t-1}.
\end{align*}
}

Thus, previous methods have relied on Monte Carlo (MC) estimators to estimate epistemic uncertainty \citep{depeweg2018decomposition, postels2020hidden}. MC estimators are convenient for estimating quantities through random sampling and are more suitable for high-dimensional integrals compared to other numerical methods. However, as the number of dimensions increases, MC methods typically require a larger number of samples \citep{rubinstein2009deal}.
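
Concretely, an MC estimate of the mixture entropy draws $S$ samples $y^{(s)}_{t-1}\sim\sum_{j=1}^M\pi_jp_{\theta_j}(y_{t-1}|y_t,x)$, where $S$ is a sample budget introduced here for illustration, and computes
{\footnotesize
\begin{align*}
    \hat{H}(y_{t-1}|y_t,x) = -\frac{1}{S}\sum_{s=1}^{S}\ln\sum_{j=1}^{M}\pi_jp_{\theta_j}(y^{(s)}_{t-1}|y_t,x).
\end{align*}
}
Each of the $S$ samples requires evaluating all $M$ component densities in the full output dimension. The cost therefore grows with both the number of samples needed and the dimensionality of $y_{t-1}$.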

Given that our output is very high-dimensional, MC methods become extremely computationally demanding, necessitating an alternative approach. For this, we rely on Pairwise-Distance Estimators (PaiDEs) to estimate epistemic uncertainty \citep{kolchinsky2017estimating}. PaiDEs have been shown to accurately capture epistemic uncertainty for high-dimensional continuous outputs \citep{berry2023escaping}. Let $D\infdivx{p_i}{p_j}$ denote a generalized distance function between the probability distributions $p_i=p(y_{t-1}|y_{t},x,\theta_i)$ and $p_j=p(y_{t-1}|y_{t},x,\theta_j)$. More specifically, $D$ is a premetric, satisfying $D\infdivx{p_i}{p_j}\geq0$ and $D\infdivx{p_i}{p_j}=0$ if $p_i=p_j$. It need not be symmetric or satisfy the triangle inequality. Given such a $D$, PaiDEs are defined as follows:
{\footnotesize
\begin{align*}
%\begin{split}
\hat{I}_{\rho}(y_{t-1}, \theta|y_{t},x) &= -\sum_{i=1}^M \pi_i\ln{\sum_{j=1}^M \pi_j\exp\left(-D\infdivx{p_i}{p_j}\right)}%\label{eq:epiestimator}.
%\end{split}
\end{align*}
}
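
Two properties of this estimator follow directly from $D$ being a premetric. Since $D\infdivx{p_i}{p_j}\geq0$, every term $\exp(-D\infdivx{p_i}{p_j})$ lies in $[0,1]$, so each inner sum is at most $\sum_{j}\pi_j=1$ and every summand $-\pi_i\ln(\cdot)$ is non-negative. Since $D\infdivx{p_i}{p_i}=0$, the $j=i$ term contributes $\pi_i$, so each inner sum is at least $\pi_i$. Together, these give
\begin{align*}
    0\leq\hat{I}_{\rho}(y_{t-1}, \theta|y_{t},x)\leq-\sum_{i=1}^M\pi_i\ln\pi_i,
\end{align*}
where the upper bound equals $\ln M$ for uniform weights $\pi_j=\frac{1}{M}$. The lower bound is attained when all components coincide. The upper bound is approached as every pairwise distance with $i\neq j$ grows without bound.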

PaiDEs offer a variety of options for $D\infdivx{p_i}{p_j}$, such as Kullback-Leibler divergence, Wasserstein distance, Bhattacharyya distance, Chernoff $\alpha$-divergence, Hellinger distance and more. %The task can dictate a practitioner's choice of $D$.
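
For Gaussian components $p_i=\mathcal{N}(\mu_i,\Sigma_i)$ and $p_j=\mathcal{N}(\mu_j,\Sigma_j)$ in $d$ dimensions (notation introduced here for illustration), two of these options have standard closed forms, with $\bar{\Sigma}=\frac{1}{2}(\Sigma_i+\Sigma_j)$:
{\footnotesize
\begin{align*}
    D_{KL}\infdivx{p_i}{p_j} &= \frac{1}{2}\Big[\text{tr}\left(\Sigma_j^{-1}\Sigma_i\right)+(\mu_j-\mu_i)^{\top}\Sigma_j^{-1}(\mu_j-\mu_i)\\&\quad-d+\ln\frac{\det\Sigma_j}{\det\Sigma_i}\Big],\\
    D_{B}\infdivx{p_i}{p_j} &= \frac{1}{8}(\mu_i-\mu_j)^{\top}\bar{\Sigma}^{-1}(\mu_i-\mu_j)\\&\quad+\frac{1}{2}\ln\frac{\det\bar{\Sigma}}{\sqrt{\det\Sigma_i\det\Sigma_j}}.
\end{align*}
}
Both expressions involve matrix inverses and log-determinants, so they are only defined when the covariances are non-singular.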

\section{Methodology} \label{sec:methods}
Diffusion models come with a substantial training cost: latent diffusion class-conditional models on ImageNet require 35 V100 days \citep{rombach2022high}. Naively training $M$ distinct diffusion models would multiply this cost by $M$. To address this challenge, we propose training only a sub-module within the diffusion model architecture and show that this is adequate for estimating epistemic uncertainty. Furthermore, there are multiple points within the reverse diffusion process at which one could estimate uncertainty. We show at which point this estimator yields accurate estimates.
%\vspace{-0.5cm}
\subsection{Diffusion Ensembles} \label{sec:diff_ensembles}
We employ the latent diffusion models introduced by \citet{rombach2022high} to construct our ensembles. These models use an autoencoder so that the diffusion process is learned in a lower-dimensional latent space, with $z_{t}$ of size $64\times64\times3$. This significantly reduces sampling and training time compared to previous methods. Using this framework, we can estimate epistemic uncertainty in this lower-dimensional space,
\begin{align}
\begin{split}
    \hat{I}_{\rho}(z_{t-1}, \theta|z_{t},x) &= -\sum_{i=1}^M \pi_i\\&\times\ln{\sum_{j=1}^M \pi_j\exp\left(-D\infdivx{p_i}{p_j}\right)}\label{eq:epiestimator_latent},
\end{split}
\end{align}
where $p_i$ and $p_j$ now denote Gaussians in the latent space. This approach is akin to previous methods that utilize latent spaces to facilitate the estimation of epistemic uncertainty \citep{berry2023normalizing}. 
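
For a sense of scale, the reduction in dimensionality obtained by working in the latent space is
\begin{align*}
    \frac{256\cdot256\cdot3}{64\cdot64\cdot3}=\frac{196{,}608}{12{,}288}=16,
\end{align*}
so each Gaussian entering \autoref{eq:epiestimator_latent} is defined over $12{,}288$ rather than $196{,}608$ dimensions.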

To fit our ensembles, we make use of pre-trained weights for the UNet and autoencoder from \citet{rombach2022high}, keeping them static throughout training. The only part of the network that is trained is the conditional label embedding layer, which is randomly initialized for each ensemble component. This significantly reduces the number of parameters that need to be trained (512k instead of 456M) as well as the training time (by 87\%), compared to training a full latent diffusion model on ImageNet. It is important to note that each ensemble component can be trained in parallel, as the shared weights remain static for each component, further enhancing training efficiency.
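
Taking the reported parameter counts at face value, the trainable fraction of the network per component is
\begin{align*}
    \frac{512\times10^{3}}{456\times10^{6}}\approx1.1\times10^{-3},
\end{align*}
or roughly one thousandth. Even for the $M=5$ components used in our experiments, the $5\times512\times10^{3}\approx2.6\times10^{6}$ trainable parameters amount to about $0.56\%$ of a single full model.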
    
Upon completion of the training process, we utilize the following image generation procedure:
\begin{itemize}
    \item[1.] Sample random noise $z_{T}$ and an ensemble component $p_j$.
    \item[2.] Use $p_j$ to traverse the Markov chain until reaching step $b$, our branching point.
    \item[3.] Branch off into $M$ separate Markov chains, each associated with a different component.
    \item[4.] Progress through each Markov chain until reaching step 0, $z^j_0$, and then decode each $z^j_0$ to obtain $y^j_0$.
\end{itemize}
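
Written out for the full $T$-step chain, the procedure above can be sketched as follows. We assume that the component in step 1 is drawn uniformly and that $p(z_T)$ is standard Gaussian. $\mathrm{Dec}(\cdot)$ is a placeholder for the pre-trained decoder (notation introduced here for illustration):
{\footnotesize
\begin{align*}
    &z_T\sim\mathcal{N}({\bf 0},{\bf I}),\qquad j\sim\mathrm{Uniform}\{1,\ldots,M\},\\
    &z_{t-1}\sim p_{\theta_j}(z_{t-1}|z_t,x),\qquad t=T,\ldots,b+1,\\
    &z^k_b=z_b,\qquad z^k_{t-1}\sim p_{\theta_k}(z^k_{t-1}|z^k_t,x),\qquad t=b,\ldots,1,\\
    &y^k_0=\mathrm{Dec}(z^k_0),\qquad k=1,\ldots,M.
\end{align*}
}
When DDIM is used for sampling (\autoref{sec:paides_diff}), the indices run over the subsampled schedule instead, and with zero covariance each draw reduces to a deterministic update. All $M$ chains share the prefix up to $z_b$, so the components can only disagree from step $b$ onward.
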
\autoref{fig:flow_chart} illustrates the described pipeline with two components. During the reverse process, the previous latent vector $z^j_{t}$, the time step $t$, and the output of component $j$ are passed through a UNet to arrive at $z^j_{t-1}$. By leveraging the Markov chain structure of the diffusion model, we can examine image diversity at different branching points. Our loss function for training each component is the same as that of \citet{rombach2022high}. We use an ensemble of $M=5$ components, a number we found to be sufficient for estimating epistemic uncertainty. For additional hyperparameter details, refer to \autoref{adx:hyper}.

\begin{figure}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\columnwidth]{figures/paper/unc_plot_2.png}}
\caption{Our estimate of epistemic uncertainty increases with the distance from the branching point, converging to $-\ln\frac{1}{5}\approx1.609$.}
\label{fig:unc_vs_bp}
\end{center}
\vskip -0.2in
\end{figure}


\subsection{Diffusion Ensembles for Uncertainty} \label{sec:paides_diff}
Diffusion models yield a Gaussian distribution at each step of the reverse process, as shown in \autoref{eq:diff_model_dist_out}. One can estimate epistemic uncertainty at any step $t$ after the branching point $b$. However, the further from the branching point epistemic uncertainty is estimated, the more the components' Gaussian distributions diverge from one another. Consequently, PaiDEs applied in this regime converge to $-\ln\frac{1}{M}=\ln M$. This occurs because, as the Gaussians diverge, the distance $D\infdivx{p_i}{p_j}$ grows for every pair $i\neq j$, so $\exp(-D\infdivx{p_i}{p_j})$ tends to 0. The $j=i$ term, by contrast, remains $\exp(0)=1$, so each inner sum approaches $\pi_i=\frac{1}{M}$. \autoref{fig:unc_vs_bp} shows this relationship in our context. Therefore, to estimate $I(z_{t-1},\theta|z_t, x, b=t)$, we apply PaiDEs at the first step after the branching point, which we found sufficient to estimate epistemic uncertainty. 





To generate images, we use denoising diffusion implicit models (DDIM) with 200 steps, applied to a diffusion process trained with $T=1000$. DDIM enables more efficient image generation by permitting larger steps in the reverse process without altering the training methodology for diffusion models \citep{song2020denoising}. Furthermore, in the DDIM implementation of \citet{rombach2022high}, the covariance $\Sigma_{\theta} (z_t, t, x)$ is intentionally set to a zero matrix irrespective of its inputs, following \citet{song2020denoising}. However, this prevents us from using the KL divergence and the Bhattacharyya distance, which are undefined for such degenerate Gaussians. Therefore, we propose a novel PaiDE based on the 2-Wasserstein distance, which remains well defined in this case. Between Gaussians, the squared 2-Wasserstein distance, which we denote by $W_2$ for brevity, can be expressed as:

\begin{align}
\begin{split}
    W_2\infdivx{p_i}{p_j} &= ||\mu_i-\mu_j||_2^2
    \\&+\text{tr}\left[\Sigma_i+\Sigma_j-2\left(\Sigma_i^{1/2}\Sigma_j\Sigma_i^{1/2}\right)^{1/2}\right], \label{eq:wass_2_gauss}      
\end{split}
\end{align}
where $p_i=\mathcal{N}(\mu_i,\Sigma_i)$ and $p_j=\mathcal{N}(\mu_j,\Sigma_j)$. When $\Sigma_i$ and $\Sigma_j$ are zero matrices, this yields the following estimator:
\begin{align}
\begin{split}
    \hat{I}_{W}(z_{t-1}, \theta|z_{t},x, b=t) &= -\sum_{i=1}^M \pi_i\\
    &\times\ln{\sum_{j=1}^M \pi_j\exp\left(-W_2\infdivx{p_i}{p_j}\right)}\label{eq:epi_wass_estimator},
\end{split}
\\[2ex]
\begin{split}
    W_2\infdivx{p_i}{p_j} &= ||\mu_i-\mu_j||_2^2. \nonumber
\end{split}
\end{align}
This combination of ensemble creation and epistemic uncertainty estimation encapsulates DECU.
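
For intuition, consider $M=2$ components with $\pi_1=\pi_2=\frac{1}{2}$, and let $\Delta=\|\mu_1-\mu_2\|_2^2$, where the squared norm sums over all $64\cdot64\cdot3=12{,}288$ latent coordinates. Since $W_2\infdivx{p_i}{p_i}=0$ and $W_2$ is symmetric, \autoref{eq:epi_wass_estimator} reduces to
\begin{align*}
    \hat{I}_{W}(z_{t-1}, \theta|z_{t},x, b=t) = -\ln\frac{1+e^{-\Delta}}{2} = \ln 2-\ln\left(1+e^{-\Delta}\right).
\end{align*}
This estimate increases monotonically in $\Delta$, from $0$ when the component means coincide toward $\ln 2$ as they move apart.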

%\vspace{-0.5cm}



\section{Experimental Results} \label{sec:results}
We evaluate DECU on the ImageNet dataset, which comprises 1.28 million images distributed across 1000 classes. ImageNet is recognized as a challenging benchmark for large generative models \citep{brock2018large, dhariwal2021diffusion}. To assess the epistemic uncertainty estimates of DECU, we curated a dataset called \emph{binned classes}. We randomly assigned 300 classes to three disjoint bins of 100 classes each (bins 1, 10, and 100) and grouped the remaining 700 classes into bin 1300. For each ensemble component, a dataset was formed with the following selection process:
\begin{itemize}
    \item 1 random image per class from bin 1.
    \item 10 random images per class from bin 10.
    \item 100 random images per class from bin 100.
    \item All images per class from bin 1300 (approximately 1300 each).
\end{itemize}
During training, each ensemble component saw a total of 28,162,944 images, counting repeated images across training epochs. In contrast, training an entire class-conditioned ImageNet model from scratch required 213,600,000 images \citep{rombach2022high}, so each component saw roughly 13\% as many images.
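The bin construction and per-component sampling described above can be sketched as follows. This is a hypothetical illustration, not the released code: \texttt{assign\_bins}, \texttt{component\_subset}, the dictionary formats, and the per-component seeds are assumptions. Each component draws its own random images for bins 1, 10, and 100, while classes in bin 1300 contribute every available image.

```python
import random


def assign_bins(class_ids, seed=0):
    """Randomly assign 100 disjoint classes each to bins 1, 10 and 100;
    all remaining classes go to bin 1300."""
    ids = list(class_ids)
    random.Random(seed).shuffle(ids)
    bins = {}
    for rank, cls in enumerate(ids):
        if rank < 100:
            bins[cls] = 1
        elif rank < 200:
            bins[cls] = 10
        elif rank < 300:
            bins[cls] = 100
        else:
            bins[cls] = 1300
    return bins


def component_subset(images_by_class, bin_of_class, seed):
    """Draw one ensemble component's training images.

    images_by_class: dict class id -> list of image ids (hypothetical format)
    bin_of_class:    dict class id -> bin label from assign_bins
    seed:            per-component seed, so each component gets its own draw
    """
    rng = random.Random(seed)
    subset = []
    for cls, images in images_by_class.items():
        n = bin_of_class[cls]
        if n == 1300:
            subset.extend(images)  # bin 1300: every available image
        else:
            subset.extend(rng.sample(images, n))  # n random images per class
    return subset
```

With a fixed class-to-bin assignment, calling \texttt{component\_subset} once per component with a different seed gives each component a different sample from the small bins.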

%All experiments were carried out on the ImageNet dataset \citep{russakovsky2015imagenet}, which comprises 1000 classes, with approximately 1300 images per class, totaling 1.28M images. To evaluate DECU, we curated the \emph{binned classes} dataset from ImageNet. The creation of the \emph{binned classes} dataset involved the random selection of 100 classes for bin 1, another 100 for bin 10, and a further 100 for bin 100, such that they were disjoint. Subsequently, for each ensemble component, we adopted the following systematic approach: selecting a single image per class from bin 1, ten images per class from bin 10, and a hundred images per class from bin 100. The remaining 700 classes were grouped into bin 1300, where all 1300 images per class were utilized. During the training process, each ensemble component saw a total of 28,162,944 images, accounting for repeated images across training epochs. It is worth noting that this stands in contrast to the 213,600,000 images required to train an entire network from scratch for class-conditioned ImageNet models \citep{rombach2022high}.

\begin{figure*}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=.7\textwidth]{figures/paper/certain_vs_uncertain1.png}}
\caption{Left: low epistemic uncertainty image generation (bin 1300) for five class labels: bullfrog, carbonara, grey fox, container ship, and yellow lady's slipper. Right: high epistemic uncertainty image generation (bin 1) for cleaver, Sealyham terrier, lotion, shoji, and whiskey jug. Each row corresponds to one ensemble component; $b=1000$.}
%The left image showcases an example of image generation for five class labels with low epistemic uncertainty (bin 1300), arranged from left to right: bullfrog, carbonara, grey fox, container ship, and yellow lady's slipper. The right image illustrates an example of image generation for five class labels with high epistemic uncertainty (bin 1), arranged from left to right: cleaver, Sealyham terrier, lotion, shoji, and whiskey jug. Each row corresponds to an ensemble component and $b=1000$.}
\label{fig:certain_vs_uncertain}
\end{center}
\vskip -0.2in
\end{figure*}

\subsection{Recognition of Undersampled Classes}
\label{sec:undersamp}
In this section, we assess whether our framework can identify classes with few training images using the \emph{binned classes} dataset. Classes from bins with fewer training images produced lower-quality images, as illustrated in \autoref{fig:certain_vs_uncertain}, which shows images with lower epistemic uncertainty generated for five classes in bin 1300 (left) and images with higher uncertainty generated for five classes in bin 1 (right). Each row corresponds to an ensemble component, and we set $b=1000$. The visual contrast highlights a clear trend: for classes in bin 1300, which have the most training images, our framework produces images that closely match their class labels. \autoref{fig:certain_vs_uncertain2} in the Appendix presents another example of the same trend.


\begin{figure*}[t]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\textwidth]{figures/paper/bin_distributions_2.png}}
\caption{Distributions of the per-class epistemic uncertainty estimates $\hat{I}_W(z_0,\theta|z_5,x,b=5)$ for each bin. Dashed lines mark the mean of each distribution.}
\label{fig:ridgeplot_unc_imagenet}
\end{center}
\vskip -0.2in
\end{figure*}

Furthermore, we compute $\hat{I}_W(z_0,\theta|z_5,x,b=5)$ for each class by sampling 8 random noise vectors and averaging the ensemble's epistemic uncertainty over them. We use $b=5$ because, with 200 DDIM steps over $T=1000$, the sampler advances $1000/200=5$ diffusion steps at a time, so the final DDIM step goes from $z_5$ to $z_0$. \autoref{fig:ridgeplot_unc_imagenet} shows the distribution of epistemic uncertainty for each bin. The distributions for bins with more training images are skewed further toward 0 than those for bins with fewer, a trend also reflected in the mean of each distribution (dashed lines). These findings demonstrate that, on average, DECU effectively measures epistemic uncertainty in class-conditioned image generation.
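The sketch below illustrates why $b=5$ is used and how per-class estimates are averaged over noise samples. It assumes an evenly spaced DDIM schedule in which the sampler moves from each listed timestep to the next (implementations, including that of \citet{rombach2022high}, may offset the schedule slightly), uniform weights $\pi_j = 1/M$, and a hypothetical array layout holding each component's mean for $z_0$ given the shared $z_5$ of each noise sample. The names \texttt{ddim\_timesteps} and \texttt{class\_uncertainty} are illustrative.

```python
import numpy as np


def ddim_timesteps(T=1000, num_steps=200):
    """Evenly spaced DDIM timesteps in sampling order (assumed uniform schedule)."""
    stride = T // num_steps  # 1000 // 200 = 5 diffusion steps per DDIM step
    return list(range(T - stride, -1, -stride))  # 995, 990, ..., 5, 0


def class_uncertainty(final_means):
    """Average the zero-covariance Wasserstein PaiDE over noise samples.

    final_means: (n_noise, M, D) array; entry [s, i] is component i's mean for
    z_0 given the shared z_5 of noise sample s (hypothetical layout).
    Returns the class-level average and the per-sample estimates.
    """
    per_sample = []
    for means in np.asarray(final_means, float):
        d = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        inner = np.exp(-d).mean(axis=1)  # sum_j (1/M) exp(-d_ij) >= 1/M > 0
        per_sample.append(float(-np.log(inner).mean()))
    return float(np.mean(per_sample)), per_sample
```

Since the diagonal term $\exp(0)=1$ keeps each inner sum at least $1/M$, no log-sum-exp shift is needed here.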


In addition to estimating the overall uncertainty of a given class, we analyze per-pixel uncertainty in a generated image. We first map each latent vector to image space, treat each pixel as a separate Gaussian, and apply our estimator pixel by pixel; we therefore estimate epistemic uncertainty in image space and then average across the three color channels. An example of this procedure is shown in \autoref{fig:pixel_unc}. For bin 1300, epistemic uncertainty highlights the different birds that our ensemble could have generated. Furthermore, bins with fewer training images exhibit a higher density of yellow, indicating greater uncertainty about what image to generate. Two additional examples in the Appendix, \autoref{fig:pixel_unc1} and \autoref{fig:pixel_unc2}, display the same pattern.
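A per-pixel version of the estimator can be sketched as follows, assuming the decoded images are stacked in a hypothetical array of shape $(M, C, H, W)$ and uniform weights $\pi_i = 1/M$. Each pixel value is treated as a one-dimensional Gaussian with zero variance, so the pairwise distance reduces to the squared difference of the component means at that pixel; the per-pixel estimates are then averaged over the color channels. The function name \texttt{pixel\_uncertainty} is illustrative.

```python
import numpy as np


def pixel_uncertainty(images):
    """Per-pixel epistemic uncertainty map from decoded ensemble images.

    images: (M, C, H, W) array, one decoded image per ensemble component
    (assumed layout). Each pixel is a zero-variance 1-D Gaussian, so the
    pairwise W_2 between components is (mu_i - mu_j)^2 at that pixel.
    """
    x = np.asarray(images, float)
    d = (x[:, None] - x[None, :]) ** 2  # (M, M, C, H, W) pairwise distances
    inner = np.exp(-d).mean(axis=1)  # (M, C, H, W), uniform weights 1/M
    per_channel = -np.log(inner).mean(axis=0)  # (C, H, W) estimate per pixel
    return per_channel.mean(axis=0)  # (H, W), averaged over color channels
```

The pairwise array grows with $M^2$; for a handful of components and $256\times256$ images it still fits comfortably in memory.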

\begin{table}
\caption{SSIM between all pairs of generated images per class, for each branching point $b$ (rows) and bin (columns). Results are mean $\pm$ one standard deviation. Higher values indicate greater similarity; the highest mean in each row is bolded.} \label{tbl:ssim_img_diversity}
\begin{center}
{\scriptsize
\begin{tabular}{ccccc}
\toprule
$b$ & Bin 1 & Bin 10 & Bin 100 & Bin 1300 \\
\midrule
1000   &  $0.36\pm0.09$ & $ 0.37\pm0.09$ &  $0.41\pm0.10$ &  $\textbf{0.51}\pm0.13$ \\
750  &  $0.50\pm0.14$ &  $0.51\pm0.14$ &  $0.54\pm0.14$ &  $\textbf{0.63}\pm0.13$ \\
500 &  $0.64\pm0.13$ &  $0.64\pm0.13$ &  $0.67\pm0.11$ &  $\textbf{0.76}\pm0.09$ \\
250 &  $0.92\pm0.05$ & $ 0.92\pm0.05$ &  $0.92\pm0.04$ &  $\textbf{0.94}\pm0.03$ \\
\bottomrule
\end{tabular}
}

\end{center}
\end{table}

\subsection{Image Diversity between Components}
\label{sec:img_diversity}

\begin{figure}[!htb]
\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=\textwidth]{figures/paper/pixel_unc0.png}}
\caption{Pixel-wise epistemic uncertainty (yellow: high; blue: low) for one class from each bin (left to right: wall clock, head cabbage, rubber eraser, Red Shank bird). Numbers below the images give the mean estimated $\hat{I}_W(z_0,\theta|z_5,x, b=5)$ $\pm$ one standard deviation.}
\label{fig:pixel_unc}
\end{center}
\vskip -0.25in
\end{figure}

\begin{figure}[t]
\vskip 0.2in
\centering
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1300_1000_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1300_750_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1300_500_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\columnwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1300_250_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\vspace*{5mm}
\caption{Progression of image generation through DECU for the class coral fungus (bin 1300) at each branching point: (a) 1000, (b) 750, (c) 500, (d) 250.}
\label{fig:bp_1300}
\vskip -0.2in
\end{figure}

Apart from assessing image uncertainty, we also analyzed image diversity across the ensemble at different branching points. To gauge this diversity, we generated images with our framework and computed the Structural Similarity Index Measure (SSIM) between all pairs of images generated by the ensemble components for each class. The results are reported in \autoref{tbl:ssim_img_diversity}. Notably, classes in bins with more training images yielded more similar images. We attribute this to the ensemble components learning better representations of classes with more training data, which results in greater agreement among the components. Furthermore, as the branching point increases, the images become more dissimilar: with a larger $b$, the components branch earlier in the reverse process and therefore denoise independently for more steps, leading to greater image variation. \autoref{fig:bp_1300} and \autoref{fig:bp_1} visualize this phenomenon; the variety in the generated images clearly diminishes as the branching point decreases. Additional visualizations are included in the Appendix (\autoref{fig:bp_100} and \autoref{fig:bp_10}). 
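The pairwise aggregation behind \autoref{tbl:ssim_img_diversity} can be sketched as below. For self-containment, \texttt{global\_ssim} is a simplified single-window SSIM computed over the whole image; standard SSIM evaluates the same expression over local windows and averages the results (as in the \texttt{structural\_similarity} function of scikit-image), so absolute values may differ from those in the table. The function names and the \texttt{data\_range} parameter are hypothetical.

```python
import itertools

import numpy as np


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over whole images (a simplification of windowed SSIM)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )


def pairwise_ssim(images, data_range=1.0):
    """Mean and standard deviation of SSIM over all pairs of component images."""
    scores = [
        global_ssim(x, y, data_range) for x, y in itertools.combinations(images, 2)
    ]
    return float(np.mean(scores)), float(np.std(scores))
```

For $M$ components this compares $M(M-1)/2$ pairs per class; swapping in a windowed SSIM only requires replacing \texttt{global\_ssim}.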
%and \autoref{fig:diversity_by_bp} in the Appendix
\section{Related Works}

Constructing ensembles of diffusion models is challenging due to the large number of parameters, often in the range of hundreds of millions \citep{saharia2022photorealistic}. Despite this difficulty, methods such as eDiff-I have emerged, utilizing ensemble techniques to improve image fidelity \citep{balaji2022ediffi}. In contrast, our approach specifically targets the measurement of epistemic uncertainty.

Previous research has combined Bayesian approximations for neural networks with information-based criteria to estimate epistemic uncertainty in image classification \citep{gal2017deep, kendall2017uncertainties, kirsch2019batchbald}. These works apply epistemic uncertainty estimation to simpler, discrete output spaces. Besides Bayesian approximations, ensembles are another method for estimating epistemic uncertainty \citep{lakshminarayanan2017simple, choi2018waic, chua2018deep}, and they have been used to quantify epistemic uncertainty in regression problems \citep{depeweg2018decomposition, postels2020hidden, berry2023normalizing, berry2023escaping}. \citet{postels2020hidden} and \citet{berry2023normalizing} develop efficient ensemble models based on normalizing flows that accurately capture epistemic uncertainty. \citet{berry2023escaping} advances these findings by using PaiDEs to estimate epistemic uncertainty over a 257-dimensional output space with normalizing flows. Our work builds on this line of research by extending these methods to much higher-dimensional outputs (196,608 dimensions) and to large generative diffusion models. 

To capture epistemic uncertainty, we employ the mutual information between model outputs and model weights \citep{houlsby2011bayesian}. This metric has previously been used for data acquisition in active learning, notably in BALD \citep{houlsby2011bayesian} and BatchBALD \citep{kirsch2019batchbald}. Applying such techniques to diffusion models is appealing, since collecting data for image generation models is costly. However, active learning is currently infeasible for large diffusion models because of the high computational cost of retraining after each acquisition batch. As computational resources advance, these ideas may become feasible to explore, which points to another potential use case for epistemic uncertainty in diffusion models.

In addition to PaiDEs, various methods have been proposed for estimating epistemic uncertainty without relying on sampling \citep{van2020uncertainty, charpentier2020posterior}. \citet{van2020uncertainty} and \citet{charpentier2020posterior} primarily focus on classification tasks. While \citet{charpentier2021natural} extends this approach to regression tasks, it is limited to modeling outputs as distributions in the exponential family and is therefore less general than PaiDEs. Furthermore, it only considers regression tasks with 1D outputs, as the method is Bayesian and more computationally expensive.

\begin{figure}[t]
\vskip 0.2in
\centering
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1_1000_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1_750_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1_500_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\begin{subfigure}[t]{\textwidth}
\centerline{\includegraphics[width=\textwidth]{figures/appendix/bp_progression_1_250_smaller.png}}
%\label{fig:bp_1300_1000}
\caption{}
\end{subfigure}
\vspace*{5mm}
\caption{Progression of image generation through DECU for the class monastery (bin 1) at each branching point: (a) 1000, (b) 750, (c) 500, (d) 250.}
\label{fig:bp_1}
\vskip -0.2in
\end{figure}

\section{Conclusion}
To the best of our knowledge, we are the first to address the problem of epistemic uncertainty estimation for conditional diffusion models. Large generative models are becoming increasingly prevalent, making insight into the generative process invaluable. We address this problem by introducing the DECU framework, which combines an efficient ensembling technique with Pairwise-Distance Estimators (PaiDEs) to estimate epistemic uncertainty efficiently. Our experimental results on the ImageNet dataset demonstrate the effectiveness of DECU in estimating epistemic uncertainty. We also explore per-pixel uncertainty in generated images, providing a fine-grained analysis of epistemic uncertainty. As deep learning continues to push the boundaries of generative modeling, our framework provides a valuable tool for enhancing the interpretability and trustworthiness of large-scale generative models.



%\begin{contributions} % will be removed in pdf for initial submission 
					  % (without ‘accepted’ option in \documentclass)
                      % so you can already fill it to test with the
                      % ‘accepted’ class option
%    Briefly list author contributions. 
%    This is a nice way of making clear who did what and to give proper credit.
%    This section is optional.

    %Lucas Berry conceived the idea, wrote the paper, wrote the code and conducted all the experiments.
    %Axel Brando collaborated weekly to help develop the ideas in the paper, debug the code, come up with experiments and helped write the paper. In addition, Axel Brando ran experiments that did not make their way into the paper.
    %David Meger helped write the paper.
%\end{contributions}

\begin{acknowledgements} % will be removed in pdf for initial submission,
						 % (without ‘accepted’ option in \documentclass)
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
    %Briefly acknowledge people and organizations here.

    %The research leading to these results has received funding from the Horizon Europe Programme under the SAFEXPLAIN Project (www.safexplain.eu), grant agreement num. 101069595 and the Horizon Europe Programme under the AI4DEBUNK Project (www.ai4debunk.eu), grant agreement num. 101135757. %Additionally, this work has been partially supported by Grant PID2019-107255GB-C21 funded by MCIN/AEI/10.13039/501100011033.
    %Additionally, this work has been partially supported by Grant JDC2022-050313-I and PID2019-107255GB-C21 funded by MCIN/AEI/10.13039/501100011033  and by European Union NextGenerationEU/PRTR.

    The research leading to these results has received funding from the Horizon Europe Programme under the SAFEXPLAIN Project (\url{https://www.safexplain.eu}), grant agreement num.\ 101069595, and the Horizon Europe Programme under the AI4DEBUNK Project (\url{https://www.ai4debunk.eu}), grant agreement num.\ 101135757. Additionally, this work has been partially supported by PID2019-107255GB-C21, funded by MCIN/AEI/10.13039/501100011033, and by JDC2022-050313-I, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.

    %\emph{All} acknowledgements go in this section.
\end{acknowledgements}

% References
\bibliography{uai2024}

\newpage

\onecolumn

\title{Shedding Light on Large Generative Networks: \\Estimating Epistemic Uncertainty in Diffusion Models \\ (Supplementary Material)}
\maketitle
\appendix
\input{supp}

\end{document}
