\subsection{Generation of Image Data}
% \Wei{the other baselines appear to be also bounded (thresholding tricks), albeit not theoretically guaranteed}

% \Wei{may consider to compute negative log-likelihood}

% \Wei{include some snippet of real images in the main context}

We test our method on two large-scale image datasets, CIFAR-10 and ImageNet 64$\times$64. Since RGB values lie in $[0, 1]$, we naturally select the domain as $\Omega=[0,1]^d$, where $d=3\times 32 \times 32$ for the CIFAR-10 task and $d=3\times 64 \times 64$ for the ImageNet task. %Similar to the commonly used Variance Exploding SDE (VESDE) in unbounded SGMs and SB-SBSDE \citep{song_likelihood_training, forward_backward_SDE},
It is known that the SB system can be initialized with score-based generative models \citep{forward_backward_SDE}, and the warm-up study for reflected SB is presented in Appendix \ref{how_to_init}. We choose RVE-SDE as the prior path measure. % with $\sigma_{\min}=0.01$ and $\sigma_{\max}=5$. % we don't have a formula with $\sigma$ yet.
The prior distribution of $\nu_{\star}$ is the uniform distribution on $\Omega$.
The SDE is discretized into 1000 steps.
In both scenarios, images are generated unconditionally, and the quality of the samples is evaluated using the Fr\'echet Inception Distance (FID)
%\citep{heusel2017gans} % try to use less references
over 50,000 samples.
The forward score function is modeled using a U-Net structure; 
% \citep{ronneberger2015u}; % try to use less references
the backward score function uses NCSN++ \citep{score_sde} for the CIFAR-10 task and ADM \citep{SGMS_beat_GAN} for the ImageNet task.
Details of the experiments are shown in Appendix \ref{appendix:experiment}.
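
For reference, we recall the standard definition of FID. Writing $(\boldsymbol{\mu}_r, \boldsymbol{\Sigma}_r)$ and $(\boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g)$ for the mean and covariance of Inception features computed on real and generated images, respectively (notation introduced here only for illustration), FID is the Fr\'echet distance between the two Gaussian fits:
\[
\mathrm{FID} \;=\; \|\boldsymbol{\mu}_r - \boldsymbol{\mu}_g\|_2^2 \;+\; \mathrm{Tr}\Bigl(\boldsymbol{\Sigma}_r + \boldsymbol{\Sigma}_g - 2\bigl(\boldsymbol{\Sigma}_r \boldsymbol{\Sigma}_g\bigr)^{1/2}\Bigr).
\]
Lower values indicate that the generated feature statistics are closer to those of the real data.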


\begin{wraptable}{r}{8.5cm}
\vspace{-0.5cm}
{\scriptsize
\begin{tabular}{lcccc} \\ 
CIFAR-10 & Constrained & OT & NLL & FID \\ \toprule  
NCSN++ \citep{score_sde} & No & No & 2.99 & 2.20 \\ \midrule
% LSGM \citep{vahdat2021score} & No & 2.10 \\ \midrule % comment out this one is mainly to reduce number of references (to make 3 pages of reference)
DDPM \citep{DDPM} & No & No & 3.75 & 3.17 \\ \midrule
SB-FBSDE \citep{forward_backward_SDE} & No & Yes & -- & 3.01 \\ \midrule
Reflected SGM \citep{reflected_diffusion_model} & Yes & No & 2.68 & 2.72 \\ \midrule
\textbf{Ours} & Yes & Yes & 3.08 & 2.98 \\ \midrule
 &  \\
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ImageNet 64$\times$64 &  \\ \toprule  
PGMGAN \citep{armandpour2021partition} & No & No & -- & 21.73 \\ \midrule 
GLIDE \citep{li2022composing} & No & No & -- & 29.18 \\ \midrule
GRB \citep{park2022generative} & No & No & -- & 26.57 \\ \midrule 
\textbf{Ours} & Yes & Yes & 3.20 & 23.95 \\ \bottomrule
\end{tabular}%
\caption{Evaluation of generative models on image data.}
\label{tab:image_results}
% \vspace{-0.1cm}
}
\end{wraptable} 
We include baselines for both constrained and unconstrained generative models and summarize the experimental results in Table \ref{tab:image_results}. While our model does not surpass the state-of-the-art models, its slight FID improvement over the unconstrained SB-FBSDE \citep{forward_backward_SDE} on CIFAR-10 (2.98 versus 3.01) underscores the effectiveness of the reflection operation. Moreover, the experiments verify the scalability of the reflected model, and the training process is consistent with the findings in \citet{reflected_diffusion_model}: reflection in cube domains is easy to implement and makes the generation more stable. Sample outputs are showcased in Figure \ref{fig_demo} (including MNIST), with additional figures available in Appendix \ref{appendix:experiment}. Notably, our generated samples exhibit diversity and are visually indistinguishable from real data.
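
To illustrate why reflection in cube domains is simple, we sketch one standard closed form, assuming the reflection is applied independently to each coordinate (the symbol $\mathcal{R}$ is introduced here only for illustration and is not necessarily the exact operator used in our implementation). For a scalar $z \in \mathbb{R}$ that may have left $[0,1]$ after a diffusion step, repeated reflection at the boundaries $0$ and $1$ yields
\[
\mathcal{R}(z) \;=\; 1 - \bigl|\, 1 - (z \bmod 2) \,\bigr|, \qquad (z \bmod 2) \in [0, 2).
\]
For example, $\mathcal{R}(1.2) = 0.8$ and $\mathcal{R}(-0.1) = 0.1$, while any $z \in [0,1]$ is left unchanged. Applying $\mathcal{R}$ to every coordinate keeps the state in $\Omega=[0,1]^d$ at negligible cost.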
\begin{figure*}[!ht]
\centering
\subfigure{\includegraphics[width=\textwidth]{figures/mnist_cifar_imagenet_demo_main_v4.png}}
\vskip -0.12in
\caption{Samples via reflected SB on MNIST (left), CIFAR10 (middle), and ImageNet 64 (right).}
\label{fig_demo}
\end{figure*}





\subsection{Generation in the Simplex Domain}
Alongside the irregular domains illustrated in Figure \ref{reflected_Langevin} and the hypercube for image generation, we implement the method on the high-dimensional \textit{projected simplex}. A $d$-projected simplex is defined as
$\bar\Delta_d := \{\boldsymbol{x} \in \mathbb{R}^d : \sum_i \boldsymbol{x}_i \leq 1, \boldsymbol{x}_i \geq 0 \} $.
Our method relies on a reflected diffusion process rather than a diffeomorphic mapping (stick breaking) as in \cite{reflected_diffusion_model}.
For comparison, we also replicate the generative process using the diffeomorphic mapping.


The data are created by collecting the image classification scores from the last softmax layer of Inception v3,
which has 1008 dimensions. All the data fit into the projected simplex $\bar\Delta_{1008}$.
The Inception model is loaded from a pretrained checkpoint\footnote{
\url{https://github.com/mseitzer/pytorch-fid/releases/download/fid\_weights}}, and the classification task is performed on the $64\times64$ ImageNet validation dataset of 50,000 images.
The score network is composed of 6 dense layers with 512 latent nodes.
In every diffusion step, we use the reflection operator described in Algorithm \ref{reflection_alg} to constrain the data within the projected simplex.
The alternative uses the stick-breaking method to constrain the diffusion process. The transformation consists of the mapping
$[f(\boldsymbol{x})]_i = \boldsymbol{x}_i \prod_{j=i+1}^d (1-\boldsymbol{x}_j)$
and the inverse mapping 
$[f^{-1}(\boldsymbol{y})]_i = \frac{\boldsymbol{y}_i}{1 - \sum_{j=i+1}^d \boldsymbol{y}_j}$. In every diffusion step, it first maps the data into a unit cube using reflection, then applies the forward transformation to map the data into the projected simplex.
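
As a consistency check on the two stick-breaking maps (a short derivation added for illustration), let $\boldsymbol{y}=f(\boldsymbol{x})$ with $\boldsymbol{x}\in[0,1]^d$. A downward induction on $i$ gives
\[
1-\sum_{j=i+1}^d \boldsymbol{y}_j \;=\; \prod_{j=i+1}^d (1-\boldsymbol{x}_j), \qquad i=d, d-1, \dots, 0.
\]
The case $i=d$ reads $1=1$, and subtracting $\boldsymbol{y}_i = \boldsymbol{x}_i \prod_{j=i+1}^d (1-\boldsymbol{x}_j)$ from both sides turns the right-hand side into $\prod_{j=i}^d (1-\boldsymbol{x}_j)$, which is the statement for $i-1$. Dividing $\boldsymbol{y}_i$ by this quantity recovers $\boldsymbol{x}_i$ whenever the product is nonzero, so $f^{-1}\circ f$ is the identity there. Taking $i=0$ gives $\sum_{j=1}^d \boldsymbol{y}_j = 1-\prod_{j=1}^d (1-\boldsymbol{x}_j) \in [0,1]$, so $f$ indeed maps the unit cube into $\bar\Delta_d$. The same identity shows that the denominator of $f^{-1}$ vanishes as $\sum_{j=i+1}^d \boldsymbol{y}_j \to 1$, that is, near the face of the simplex where the sum constraint is active; this is one plausible source of numerical difficulty for the stick-breaking approach.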

The results are shown in Figure \ref{fig:inception}.
We compare the generated distribution of the most likely classes.
The category index is in the same order as the pre-trained model's output.
The last plot in Figure \ref{fig:inception} compares the cumulative distribution of the ground truth and generated distribution, providing a cleaner view of the comparison.
The curve closely follows the diagonal in the CDF comparison, signifying a strong alignment between the true data distribution and the distribution derived from the generative model.
The result using the diffeomorphic mapping is also shown in Figure \ref{fig:inception}. 
Comparing the CDF plots of the two methods, we find that the reflection-based method outperforms the diffeomorphism-based method; the latter suffers from a visible bias in the distribution due to analytic blowups at edges/corners.
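
To make the CDF comparison concrete, let $p_k$ and $q_k$ denote the probability mass of category $k$ in the ground-truth and generated distributions, respectively (notation introduced here only for illustration). The cumulative distributions are
\[
F_p(k) = \sum_{i \le k} p_i, \qquad F_q(k) = \sum_{i \le k} q_i, \qquad k = 1, \dots, 1008,
\]
and the comparison plot can be read as tracing the points $\bigl(F_p(k), F_q(k)\bigr)$. The curve lies on the diagonal if and only if $F_p(k)=F_q(k)$ for all $k$, that is, $p=q$. One natural scalar summary of the deviation, which we state as an assumption rather than as the exact statistic reported, is the Kolmogorov--Smirnov-type distance $\max_k |F_p(k) - F_q(k)|$.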


\begin{figure*}[H]
\centering
\includegraphics[width=0.25\textwidth]{figures/inception_true_distribution_12272023.pdf}
%
\includegraphics[width=0.25\textwidth]{figures/inception_stick_breaking_only_distribution_12272023.pdf}
%
\includegraphics[width=0.25\textwidth]{figures/inception_reflection_only_distribution_12272023.pdf}
%
\includegraphics[width=0.17\textwidth]{figures/cdf_ks_comparison.pdf}

\vskip -0.15in
\caption{Generation on the high-dimensional projected simplex.
The results compare the reflection-based and stick-breaking-based methods.
}
\label{fig:inception}
\end{figure*}

