We now review the background on OT between two discrete distributions,
on which our work builds. Consider two discrete distributions:
$\mathbb{P}^{1}=\sum_{i=1}^{M}\pi_{i}^{1}\delta_{\bx_{i}^{1}}$ and
$\mathbb{P}^{2}=\sum_{j=1}^{N}\pi_{j}^{2}\delta_{\bx_{j}^{2}}$ where
$\bpi^{1}=\left[\pi_{i}^{1}\right]_{i=1}^{M}$ and $\bpi^{2}=\left[\pi_{j}^{2}\right]_{j=1}^{N}$
are probability masses, $\left\{ \bx_{i}^{1}\right\} _{i=1}^{M}$
and $\left\{ \bx_{j}^{2}\right\} _{j=1}^{N}$ are the sets of atoms,
and $\delta_{\bx}$ is the Dirac delta distribution concentrated at
the atom $\bx$. Let $c\left(\bx_{i}^{1},\bx_{j}^{2}\right)$ be a
cost function. The OT distance between $\mathbb{P}^{1}$ and $\mathbb{P}^{2}$
w.r.t. the cost function $c$ is defined as
\begin{equation}
\min_{A\in\mathbb{R}_{+}^{M\times N}}\sum_{i=1}^{M}\sum_{j=1}^{N}a_{ij}c\left(\bx_{i}^{1},\bx_{j}^{2}\right),\label{eq:total_cost}
\end{equation}
where $A=\left[a_{ij}\right]\in\mathbb{R}_{+}^{M\times N}$ is a matrix of non-negative
elements satisfying the marginal constraints $\sum_{j=1}^{N}a_{ij}=\pi_{i}^{1},\forall i\in\left\{ 1,\ldots,M\right\} $
and $\sum_{i=1}^{M}a_{ij}=\pi_{j}^{2},\forall j\in\left\{ 1,\ldots,N\right\} $.

In addition, $a_{ij}\in\left[0,1\right]$ is interpreted as the probability
of matching $\bx_{i}^{1}$ with $\bx_{j}^{2}$, or of transporting $\bx_{i}^{1}$
to $\bx_{j}^{2}$, which incurs the cost $c\left(\bx_{i}^{1},\bx_{j}^{2}\right)$.
Therefore, the sum $\sum_{i=1}^{M}\sum_{j=1}^{N}a_{ij}c\left(\bx_{i}^{1},\bx_{j}^{2}\right)$
can be viewed as the total cost of matching $\mathbb{P}^{1}$ with $\mathbb{P}^{2}$,
or of transporting $\mathbb{P}^{1}$ to $\mathbb{P}^{2}$. By solving
the optimization problem in Eq. (\ref{eq:total_cost}), we aim to
find the optimal transportation matrix $A^{*}$ that minimizes the
total cost.
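
As a purely illustrative instance of Eq. (\ref{eq:total_cost}), with
hypothetical values chosen only for ease of computation, let $M=N=2$,
take scalar atoms $\bx_{1}^{1}=\bx_{1}^{2}=0$ and $\bx_{2}^{1}=\bx_{2}^{2}=1$,
masses $\bpi^{1}=\left[\frac{1}{2},\frac{1}{2}\right]$ and $\bpi^{2}=\left[\frac{1}{4},\frac{3}{4}\right]$,
and the cost $c\left(\bx,\bx'\right)=\left|\bx-\bx'\right|$. The row
and column constraints leave a single degree of freedom $t=a_{11}$:
\[
A\left(t\right)=\left[\begin{array}{cc}
t & \frac{1}{2}-t\\
\frac{1}{4}-t & \frac{1}{4}+t
\end{array}\right],\qquad0\leq t\leq\frac{1}{4},
\]
where the range of $t$ follows from the non-negativity of all four
entries. Only the off-diagonal entries incur a nonzero cost, so the
total cost is $a_{12}+a_{21}=\frac{3}{4}-2t$, which is minimized at
$t=\frac{1}{4}$:
\[
A^{*}=\left[\begin{array}{cc}
\frac{1}{4} & \frac{1}{4}\\
0 & \frac{1}{2}
\end{array}\right],\qquad\sum_{i=1}^{2}\sum_{j=1}^{2}a_{ij}^{*}c\left(\bx_{i}^{1},\bx_{j}^{2}\right)=\frac{1}{4}.
\]
In this toy case, the optimal plan keeps as much mass in place as
possible and moves only the surplus mass $\frac{1}{4}$ from the atom
at $0$ to the atom at $1$, at unit cost.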
