
\subsection{Standard DA}

Deep domain adaptation has been studied intensively and has shown appealing
performance in various tasks and applications, notably in \citet{Ganin2015,long2015,saito2017asymmetric,french2017self}.
The core idea of DDA is to bridge the gap between source and target
distributions in a joint space by minimizing a divergence between
distributions induced from the source and target domains in this space.
Popular choices of divergence include the Jensen-Shannon divergence \citep{Ganin2015,TzengHDS15,shu2018a},
the maximum mean discrepancy \citep{gretton2007kernel,long2015},
and the Wasserstein (WS) distance \citep{shen2018ws,chenyu2019swd,le2021labelshift}.
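Schematically, and with notation assumed here for illustration rather than taken
from the cited works, let $G$ be a feature extractor into the joint space, $h$ a
classifier, $\ell$ a loss, $P^{S}$ the labeled source distribution, and $D$ the
chosen divergence; a generic objective of this kind is
\[
\min_{G,h}\; \mathbb{E}_{(x,y)\sim P^{S}}\big[\ell\big(h(G(x)),y\big)\big]
+\lambda\, D\big(G_{\#}P_{x}^{S},\,G_{\#}P_{x}^{T}\big),
\]
where $P_{x}^{S}$ and $P_{x}^{T}$ are the source and target distributions over
inputs, $G_{\#}P$ denotes the distribution of $G(x)$ for $x\sim P$, and
$\lambda>0$ is a trade-off weight.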
Some recent works have exploited different aspects of UDA to improve
performance \citep{kurmi2019attending,Wang2019CDAN,pmlr-v97-chen19i,hu2020HGS}.
For instance, CADA \citep{kurmi2019attending} considered the probabilistic
certainty estimate of various regions and used these certainty-estimate
weights to improve the classifier performance on the target dataset.
GSDA \citep{hu2020HGS} introduced a novel method named Hierarchical
Gradient Synchronization to model the synchronization relationship
among the local distribution pieces and global distribution, aiming
for more precise domain-invariant features.

\subsection{Optimal Transport based DA}

Optimal transport theory has been applied to domain adaptation in
\citet{courty2017optimal,courty2017joint,damodaran2018deepjdot,RedkoCFT19,chenyu2019swd,yujia2019onscalable,xu2020reliable}.
In particular, \citet{chenyu2019swd} proposed using the sliced Wasserstein
distance for domain adaptation, whereas \citet{yujia2019onscalable}
proposed SPOT, in which the optimal transport plan is approximated
by a pushforward of a reference distribution and the optimal
transport problem is cast as a minimax problem. A recent OT-based DA work
(RWOT) \citep{xu2020reliable} leveraged spatial prototypical information
and intra-domain structures of image data to reduce the negative transfer
caused by target samples near decision boundaries. Moreover, \citet{courty2017optimal}
proposed an idea to connect the theory of optimal transport and domain
adaptation, which later inspired an OT-based deep DA method (DeepJDOT)
\citep{damodaran2018deepjdot}. Another recent work (ETD) \citep{li2020enhanceOT}
tackled the bottlenecks of OT in UDA by developing an attention-aware
OT distance to measure the domain discrepancy under the guidance of
prediction feedback. Our proposed approach differs substantially
from existing OT-based DA approaches in that we examine an OT distance
between a discrete distribution over the source class-conditional distributions
and the target data distribution. By investigating and minimizing this specific OT
distance, we can guide target examples toward
an appropriate source class in the latent space, mitigating both
data and label shifts.
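
For reference, OT distances of the kind discussed in this subsection are commonly
written in the Kantorovich form. As a sketch under notation assumed here (not
taken from the cited works), for two distributions $P$ and $Q$ on the latent
space and a ground cost $d$,
\[
W_{d}(P,Q)=\inf_{\gamma\in\Gamma(P,Q)}\mathbb{E}_{(z,z')\sim\gamma}\big[d(z,z')\big],
\]
where $\Gamma(P,Q)$ is the set of couplings, i.e., joint distributions whose
marginals are $P$ and $Q$. This is only the generic definition; the specific
distributions compared differ from method to method.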

\subsection{Class-aware DA}

Some recent approaches \citep{wang2019classaware,kang2019can} leverage
information from the label space to improve the quality
of the alignment between the source and target domains. \citet{wang2019classaware}
proposed a novel relationship-aware adversarial domain adaptation
(RADA) algorithm. It first uses a single multi-class domain discriminator
to enforce the learning of inter-class dependency structure during
domain-adversarial training. After that, it aligns this structure
with the inter-class dependencies characterized by training
the label predictor on the source domain. Furthermore, the authors imposed
a regularization term to penalize the structure discrepancy
between the inter-class dependencies estimated from the domain discriminator
and the label predictor. With this alignment, RADA makes adversarial
domain adaptation aware of the class relationships. \citet{kang2019can}
proposed a contrastive adaptation network (CAN), which optimizes a
new metric that models the intra-class and inter-class
domain discrepancies. In particular, the authors introduced a new contrastive
domain discrepancy (CDD) objective to enable class-aware UDA. CAN
aims to facilitate the optimization with CDD, which is built on the maximum
mean discrepancy (MMD) \citep{long2015}.
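
For reference, with a feature map $\phi$ into a reproducing kernel Hilbert space
$\mathcal{H}$ (notation assumed here), the squared MMD between distributions $P$
and $Q$ is
\[
\mathrm{MMD}^{2}(P,Q)=\big\|\mathbb{E}_{x\sim P}[\phi(x)]-\mathbb{E}_{x'\sim Q}[\phi(x')]\big\|_{\mathcal{H}}^{2}.
\]
A class-aware contrastive discrepancy in the spirit of CDD can then be sketched,
for $M$ classes with source and target class-conditional feature distributions
$P_{c}^{S}$ and $P_{c}^{T}$, as
\[
\frac{1}{M}\sum_{c=1}^{M}\mathrm{MMD}^{2}\big(P_{c}^{S},P_{c}^{T}\big)
-\frac{1}{M(M-1)}\sum_{c=1}^{M}\sum_{c'\neq c}\mathrm{MMD}^{2}\big(P_{c}^{S},P_{c'}^{T}\big),
\]
so that minimizing it shrinks the intra-class discrepancy while enlarging the
inter-class one. This is a schematic form under assumed notation, not necessarily
the exact estimator used in CAN; in UDA the target class-conditional
distributions would have to be built from estimated labels, since target labels
are unavailable.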
