\section{Proposed Methods}\label{proposed}




We propose two novel methods. The first method uses the ${\sf BagCSI}$ loss as the objective. We have shown above that the ${\sf BagCSI}$ loss is an upper bound on the loss $\eps(\mc{D}_T, h)$ w.r.t.\ the target distribution. We now provide an intuitive explanation for why the ${\sf BagCSI}$ loss should work.

Let us assume that the goal is to predict the label of an unseen instance $\bx$, given feature representations $\phi(\bx_i)$ in the embedding space and the corresponding labels $y_i$ from the training data.
A natural prediction would be $\E_i \left[\rho(\phi(\bx), \phi(\bx_i))y_i\right]$, where $\rho$ is some similarity metric.
If we choose the similarity metric to be the inner product, the prediction can be written as $\phi(\bx)^{\sf T}\E_i \left[\phi(\bx_i)y_i\right]$.
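Indeed, since $\phi(\bx)$ does not depend on the index $i$, linearity of expectation gives
\begin{align*}
 \E_i \left[\rho(\phi(\bx), \phi(\bx_i))y_i\right] = \E_i \left[\phi(\bx)^{\sf T}\phi(\bx_i)y_i\right] = \phi(\bx)^{\sf T}\E_i \left[\phi(\bx_i)y_i\right],
\end{align*}
so the training data enters the prediction only through the label-weighted mean embedding $\E_i \left[\phi(\bx_i)y_i\right]$.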
The given feature representations and corresponding labels can come either from the source domain or from the target domain.
To learn a domain-invariant feature representation, the prediction should be similar irrespective of the domain considered.
This can be achieved by enforcing that the term $\sum\limits_i y_i\phi(\bx_i)$ be equal for the source and target domains. However, this approach requires knowledge of the instance-level labels $y_\bx$ from the target domain, which are not available. We can, however, replace $y_\bx$ with \emph{pseudo-labels} $\hat{y}_\bx$, using which we introduce a new domain adaptation loss term in the objective, $\psi^2(\mc{S}, \mc{B})$, where:
\begin{align}
 \psi(\mc{S}, \mc{B}) & := \frac{1}{mk}\left\|\sum_{j=1}^m \sum_{\bx \in B_j}\hat{y}_\bx\phi(\bx) - \sum_{i=1}^{mk}y_i\phi(\bz_i) \right\|_2 \label{eq:domain_adaptation_loss}
\end{align}
One way to obtain pseudo-labels is to assign the bag-label as the pseudo-label for all instances within the bag, in which case $\psi(\mc{S}, \mc{B})$ essentially reduces to $\xi(\mc{S}, \mc{B})$. %
We call this method \textit{Bag Label Weighted Feature Alignment} (\textbfne{BL-WFA}) which involves training using the ${\sf BagCSI}$ loss.
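As a sketch of this reduction, write $\bar{y}_{B_j}$ for the bag-label of $B_j$ (a notation assumed here for illustration only). Setting $\hat{y}_\bx = \bar{y}_{B_j}$ for every $\bx \in B_j$ in \eqref{eq:domain_adaptation_loss} lets the pseudo-label factor out of the inner sum:
\begin{align*}
 \psi(\mc{S}, \mc{B}) = \frac{1}{mk}\left\|\sum_{j=1}^m \bar{y}_{B_j}\sum_{\bx \in B_j}\phi(\bx) - \sum_{i=1}^{mk}y_i\phi(\bz_i) \right\|_2 .
\end{align*}
Each bag thus contributes the sum of its instance embeddings scaled by its bag-label, so the term can be evaluated without any instance-level labels from the bags.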

Another approach is to pseudo-label the instances in a bag $B$ using the hypothesis model $h$ as follows:
\begin{enumerate}[nolistsep,noitemsep,leftmargin=*]
    \item Compute the predictions $\{h(\bx)\}_{\bx\in B}$.
    \item Obtain the pseudo-labels by adding to each prediction the same shift $b \in \R$, chosen so that the average pseudo-label in the bag equals the bag-label. Note that this is equivalent to taking the vector of pseudo-labels nearest (in Euclidean distance) to the vector of predictions among those that satisfy the bag-label constraint.
\end{enumerate}
We call this method \textit{Pseudo-label Weighted Feature Alignment} (\textbfne{PL-WFA}), in which the model is trained using $\psi(\mc{S}, \mc{B})$ with the pseudo-labels computed above.
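
The shift $b$ in the second step has a simple closed form, which the following sketch makes explicit. The symbols $\bar{y}_B$ (the bag-label of $B$), $\mathbf{p}$ (the vector of predictions $h(\bx)$, $\bx \in B$), $\hat{\mathbf{y}}$ (the vector of pseudo-labels), $\mathbf{1}$ (the all-ones vector) and $\lambda$ (a Lagrange multiplier) are notation assumed here for illustration. Requiring the shifted predictions to average to the bag-label gives
\begin{align*}
 \frac{1}{|B|}\sum_{\bx \in B}\left(h(\bx) + b\right) = \bar{y}_B
 \quad\Longrightarrow\quad
 b = \bar{y}_B - \frac{1}{|B|}\sum_{\bx \in B}h(\bx).
\end{align*}
The same vector arises as the Euclidean projection of $\mathbf{p}$ onto the constraint set. Minimizing $\frac{1}{2}\|\hat{\mathbf{y}} - \mathbf{p}\|_2^2$ subject to $\mathbf{1}^{\sf T}\hat{\mathbf{y}} = |B|\,\bar{y}_B$, stationarity of the Lagrangian yields
\begin{align*}
 \hat{\mathbf{y}} - \mathbf{p} - \lambda\mathbf{1} = 0
 \quad\Longrightarrow\quad
 \hat{\mathbf{y}} = \mathbf{p} + \lambda\mathbf{1},
 \qquad
 \lambda = \bar{y}_B - \frac{1}{|B|}\mathbf{1}^{\sf T}\mathbf{p} = b.
\end{align*}
For example, for a bag of three instances with predictions $(0.2, 0.5, 0.8)$ and bag-label $0.7$, the mean prediction is $0.5$, so $b = 0.2$ and the pseudo-labels are $(0.4, 0.7, 1.0)$, whose average is $0.7$ as required.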





