Keywords: domain adaptation, imputation, missing data, advertising
TL;DR: We propose a way to jointly tackle unsupervised domain adaptation and non-stochastic missing data in a target domain using distant supervision from a complete source domain.
Abstract: Motivated by practical applications, we consider unsupervised domain adaptation for classification problems in the presence of missing data in the target domain. More precisely, we focus on the case where there is a domain shift between the source and target domains and some components of the target data are systematically absent. We propose a way to impute non-stochastic missing data for a classification task by leveraging supervision from a complete source domain through domain adaptation. We introduce a single model that jointly performs domain adaptation, imputation, and classification, and show that it performs well under various representative divergence families (H-divergence, Optimal Transport). We perform experiments on two families of datasets: a classical digit classification benchmark commonly used in domain adaptation papers, and real-world digital advertising datasets, on which we evaluate our model’s classification performance in an unsupervised setting. We analyze the model's behavior and show the benefit of explicitly imputing non-stochastic missing data jointly with domain adaptation.
Code: https://www.dropbox.com/sh/8gszx52xfu0gdgz/AAAYx4H3aqA88DtvuC2k9h4za?dl=0