Abstract: Distributional overlap is a critical determinant of learnability in domain adaptation. The standard theory quantifies overlap in terms of $\chi^2$ divergence, as this factors directly into variance and generalization bounds agnostic to the functional form of the $Y$-$X$ relationship. However, in many modern settings, we cannot afford this agnosticism; we often wish to transfer across distributions with disjoint support, where these standard divergence measures are infinite. In this note, we argue that ``tailored'' divergences that are restricted to measuring overlap in a particular function class are more appropriate. We show how $\chi^2$ (and other) divergences can be generalized to this restricted function class setting via a variational representation, and use this to motivate balancing weight-based methods that have been proposed before, but, we believe, should be more widely used.
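For concreteness, here is a sketch of how such a restriction can look, using notation not fixed in the abstract itself: the $\chi^2$ divergence between a target $P$ and a source $Q$ admits the variational representation
$$\chi^2(P \,\|\, Q) \;=\; \sup_{g}\;\Big\{\, 2\,\mathbb{E}_P[g(X)] - \mathbb{E}_Q[g(X)^2] \,\Big\} \;-\; 1,$$
where the supremum runs over all measurable $g$. Restricting the supremum to a function class $\mathcal{F}$ gives a "tailored" divergence
$$d_{\mathcal{F}}(P \,\|\, Q) \;=\; \sup_{g \in \mathcal{F}}\;\Big\{\, 2\,\mathbb{E}_P[g(X)] - \mathbb{E}_Q[g(X)^2] \,\Big\} \;-\; 1,$$
which can remain finite even when $P$ and $Q$ have disjoint support; the paper's exact definitions may differ from this illustrative form.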