Grouping-By-ID: Guarding Against Adversarial Domain Shifts

Christina Heinze-Deml, Nicolai Meinshausen

Feb 15, 2018 (modified: Feb 15, 2018) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: When training a deep neural network for supervised image classification, one can broadly distinguish between two types of latent features of images that will drive the classification of class Y. Following the notation of Gong et al. (2016), we can divide features broadly into the classes of (i) “core” or “conditionally invariant” features X^ci whose distribution P(X^ci | Y) does not change substantially across domains and (ii) “style” or “orthogonal” features X^orth whose distribution P(X^orth | Y) can change substantially across domains. These latter orthogonal features would generally include features such as position, rotation, image quality or brightness but also more complex ones like hair color or posture for images of persons. We try to guard against future adversarial domain shifts by ideally just using the “conditionally invariant” features for classification. In contrast to previous work, we assume that the domain itself is not observed and hence a latent variable. We can hence not directly see the distributional change of features across different domains. We do assume, however, that we can sometimes observe a so-called identifier or ID variable. We might know, for example, that two images show the same person, with ID referring to the identity of the person. In data augmentation, we generate several images from the same original image, with ID referring to the relevant original image. The method requires only a small fraction of images to have an ID variable. We provide a causal framework for the problem by adding the ID variable to the model of Gong et al. (2016). However, we are interested in settings where we cannot observe the domain directly and we treat domain as a latent variable. If two or more samples share the same class and identifier, (Y, ID)=(y,i), then we treat those samples as counterfactuals under different style interventions on the orthogonal or style features. Using this grouping-by-ID approach, we regularize the network to provide near constant output across samples that share the same ID by penalizing with an appropriate graph Laplacian. This is shown to substantially improve performance in settings where domains change in terms of image quality, brightness, color changes, and more complex changes such as changes in movement and posture. We show links to questions of interpretability, fairness and transfer learning.
  • TL;DR: We propose counterfactual regularization to guard against adversarial domain shifts arising through shifts in the distribution of latent "style features" of images.
  • Keywords: supervised representation learning, causality, interpretability, transfer learning