Abstract: Weakly-supervised semantic segmentation (WSSS) has drawn increasing attention in recent years owing to its modest training-data requirements, i.e., only image-level labels are needed as supervision. Most existing WSSS methods exploit class activation maps (CAMs) as seeds to generate pseudo-pixel-level ground truth for training a segmentation network. In this work, we introduce a causal inference framework to improve the quality of CAMs, thereby boosting the performance of existing WSSS algorithms that rely on them. Our motivation is to deconfound a set of class-specific latent \textit{confounders} in a dataset, which are a potential cause of low-quality, poorly-localized CAMs. Because these confounders are unobservable, we employ the \textit{front-door adjustment} as a causal intervention to deconfound a classification neural network, without assuming or explicitly estimating the confounders. Our proposed algorithm, Causal CAM ($c^2am$), outperforms the prior causal framework for WSSS by a large margin, \underline{without} any additional parameters, network architecture modifications, or image manipulations, and requires adding only \underline{one} line of code to a standard classifier training loop. Furthermore, we provide an optimization interpretation of the front-door adjustment for classifier training to explain the improvements brought by $c^2am$. We evaluated $c^2am$ on PASCAL VOC 2012, achieving a pseudo-mask mIoU of 69.6\% on the training set, and mIoUs of 67.5\% and 67.7\% on the validation and test sets after training DeepLabV2 on the pseudo-masks. Our implementation and model weights for reproducibility are released at [withheld for anonymity].
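To make the "one line of code" claim concrete, below is a minimal, hypothetical sketch of what such an intervention could look like in a toy classifier training loop. The abstract does not specify the actual implementation; the `intervene` function, the class-prototype mediator, and all other names here are assumptions for illustration only, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 3, 8

# Hypothetical class prototypes standing in for a mediator representation
# (an assumption; the paper's actual mediator is not described in the abstract).
prototypes = rng.normal(size=(n_classes, dim))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def intervene(feats):
    # Front-door-style adjustment (sketch): combine the input features with a
    # representation marginalized over class prototypes, rather than
    # conditioning on the input alone.
    return feats + prototypes.mean(axis=0)

# Toy data in which the label is recoverable from the features.
W_true = rng.normal(size=(dim, n_classes))
X = rng.normal(size=(256, dim))
y = (X @ W_true).argmax(axis=1)

# Standard multinomial logistic-regression training loop.
W = np.zeros((dim, n_classes))
for _ in range(200):
    feats = intervene(X)  # <- the single added line in the training loop
    p = softmax(feats @ W)
    grad = feats.T @ (p - np.eye(n_classes)[y]) / len(y)
    W -= 0.5 * grad

acc = (softmax(intervene(X) @ W).argmax(axis=1) == y).mean()
```

The point of the sketch is purely structural: the rest of the loop is an unmodified classifier update, and the intervention touches only the forward pass, which is consistent with the abstract's claim of no extra parameters or architecture changes.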