Keywords: domain generalization, ResNet, machine learning, neural network, medical, X-ray, image, data
TL;DR: The results of the original paper were replicated, with a few discrepancies.
Abstract:

Scope of Reproducibility: We reproduced the results of the paper "Domain Generalization Using Causal Matching." Traditional supervised learning assumes that the classes/labels seen at test time have appeared during the training phase. However, this assumption is often violated in real-world applications; in e-commerce, for instance, new categories of products are released every day. A model that cannot detect new/unseen classes struggles to function in such open environments because it does not generalize.

Methodology: We used the open-source code released by the authors, who provide detailed reproduction instructions on their GitHub page. We reproduced almost every table in the main text and a few from the appendix. Where our results did not match the reported ones, we investigated the cause and proposed possible explanations for the behavior. For the extensions, we wrote extra functions to check the paper's claims on other standard open-source datasets. We trained the models mainly on the publicly available GPUs offered by Google Colab and on GPU-equipped desktop computers.

Results: Most of our results closely match those reported in the original paper for the Rotated-MNIST, Fashion-MNIST, PACS [18, 28], and Chest-Xray datasets. However, in some cases, as described later, we obtained quantitatively better results than those reported in the paper. By investigating the root cause of these mismatches, we provide a possible explanation for the gap. We also performed additional experiments, with the necessary modifications, on the Rotated-MNIST and Rotated Fashion-MNIST datasets. Overall, our results support the main claim of the original paper, even though they differ for some of the training/testing instances.

What was easy: The official GitHub page of the paper hosts the open-source code, which was well organized into multiple files and therefore easy to follow. The experiments described in the paper use widely adopted, standard open-source datasets, so implementing each experiment was relatively easy. Furthermore, since most of the hyperparameters were reported in the scripts, little tuning was needed for most experiments.

What was difficult: Although each individual experiment is relatively simple to implement, the sheer number of experiments made reproduction demanding. In particular, each experiment in the original setting requires training a network for a significant number of iterations. With limited computational resources and time, we sometimes changed the settings, sacrificing granularity; these changes did not affect the interpretability of the final results.

Communication with original authors: We emailed the authors and received prompt responses to our questions regarding the provided Jupyter reproduction notebooks. Some tables report multiple runs of the same technique, but it was unclear how to execute the alternative runs.
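For readers unfamiliar with the Rotated-MNIST setup referenced above, each training domain is commonly constructed as a copy of the base dataset rotated by a fixed angle (typically 0° through 75° in 15° steps). A minimal, dependency-free sketch of that construction; the function names `rotate_nn` and `make_rotated_domains` are our own, for illustration, and real pipelines would use a library rotation routine instead of nearest-neighbour resampling:

```python
import math

def rotate_nn(img, angle_deg):
    """Rotate a 2-D grayscale image (list of lists) by angle_deg,
    using inverse nearest-neighbour mapping about the image centre."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source coordinates.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < h and 0 <= sj < w:
                out[y][x] = img[si][sj]
    return out

# One source domain per rotation angle, as in the standard
# Rotated-MNIST benchmark setup.
ANGLES = [0, 15, 30, 45, 60, 75]

def make_rotated_domains(images):
    """Return a dict mapping each angle to rotated copies of `images`."""
    return {a: [rotate_nn(im, a) for im in images] for a in ANGLES}
```

A domain-generalization experiment then trains on a subset of these angle-domains and evaluates on a held-out angle.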
Paper URL: https://arxiv.org/pdf/2006.07500.pdf
Paper Venue: ICML 2021