Keywords: Generalization, Out-of-Distribution, Entropy-based Methods, Unsupervised Contrastive Learning, Latent Representations
Abstract: We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift. We train two feed-forward networks end-to-end, separated by a discrete $n$-bit channel, on an unsupervised contrastive learning task. We implement different \textit{masking strategies} that remove a proportion $p_{\text{mask}}$ of low-entropy bits, high-entropy bits, or random bits, and compare the effects on performance to the baseline accuracy with no mask. When testing in-distribution (InD), we find that for a relatively low $p_{\text{mask}}$, the removal of bits via any strategy leads to an \textit{increase} in performance. We hypothesize that the entropy of a bit serves as a guide to its usefulness out-of-distribution (OOD). Through experiments on three OOD datasets, we demonstrate that the removal of low-entropy bits can notably benefit OOD performance. Conversely, we show that top-entropy masking disproportionately harms performance both InD and OOD.
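The masking strategies described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the channel emits binary codes of shape (batch, $n$), estimates each bit position's entropy empirically from the batch, and zeroes out the selected proportion $p_{\text{mask}}$ of positions. The function name `entropy_mask` and the zero fill value are illustrative assumptions.

```python
import numpy as np

def entropy_mask(bits, p_mask, strategy="low"):
    """Mask a proportion p_mask of the n bit positions by entropy.

    bits: (batch, n) array of {0, 1} codes from the discrete channel.
    strategy: "low" masks the lowest-entropy positions, "high" the
    highest-entropy positions, "random" a random subset.
    Masked positions are zeroed (fill value is an assumption).
    """
    p1 = bits.mean(axis=0)  # empirical P(bit = 1) per position
    eps = 1e-12             # avoids log(0) for constant bits
    H = -(p1 * np.log2(p1 + eps) + (1 - p1) * np.log2(1 - p1 + eps))
    n = bits.shape[1]
    k = int(round(p_mask * n))
    if strategy == "low":
        masked = np.argsort(H)[:k]
    elif strategy == "high":
        masked = np.argsort(H)[-k:] if k > 0 else np.array([], dtype=int)
    else:  # "random"
        masked = np.random.default_rng(0).choice(n, size=k, replace=False)
    out = bits.copy()
    out[:, masked] = 0
    return out, masked
```

Under low-entropy masking, a bit that is constant across the batch (entropy near zero) is removed first; a perfectly balanced bit (entropy 1) is removed first under top-entropy masking.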
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
TL;DR: We hypothesize that low-entropy features tend to be more domain-specific. This paper studies how the entropy of the intermediate representation affects the model's robustness against out-of-distribution (OOD) data.
Supplementary Material: zip