With False Friends Like These, Who Can Have Self-Knowledge?

Lue Tao; Songcan Chen

With False Friends Like These, Who Can Have Self-Knowledge?

Lue Tao, Songcan Chen

28 Sept 2020 (modified: 22 Oct 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone

Keywords: Robustness, Adversarial Risk, Neural Networks, Machine Learning Security

Abstract: Adversarial examples arise from excessive sensitivity of a model. Commonly studied adversarial examples are malicious inputs, crafted by an adversary from correctly classified examples, to induce misclassification. This paper studies an intriguing, yet far overlooked consequence of the excessive sensitivity, that is, a misclassified example can be easily perturbed to help the model to produce correct output. Such perturbed examples look harmless, but actually can be maliciously utilized by a false friend to make the model self-satisfied. Thus we name them hypocritical examples. With false friends like these, a poorly performed model could behave like a state-of-the-art one. Once a deployer trusts the hypocritical performance and uses the "well-performed" model in real-world applications, potential security concerns appear even in benign environments. In this paper, we formalize the hypocritical risk for the first time and propose a defense method specialized for hypocritical examples by minimizing the tradeoff between natural risk and an upper bound of hypocritical risk. Moreover, our theoretical analysis reveals connections between adversarial risk and hypocritical risk. Extensive experiments verify the theoretical results and the effectiveness of our proposed methods.

One-sentence Summary: Model performance could be hypocritically improved by false friends: we formalize this new realistic risk and analyze its relation with natural risk and adversarial risk.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/arxiv:2012.14738/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=QKvfJRN2T

14 Replies

Loading