Abstract: When deploying machine learning models in real-world applications, an open-world learning framework is needed to handle both normal in-distribution inputs and undesired out-of-distribution (OOD) inputs. Open-world learning frameworks include OOD detectors that aim to discard inputs that do not come from the same distribution as the training data of the machine learning classifier. However, our understanding of current OOD detectors is limited to the setting of benign OOD data, and it remains an open question whether they are robust in the presence of adversaries. In this paper, we present the first analysis of the robustness of open-world learning frameworks in the presence of adversaries by introducing and designing OOD adversarial examples. Our experimental results show that current OOD detectors can be easily evaded by slightly perturbing benign OOD inputs, revealing a severe limitation of current open-world learning frameworks. Furthermore, we find that OOD adversarial examples also pose a strong threat to defenses based on adversarial training, despite the effectiveness of these defenses against in-distribution adversarial attacks. To counteract these threats and ensure the trustworthy detection of OOD inputs, we outline a preliminary design for a robust open-world machine learning framework.
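To make the evasion claim concrete, here is a minimal sketch under stated assumptions, not the paper's method. It assumes a toy linear softmax classifier with hypothetical weights `W`, `B` and a hypothetical detector `is_flagged_ood` that rejects inputs whose maximum softmax probability (MSP) falls below a threshold, one common baseline detector. The hypothetical helper `ood_adversarial_example` runs projected gradient ascent with signed steps (PGD) inside an L-infinity ball of radius `eps` around a benign OOD input, pushing the classifier toward confident predictions so the detector accepts the input. The perturbation budget here is on the toy model's own scale.

```python
import math

# Hypothetical toy linear softmax classifier: 3 classes, 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]


def softmax_probs(x):
    """Class probabilities of the toy linear classifier for input x."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(W, B)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(x):
    """Maximum softmax probability: a high score means 'looks in-distribution'."""
    return max(softmax_probs(x))


def is_flagged_ood(x, threshold=0.5):
    """Hypothetical MSP detector: reject inputs whose confidence is below threshold."""
    return msp_score(x) < threshold


def sign(v):
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def ood_adversarial_example(x0, target, eps=1.0, step=0.25, steps=10):
    """PGD under an L-infinity budget eps: perturb the OOD input x0 so the
    classifier becomes confident in class `target` and the detector accepts it."""
    x = list(x0)
    for _ in range(steps):
        p = softmax_probs(x)
        # Gradient of log p_target w.r.t. x for a linear model: W_t - sum_k p_k W_k.
        grad = [W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        # Signed gradient-ascent step, then project back into the eps-ball around x0.
        x = [xi + step * sign(g) for xi, g in zip(x, grad)]
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


x_ood = [0.0, 0.0]  # benign OOD input: all three classes equally likely
x_adv = ood_adversarial_example(x_ood, target=0)
print(is_flagged_ood(x_ood), is_flagged_ood(x_adv))  # True False
```

The benign input scores an MSP of 1/3 and is rejected, while the perturbed input stays within the budget yet scores roughly 0.69 and passes the detector. Against a real deep network, the closed-form gradient would be replaced by automatic differentiation, and the attacked score would be whatever the chosen detector thresholds.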