Con-Detect: Detecting adversarially perturbed natural language inputs to deep classifiers through holistic analysis
Abstract: Highlights
• Adversarial inputs to language classifiers have a greater cumulative contribution score than clean inputs.
• Con-Detect can detect adversarial inputs by analyzing their contribution scores at runtime.
• Even with an adaptive adversary, Con-Detect increases the cost and decreases the stealth of the attack.
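The highlights above can be illustrated with a minimal sketch of runtime detection by thresholding a cumulative contribution score. The highlights do not say how contribution scores are computed, so everything here is an assumption made for illustration: the leave-one-out masking scheme, the `[MASK]` placeholder, the `toy_classifier` stand-in, the function names, and the threshold value. It should not be read as the Con-Detect algorithm itself.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def toy_classifier(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a deep classifier: returns P(positive)."""
    weights = {"good": 0.3, "great": 0.4, "bad": -0.3, "awful": -0.4}
    score = 0.5 + sum(weights.get(t, 0.0) for t in tokens)
    return min(1.0, max(0.0, score))  # clamp to a valid probability


def contribution_scores(
    tokens: Sequence[str], predict: Callable[[Sequence[str]], float]
) -> List[float]:
    """Assumed scoring: absolute change in output when one token is masked."""
    base = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = list(tokens[:i]) + [MASK] + list(tokens[i + 1:])
        scores.append(abs(base - predict(masked)))
    return scores


def cumulative_contribution(
    tokens: Sequence[str], predict: Callable[[Sequence[str]], float]
) -> float:
    """Sum of per-token contribution scores for one input."""
    return sum(contribution_scores(tokens, predict))


def is_flagged(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], float],
    threshold: float,
) -> bool:
    """Flag an input whose cumulative contribution exceeds the threshold."""
    return cumulative_contribution(tokens, predict) > threshold


if __name__ == "__main__":
    sample = ["the", "movie", "was", "good"]
    print(contribution_scores(sample, toy_classifier))
    print(is_flagged(sample, toy_classifier, threshold=0.5))
```

In practice the threshold would presumably be calibrated on clean inputs, so that inputs with unusually high cumulative contribution are flagged; the value 0.5 here is arbitrary.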