Abstract: Can a poisoned machine learning (ML) model passively recover from its adversarial manipulation by retraining with new samples, and regain non-poisoned performance? And if passive recovery is possible, how can it be quantified? From an adversarial perspective, is a small amount of poisoning sufficient to force the defender to retrain more over time? This paper proposes evaluating passive recovery from "availability data poisoning" using active learning in the context of Android malware detection. To quantify passive recovery, we propose two metrics: intercept, to assess the speed of recovery, and recovery rate, to quantify the stability of recovery. To investigate passive recovery, we conduct our experiments at different rates of active learning, in conjunction with varying strengths of availability data poisoning. We perform our evaluation on 259,230 applications from AndroZoo, using the Drebin feature representation, with linear SVM, DNN, and Random Forest as classifiers. Our findings show that the poisoned models converge to their respective hypothetical non-poisoned models, demonstrating that, with active learning as a concept drift mitigation strategy, passive recovery is feasible across the three classifiers evaluated.
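As an illustration of how such recovery curves could be scored, the sketch below compares a poisoned model's per-round performance against a hypothetical non-poisoned baseline retrained on the same active-learning schedule. The threshold-based definitions of intercept and recovery rate used here, and the example F1 curves, are assumptions for demonstration only and are not necessarily the paper's exact formulations.

```python
# Illustrative sketch only: the abstract does not define "intercept" and
# "recovery rate" precisely, so the tolerance threshold and the per-round
# F1 curves below are hypothetical assumptions.
import numpy as np

def recovery_metrics(poisoned_f1, clean_f1, tolerance=0.02):
    """Score a poisoned model's recovery relative to a non-poisoned baseline.

    poisoned_f1, clean_f1 : per-retraining-round F1 scores (same length).
    tolerance             : F1 gap within which the model counts as recovered.
    Returns (intercept_round, recovery_rate); intercept_round is None if the
    poisoned model never catches up to the baseline.
    """
    gap = np.asarray(clean_f1) - np.asarray(poisoned_f1)
    recovered = gap <= tolerance                 # rounds within tolerance of baseline
    if not recovered.any():
        return None, 0.0
    intercept = int(np.argmax(recovered))        # first recovered round (speed)
    rate = float(recovered[intercept:].mean())   # fraction of later rounds still recovered (stability)
    return intercept, rate

# Hypothetical F1 curves over ten retraining rounds.
clean    = [0.92, 0.91, 0.93, 0.92, 0.92, 0.91, 0.92, 0.92, 0.92, 0.92]
poisoned = [0.70, 0.78, 0.84, 0.89, 0.91, 0.92, 0.92, 0.93, 0.92, 0.91]
print(recovery_metrics(poisoned, clean))         # (4, 1.0): recovered at round 4 and stayed recovered
```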