Abstract: Deep learning has transformed fields such as computer vision, natural language processing, and audio analysis through its powerful pattern recognition and predictive capabilities. However, the robustness of these models remains a major concern, as they are highly vulnerable to adversarial attacks: subtle, intentional perturbations that lead to incorrect predictions. While recent defenses such as adversarial training and defensive distillation aim to improve robustness, they have notable drawbacks, including overfitting and degraded performance under strong attacks. Certified defenses, such as robust training and Randomized Smoothing, offer theoretical guarantees within a specific perturbation radius, yet they struggle to reflect real-world robustness because of efficiency bottlenecks and the unpredictable nature of actual adversarial attacks. These challenges reveal a critical gap between current defenses and real-world attack scenarios, highlighting the need for more practical and resilient solutions. To address both the defense-attack gap and the inefficiency of robust training, we introduce the Explanation-Guided Robust Training Enhancer (EGRTE). EGRTE combines a self-explaining mechanism, which guides adversarial training to focus on generalized features for improved robustness and accuracy, with a masking mechanism that transforms perturbed data into a form the model can learn more easily. This approach not only mitigates noise effects, including adversarial perturbations, but also eliminates the need for time-intensive gradient calculations, greatly improving training efficiency. Comprehensive experiments on several datasets show EGRTE’s superior certified accuracy and robustness against adversarial attacks, with a 6.24-fold efficiency gain over comparable methods, positioning EGRTE as a highly effective solution for robust and efficient deep learning.
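To make the high-level description more concrete, the following is a minimal sketch of one way an explanation-guided masking step could be realized in PyTorch: input-gradient saliency selects the features the model itself deems important, and the remaining pixels are masked out before training. The function name `saliency_mask`, the gradient-based saliency, and the `keep_ratio` parameter are illustrative assumptions for this sketch, not the paper's actual EGRTE mechanism.

```python
import torch
import torch.nn.functional as F

def saliency_mask(model, x, y, keep_ratio=0.7):
    """Hypothetical sketch of an explanation-guided masking step.

    Computes input-gradient saliency and keeps only the most salient
    fraction of pixels (zeroing the rest), so training focuses on
    features highlighted by the model's own explanation. The exact
    self-explaining and masking mechanisms of EGRTE are not shown here.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    # Per-pixel saliency, summed over channels: shape (B, 1, H, W).
    saliency = grad.abs().sum(dim=1, keepdim=True)

    # Keep the top `keep_ratio` fraction of pixels per sample.
    flat = saliency.flatten(1)
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (saliency >= thresh).float()

    # Masked input, detached so it can be fed back into ordinary training.
    return x.detach() * mask
```

Because the mask is derived from a single explanation pass rather than an iterative attack, a step like this avoids the repeated gradient computations of standard adversarial training, which is the kind of efficiency gain the abstract refers to.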