Robust Detection of Directional Adversarial Attacks in Deep Neural Networks for Radiological Imaging

ICLR 2026 Conference Submission 20286 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Deep Learning, Medical Imaging, Adversarial Attacks, Diagnostic AI, Image Classification, Directional Attack Detection, Neural Networks, Chest X-rays, CT Scans, MRI, Adversarial Noise, Attack Detection Framework, Sigmoid Function Analysis, Medical Image Security, Healthcare AI, Deep Neural Networks (DNNs), Robustness in AI Diagnostics, Defense Mechanisms, Clinical AI Safety, Adversarial Robustness
Abstract: Deep learning is now central to radiology, helping detect abnormalities on X-rays, CT scans, and MRIs. However, these systems are highly vulnerable to adversarial attacks - small, crafted perturbations that mislead models while appearing unchanged to human observers. Such errors can lead to missed cancers or false positives. Prior studies report attack success rates above 90\% on medical scans, and in 2023 almost 30\% of US hospitals used AI in imaging. An effective method for detecting such attacks and their directionality is therefore crucial to the credibility and reliability of DNN-based medical systems. We propose a novel detection framework that applies subsequent random-noise attacks to identify adversarial perturbations. Our approach analyzes variations in the sigmoid output to distinguish genuine medical images from adversarially manipulated ones. Specifically, we compare prediction differences between clean and attacked images, as well as between different adversarially altered versions, to improve detection accuracy. We evaluated our method on three popular medical datasets, 'Chest X-Ray Images for Classification', 'Retinal Fundus Multi-disease Image Dataset', and 'Brain Tumor Dataset', under various attack scenarios, including random noise. Our framework achieved up to 99.8\% accuracy (ACC) in identifying adversarial directional attacks, significantly outperforming existing defense mechanisms. It consistently identified adversarial samples across varying attack strengths while keeping false positives low, demonstrating strong reliability and potential for real-world clinical deployment.
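As a rough illustration of the idea described in the abstract (probing a suspect image with follow-up random-noise perturbations and comparing the resulting sigmoid-output shifts), the sketch below is not the authors' implementation; the model interface, noise budget, number of probes, threshold, and decision rule are all assumptions made for illustration only.

import numpy as np

def detect_directional_attack(model_logit, image, eps=0.01, n_probes=20,
                              threshold=0.15, seed=0):
    """Hypothetical sketch: probe an input with random-noise perturbations
    and compare sigmoid-output shifts. `model_logit` is assumed to map an
    image array to a scalar logit; all parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    base = sigmoid(model_logit(image))  # sigmoid score of the input as given
    shifts = []
    for _ in range(n_probes):
        noise = rng.uniform(-eps, eps, size=image.shape)  # subsequent random-noise "attack"
        probed = np.clip(image + noise, 0.0, 1.0)
        shifts.append(sigmoid(model_logit(probed)) - base)

    shifts = np.asarray(shifts)
    # Assumption: an adversarially perturbed input sits close to the decision
    # boundary, so random noise shifts its sigmoid output consistently in one
    # direction, whereas a clean input barely moves.
    mean_shift = shifts.mean()
    is_adversarial = abs(mean_shift) > threshold
    direction = "toward positive class" if mean_shift > 0 else "toward negative class"
    return is_adversarial, float(mean_shift), direction

# Toy usage with a stand-in linear "model"; replace with a real classifier's logit.
w = np.full((8, 8), 0.05)
toy_logit = lambda img: float((img * w).sum() - 1.5)
clean = np.random.default_rng(1).uniform(0, 1, size=(8, 8))
print(detect_directional_attack(toy_logit, clean))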
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20286