DiffProbe: Towards a Universal and Cross-Modality Adversarial Robustness Quantification Framework for Black-box Classifiers using Diffusion Models

TMLR Paper 6734 Authors

01 Dec 2025 (modified: 08 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: Neural network classifiers have become ubiquitous, transforming tasks across data modalities such as image processing, natural language understanding, and audio recognition. Despite their widespread adoption, a critical challenge persists: ensuring robustness against adversarial attacks, which deceive models through subtly modified inputs. This issue is particularly acute when multiple modalities are considered, a facet that most current studies neglect. Addressing this gap, our paper introduces \textbf{DiffProbe}, the first unified black-box framework for adversarial robustness quantification using synthetic data generated by domain-specific diffusion models. \textbf{DiffProbe} integrates off-the-shelf diffusion models into a versatile, comprehensive framework suitable for a wide range of data types and adversarial scenarios. It is designed for computational efficiency, making it well suited to evaluating black-box models and to remote auditing with minimal requirements: only the model's predictions on synthetic data are needed. The robustness estimates produced by \textbf{DiffProbe} are theoretically grounded and empirically validated, showing high consistency with real-world adversarial attack methods. We have extensively tested \textbf{DiffProbe} on state-of-the-art classifiers and black-box APIs across facial recognition, text, audio, video, and point-cloud data. The results underscore its effectiveness in providing realistic, actionable insights into the adversarial robustness of deployed systems, thereby deepening our understanding of adversarial vulnerabilities and aiding the development of more secure AI systems across modalities.
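To make the prediction-only auditing setting concrete, the following is a minimal, hypothetical sketch rather than the paper's actual algorithm or metric: it assumes placeholder functions `generate_with_diffusion` (a stand-in for a domain-specific diffusion sampler) and `query_black_box` (a stand-in for the audited classifier's API), and scores robustness simply as prediction stability of diffusion-generated samples under small perturbations.

```python
# Minimal sketch (not the paper's method): probe a black-box classifier's
# sensitivity using diffusion-generated samples and prediction-only access.
# Both generate_with_diffusion and query_black_box are hypothetical stand-ins.
import numpy as np

def generate_with_diffusion(n_samples, rng):
    """Hypothetical placeholder: return synthetic inputs, e.g. images in [0, 1]."""
    return rng.random((n_samples, 32, 32, 3))

def query_black_box(batch):
    """Hypothetical placeholder: return the classifier's predicted labels."""
    return np.zeros(len(batch), dtype=int)

def stability_score(n_samples=64, noise_scale=0.01, seed=0):
    """Fraction of synthetic inputs whose prediction is unchanged under a small
    random perturbation -- one simple prediction-only robustness proxy."""
    rng = np.random.default_rng(seed)
    clean = generate_with_diffusion(n_samples, rng)
    noisy = np.clip(clean + noise_scale * rng.standard_normal(clean.shape), 0.0, 1.0)
    return float(np.mean(query_black_box(clean) == query_black_box(noisy)))

print(f"stability score: {stability_score():.3f}")
```

In this setting, only the labels returned by `query_black_box` are needed, which mirrors the remote-auditing scenario described in the abstract; the actual framework's robustness quantification is defined in the paper itself.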
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=RInB0xVxyy
Changes Since Last Submission: Changed the fonts.
Assigned Action Editor: ~Haoliang_Li2
Submission Number: 6734