Keywords: Model Inversion Attacks, Latent Diffusion Models, Adapter Network
TL;DR: MI-ControlNet: a novel model inversion attack framework that fine-tunes latent diffusion models (LDMs) via an adapter network, integrating probability-guided attention and a surrogate-based identity loss to achieve superior black-box performance without full LDM retraining.
Abstract: Model inversion attacks (MIAs) seriously threaten privacy by querying models to generate synthetic images that expose features of private training data. These attacks target deep neural networks used in sensitive fields such as user authentication. Previous MIA methods based on generative adversarial networks have proven effective, and preliminary studies have explored the potential of diffusion models in this setting. In this work, we further investigate the feasibility of applying large-scale text-to-image latent diffusion models (LDMs) to MIAs. To overcome the high cost of training from scratch and the complexity of conditional encoding, we propose a novel method that fine-tunes an LDM via an adapter network. Our MI-ControlNet framework integrates probability distributions into the attention module to guide latent space generation. We also train a surrogate model through knowledge transfer and integrate an identity loss into the training process of MI-ControlNet, thereby establishing a new training paradigm. Experiments on various models and datasets show that our method achieves excellent performance in black-box MIAs. Code will be made available following the review process.
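The abstract gives no implementation details, so the following is only a minimal sketch of the training paradigm it describes: a standard LDM denoising objective on a frozen U-Net fine-tuned through a trainable adapter, combined with an identity loss computed by a surrogate classifier distilled from the black-box target. The module interfaces (`unet`, `adapter`, `decoder`, `surrogate`) and the weight `lambda_id` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code) of the training step described in
# the abstract: frozen LDM U-Net + trainable adapter, with a surrogate-based
# identity loss added to the usual denoising loss. All module interfaces and
# the weighting factor lambda_id are assumptions made for illustration.
import torch
import torch.nn.functional as F

def mi_controlnet_step(unet, adapter, decoder, surrogate,
                       latents, cond, target_id, alphas_cumprod,
                       lambda_id=0.1):
    """One training step; only `adapter` parameters receive gradients."""
    b = latents.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=latents.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    # Forward diffusion: corrupt the clean latents at timestep t.
    noise = torch.randn_like(latents)
    noisy = a_bar.sqrt() * latents + (1 - a_bar).sqrt() * noise

    # Frozen U-Net conditioned through the trainable adapter
    # (ControlNet-style residual injection; interface assumed).
    eps_pred = unet(noisy, t, adapter(noisy, t, cond))

    # Standard LDM denoising loss.
    loss_ldm = F.mse_loss(eps_pred, noise)

    # One-step estimate of the clean latent, decoded to pixel space and
    # classified by the surrogate model distilled from the target.
    x0_hat = (noisy - (1 - a_bar).sqrt() * eps_pred) / a_bar.sqrt()
    loss_id = F.cross_entropy(surrogate(decoder(x0_hat)), target_id)

    return loss_ldm + lambda_id * loss_id
```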
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8097