For diffusion models, machine unlearning is crucial for mitigating the intellectual property and ethical challenges arising from unauthorized style replication. However, most existing unlearning methods struggle to completely remove styles while preserving generation quality, as their erasure mechanisms rely on the noise distribution, in which style and content are intrinsically entangled. To address this, we propose Style Unlearning in Diffusion Models (SUDM), a novel framework based on hybrid-attention distillation, where cross-attention provides style-agnostic supervision to self-attention for targeted style erasure. By leveraging the structural distinctions within attention components, SUDM models style more accurately than previous work. Additionally, we introduce query consistency and parameter consistency to ensure content preservation and robust generalization. Extensive experiments and user studies on Stable Diffusion demonstrate that SUDM achieves more thorough style erasure with minimal quality degradation, outperforming existing unlearning methods in both visual fidelity and precision.
The core idea of SUDM is to construct a hybrid attention distillation objective that selectively removes style-specific patterns while preserving content fidelity. This is achieved by measuring the discrepancy between the model's original self-attention output and a cross-attention output, which reuses the original query but replaces the key and value matrices with those obtained from a style-neutral reference image. Our framework is illustrated in Figure 2.
Formally, let $Q_{l}^{t}, K_{l}^{t}, V_{l}^{t}$ denote the query, key, and value matrices at layer $l$ and timestep $t$ during inference for the stylized prompt $P$. Similarly, let $Q_{l}^{\mathrm{ref},t}, K_{l}^{\mathrm{ref},t}, V_{l}^{\mathrm{ref},t}$ denote the corresponding query, key, and value matrices for the reference image $I^{\mathrm{ref}}$, which shares the same content but exhibits a different, neutral style. We define the hybrid attention distillation loss at each selected layer and timestep as: $$ \mathcal{L}_{\mathrm{HAD}}=\left\|\mathrm{Attn}(Q_{l}^{t},K_{l}^{t},V_{l}^{t})-\mathrm{Attn}(Q_{l}^{t},K_{l}^{\mathrm{ref},t},V_{l}^{\mathrm{ref},t})\right\|$$ To preserve the semantic content after unlearning, we introduce the content-preserving loss: $$ \mathcal{L}_{\mathrm{content}}=\left\|Q_{l}^{t}-Q_{l}^{\mathrm{ref},t}\right\|. $$
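To make the two losses concrete, the following is a minimal NumPy sketch assuming single-head scaled dot-product attention and toy dimensions; the function names and the Frobenius-norm choice are illustrative, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attn(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def hybrid_attention_distillation_loss(Q, K, V, K_ref, V_ref):
    # L_HAD: discrepancy between the self-attention output and the
    # cross-attention output that reuses Q but takes K, V from the
    # style-neutral reference branch.
    return np.linalg.norm(attn(Q, K, V) - attn(Q, K_ref, V_ref))

def content_loss(Q, Q_ref):
    # L_content: query alignment between stylized and reference branches.
    return np.linalg.norm(Q - Q_ref)

# Toy example with random features (n tokens, d channels).
rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = rng.standard_normal((3, n, d))
K_ref, V_ref = rng.standard_normal((2, n, d))
l_had = hybrid_attention_distillation_loss(Q, K, V, K_ref, V_ref)
l_content = content_loss(Q, Q)  # identical queries -> zero loss
```

In practice these losses would be computed inside the U-Net's attention layers at the selected layers $l$ and timesteps $t$ and averaged; the sketch above shows a single layer/timestep for clarity.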
To maintain the overall generalization capabilities, we apply a retain loss: $$ \mathcal{L}_{\mathrm{retain}}=\left\|\theta-\theta_{\mathrm{ori}}\right\|, $$ where $\theta_{\mathrm{ori}}$ denotes the original parameters of the model.
The total loss is defined as: $$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{HAD}}+\lambda_{1}\mathcal{L}_{\mathrm{content}}+\lambda_{2}\mathcal{L}_{\mathrm{retain}},$$ where $\lambda_1$ and $\lambda_2$ are hyperparameters.
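The combined objective can be sketched as follows; this is a minimal NumPy illustration where the $\lambda_1$, $\lambda_2$ values are placeholders chosen for the example, not values reported by the paper.

```python
import numpy as np

def retain_loss(theta, theta_ori):
    # L_retain: penalize deviation from the original model parameters.
    return np.linalg.norm(theta - theta_ori)

def total_loss(l_had, l_content, l_retain, lam1=0.5, lam2=0.1):
    # L_total = L_HAD + lambda_1 * L_content + lambda_2 * L_retain.
    # lam1 and lam2 are illustrative defaults.
    return l_had + lam1 * l_content + lam2 * l_retain

# Unchanged parameters give zero retain loss.
theta = np.array([1.0, 2.0, 3.0])
theta_ori = np.array([1.0, 2.0, 3.0])
print(total_loss(2.0, 1.0, retain_loss(theta, theta_ori)))  # 2.5
```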
We conduct all experiments using the publicly available Stable Diffusion v1.5 model (CompVis 2022) as our backbone. We compare our method with four recent approaches: ESD-x, Forget-Me-Not, UCE, and SPM. To evaluate artistic style unlearning, we focus on four widely adopted and visually distinct artistic styles: Vincent van Gogh, Claude Monet, Pablo Picasso, and Rembrandt. In each experiment, we erase a single artist's style, and the remaining styles are used to evaluate whether the model preserves its generation capacity for unrelated artistic styles. We assess unlearning performance using two standard metrics: CLIP Score (CS) and Fréchet Inception Distance (FID). The details are as follows.
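The CLIP Score used here is commonly computed as the cosine similarity between CLIP image and text embeddings; the following is a minimal NumPy sketch of that metric given precomputed embeddings (obtaining the embeddings from a CLIP encoder, and the common scaling by 100, are omitted).

```python
import numpy as np

def clip_score(img_emb, txt_emb):
    # Cosine similarity between L2-normalized image and text embeddings.
    # A lower score for the erased artist's prompts indicates more
    # thorough style unlearning; a stable score for retained styles
    # indicates preserved generation capacity.
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(img @ txt)

# Toy embeddings: identical directions score 1, orthogonal score 0.
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])
```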
As illustrated in Figure 3, our method outperforms the competing methods.
We conduct ablation studies to assess each loss's contribution in SUDM when unlearning the Van Gogh style while preserving Monet, removing in turn the HAD loss, the content-preservation loss, and the retain loss. As shown in Table 3, removing the HAD loss significantly reduces unlearning performance, yielding higher CLIP similarity to Van Gogh and indicating that the model fails to effectively unlearn the target style. In Fig. 5, given the prompt "A serene landscape with a bright yellow sun, reminiscent of Van Gogh's time in Arles", the model without $\mathcal{L}_{\mathrm{content}}$ fails to generate the key object (the sun), producing semantically incomplete results. In contrast, including $\mathcal{L}_{\mathrm{content}}$ preserves the intended content faithfully, confirming that query alignment is crucial for maintaining structural fidelity. Removing the retain loss causes the model to fail to preserve the Monet style.