<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script id="MathJax-script" type="text/javascript" async
          src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
    </script>
    <title>SUDM</title>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({
            tex2jax: {inlineMath: [['$', '$'], ['\\(', '\\)']]},
            "HTML-CSS": { availableFonts: ["TeX"] },
            TeX: { 
                extensions: ["AMSmath.js", "AMSsymbols.js", "boldsymbol.js", "mhchem.js"]
            }
        });
    </script>
  
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
    <style>
        body {
            font-family: 'Georgia', serif;
            line-height: 1.6;
            background-color: #f5f5f5;
            color: #333;
        }
        img {
            max-width: 100%;
            max-height: 100%;
            height: auto;
            width: auto;
        }

        .roman-numeral {
        font-family: 'Times New Roman', Times, serif; /* Use a Roman (serif) font */
        font-size: 16px; /* Set the font size */
        }
        .figure-container {
            display: flex;
            justify-content: space-around;
            align-items: center; /* Center align items vertically */
            margin: 20px 0;
        }
        header {
            background-color: white;
            padding: 30px 0;
            color: white;
            text-align: center;
        }
        h1 {
            color: black;
            font-size: 33px;
            line-height: 1.2;
            margin-bottom: 20px;
        }
        p {
            color: #555;
            font-size: 16px;
            margin-bottom: 10px;
        }
        section {
            margin: 40px 0;
            padding: 20px;
            background-color: white;
            box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
            border-radius: 8px;
        }
        h2 {
            color: #007BFF;
            border-bottom: 2px solid #007BFF;
            padding-bottom: 5px;
            margin-bottom: 20px;
        }
        /* h3 {
            color: #ea7909;
            font-size: 20px;
            margin-bottom: 20px;
        }  */
        ul {
            list-style-type: square;
            margin-left: 20px;
        }
        code {
            font-family: 'Courier New', Courier, monospace;
            background-color: #f9f9f9;
            padding: 10px;
            border: 1px solid #ddd;
            border-radius: 3px;
            display: block;
            overflow-x: auto;
            margin-bottom: 20px;
        }
        figure {
            margin: 20px 0;
        }
        figcaption {
            color: #777;
            font-style: italic;
            text-align: center; /* Center-align the caption */
    /* Other values such as left or right can be used instead */
            font-size: 16px;
        }
        footer {
            background-color: white;
            padding: 20px 0;
            color: white;
            text-align: center;
        }
    </style>
</head>
<body>
    <header>
        <h1 style="white-space: pre-line;">
           Style Unlearning in Diffusion Models
        </h1>
        <div class="author">
            Anonymous Author
        </div>
    </header>

    <div class="container">
        <section id="abstract">
            <h2>1. Abstract</h2>
            <p> For diffusion models, machine unlearning is crucial for mitigating the intellectual property and ethical challenges arising from unauthorized style replication. However, most existing unlearning methods struggle to remove styles completely while preserving generation quality, because their erasure mechanisms operate on the noise distribution, in which style and content are intrinsically entangled. To address this, we propose Style Unlearning in Diffusion Models (SUDM), a novel framework based on hybrid attention distillation, in which cross-attention provides style-agnostic supervision to self-attention for targeted style erasure. By leveraging the structural distinctions among attention components, SUDM models style more accurately than previous work. Additionally, we introduce query consistency and parameter consistency to ensure content preservation and robust generalization. Extensive experiments and user studies on Stable Diffusion demonstrate that SUDM achieves more thorough style erasure with minimal quality degradation, outperforming existing unlearning methods in both visual fidelity and precision.
                </p>
        </section>

        <section id="problem">
            <h2>2. Problem Definition</h2>
            <ul>
                <li>Stable Diffusion (CompVis 2022) and other large-scale text-to-image models can mimic the styles of specific artists, particularly when prompted with phrases such as “art in the style of [artist]”, as illustrated in Figure 1.</li>
                <div class="img" style="text-align:center">
                    <img class="img_responsive" src="first3.png" alt="Figure 1: Van Gogh style replication and removal" style="margin:auto;max-width:100%">
                    <figcaption>Figure 1: (a) Original Van Gogh artworks. (b) Images generated by Stable Diffusion that mimic Van Gogh's style. (c) Results after applying SUDM to remove Van Gogh's style from (b). This comparison illustrates both the model's ability to reproduce the visual characteristics of a specific artist and the effectiveness of SUDM in style unlearning.</figcaption>
                </div>
                <br>
                <li>To address unauthorized replication of artistic styles, it is critical to “unlearn” specific styles or visual patterns embedded in pre-trained diffusion models.</li>
            </ul>
            
        </section>

        <section id="framework">
            <h2>3. Illustration of Our Framework</h2>
            <p>
                 The core idea of SUDM is to construct a hybrid attention distillation (HAD) objective that selectively removes style-specific patterns while preserving content fidelity. The objective measures the discrepancy between the model’s original self-attention output and a cross-attention output that reuses the original query but replaces the key and value matrices with those obtained from a style-neutral reference image. The framework is illustrated in Figure 2.
            </p>
            <div class="img" style="text-align:center">
                <img class="img_responsive" src="method1.png" alt="framework" style="margin:auto;max-width:90%">
                <figcaption>Figure 2: The framework of SUDM. Given a stylized prompt and a style-neutral reference image with shared content, hybrid attention distillation (HAD) extracts the self-attention response and applies distillation to remove style-specific patterns while preserving semantic structure.</figcaption>
            </div>
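            <p>
                For concreteness, the attention operator $\mathrm{Attn}$ used in the losses below can be written out as follows, assuming the standard scaled dot-product attention of the Stable Diffusion UNet, applied per head with key dimension $d$:
                $$ \mathrm{Attn}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V. $$
                In the self-attention output, $Q$, $K$, and $V$ all come from the stylized prompt; in the hybrid cross-attention output, the query $Q$ is kept while $K$ and $V$ are replaced by $K^{\mathrm{ref}}$ and $V^{\mathrm{ref}}$ from the style-neutral reference image.
            </p>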
            <p> 
                Formally, let $Q_{l}^{t}, K_{l}^{t}, V_{l}^{t}$ denote the query, key, and value matrices at layer $l$ and timestep $t$ during inference for the stylized prompt $P$. Similarly, let $Q_{l}^{\mathrm{ref},t}, K_{l}^{\mathrm{ref},t}, V_{l}^{\mathrm{ref},t}$ denote the corresponding query, key, and value matrices for a reference image $I^{\mathrm{ref}}$ that shares the same content but exhibits a different, neutral style. Omitting the indices $l$ and $t$ for brevity, we define the hybrid attention distillation loss at each selected layer and timestep as:
                  $$ \mathcal{L}_{\mathrm{HAD}}=\left\|\mathrm{Attn}(Q,K,V)-\mathrm{Attn}(Q,K^{\mathrm{ref}},V^{\mathrm{ref}})\right\|. $$
                To preserve semantic content after unlearning, we introduce the content-preserving loss:
                $$ \mathcal{L}_{\mathrm{content}}=\left\|Q-Q^{\mathrm{ref}}\right\|. $$
            </p>
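            <p>
                The two losses above can be sketched for a single layer and timestep as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes standard scaled dot-product attention, takes the unspecified norm $\|\cdot\|$ to be the Frobenius norm, and uses random arrays in place of real UNet activations; the function names and sizes are hypothetical.
            </p>

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q @ k.T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Subtract the row-wise max before exponentiating, for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def had_loss(q, k, v, k_ref, v_ref):
    """Hybrid attention distillation loss at one layer/timestep (sketch).

    The self-attention output uses q, k, v from the stylized prompt; the
    hybrid output keeps the same query q but takes keys and values from the
    style-neutral reference image. The Frobenius norm is an assumption.
    """
    self_out = attention(q, k, v)
    hybrid_out = attention(q, k_ref, v_ref)
    return float(np.linalg.norm(self_out - hybrid_out))


def content_loss(q, q_ref):
    """Query-consistency (content-preserving) loss ||Q - Q_ref|| (Frobenius)."""
    return float(np.linalg.norm(q - q_ref))


# Usage with random stand-ins for one attention layer (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
q_ref, k_ref, v_ref = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
print(had_loss(q, k, v, k_ref, v_ref), content_loss(q, q_ref))
```

            <p>
                In an actual fine-tuning loop these quantities would be computed on framework tensors inside the UNet's attention layers so that gradients can flow; which branch receives gradients is left open in this sketch.
            </p>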
            <p>
               To maintain the model’s overall generalization capability, we apply a retain loss:
               $$ \mathcal{L}_{\mathrm{retain}}=\left\|\theta-\theta_{\mathrm{ori}}\right\|, $$
               where $\theta$ denotes the current (fine-tuned) parameters and $\theta_{\mathrm{ori}}$ denotes the original parameters of the model.
            </p>
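            <p>
                A minimal sketch of the retain loss, assuming the norm is the global L2 norm over all fine-tuned parameters flattened into a single vector. The parameter names below are hypothetical placeholders; penalizing this distance keeps the unlearned model close to the original, which is what protects unrelated styles.
            </p>

```python
import numpy as np


def retain_loss(params, params_ori):
    """Global L2 distance ||theta - theta_ori|| over all fine-tuned tensors.

    Both arguments map parameter names to arrays of matching shapes. The
    per-tensor differences are flattened and concatenated into one vector.
    """
    diffs = [np.ravel(params[name] - params_ori[name]) for name in params_ori]
    return float(np.linalg.norm(np.concatenate(diffs)))


# Hypothetical parameter names; only the first tensor has drifted.
params_ori = {"block0.to_q": np.zeros((2, 2)), "block0.to_k": np.zeros(3)}
params = {"block0.to_q": np.ones((2, 2)), "block0.to_k": np.zeros(3)}
print(retain_loss(params, params_ori))  # four entries differ by 1 -> 2.0
```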
            <p>
                The total loss is defined as:
$$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{HAD}}+\lambda_{1}\mathcal{L}_{\mathrm{content}}+\lambda_{2}\mathcal{L}_{\mathrm{retain}},$$
where $\lambda_1$ and $\lambda_2$ are hyperparameters.
            </p>
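            <p>
                The objective above is stated per selected layer and timestep, but how those per-layer terms are aggregated is not specified; the sketch below assumes they are averaged before being combined. The weights and loss values are illustrative only, not the settings used in the experiments.
            </p>

```python
import math


def total_loss(had_terms, content_terms, l_retain, lambda1, lambda2):
    """L_total = L_HAD + lambda1 * L_content + lambda2 * L_retain.

    had_terms and content_terms hold one value per selected (layer, timestep)
    pair; averaging them is an assumption made for this sketch.
    """
    if not had_terms or not content_terms:
        raise ValueError("need at least one selected layer/timestep")
    l_had = sum(had_terms) / len(had_terms)
    l_content = sum(content_terms) / len(content_terms)
    return l_had + lambda1 * l_content + lambda2 * l_retain


# Illustrative values: mean L_HAD = 1.0, mean L_content = 0.5, L_retain = 2.0.
value = total_loss([0.8, 1.2], [0.4, 0.6], l_retain=2.0, lambda1=0.5, lambda2=0.1)
print(math.isclose(value, 1.0 + 0.5 * 0.5 + 0.1 * 2.0))  # True (value is about 1.45)
```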
        </section>

        <section id="Theory">
            <h2>4. Experiments</h2>
            <p> We conduct all experiments using the publicly available Stable Diffusion v1.5 model (CompVis 2022) as our backbone. We compare our method with four recent approaches: ESD-x, Forget-Me-Not, UCE, and SPM. To evaluate artistic style unlearning, we focus on four widely adopted and visually distinct artistic styles: Vincent Van Gogh, Claude Monet, Pablo Picasso, and Rembrandt. In each experiment, we erase a single artist's style and use the remaining styles to evaluate whether the model preserves its ability to generate unrelated artistic styles. We assess unlearning performance with two standard metrics: CLIP Score (CS) and Fréchet Inception Distance (FID). Quantitative results for unlearning Monet and Van Gogh are reported in Tables 1 and 2. </p>
            <ul>
                <div class="img" style="text-align:center">
                        <img class="img_responsive" src="monet.jpg" alt="Table 1: quantitative evaluation of unlearning Monet" style="margin:auto;max-width:100%">
                        <figcaption>Table 1: Quantitative evaluation of unlearning Monet. The top-performing results are highlighted in bold, and the second-best results are underlined.</figcaption>
                </div>
                <div class="img" style="text-align:center">
                        <img class="img_responsive" src="van.jpg" alt="Table 2: quantitative evaluation of unlearning Van Gogh" style="margin:auto;max-width:100%">
                        <figcaption>Table 2: Quantitative evaluation of unlearning Van Gogh. The top-performing results are highlighted in bold, and the second-best results are underlined.</figcaption>
                </div>
            </ul> 
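            <p>
                For readers less familiar with the two metrics, the sketch below shows how they are commonly computed once features are available. It assumes the features have already been extracted (Inception-v3 pool features for FID, CLIP image and text embeddings for CS); the function names are hypothetical, and the sketch reports the raw cosine similarity, whereas some CLIP Score implementations additionally rescale it (for example by 100) and clip negative values to zero.
            </p>

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 Tr((sigma1 sigma2)^(1/2)).
    The last trace equals the sum of square roots of the eigenvalues of the
    symmetric PSD matrix sigma1^(1/2) sigma2 sigma1^(1/2).
    """
    w, u = np.linalg.eigh(sigma1)
    s1_half = (u * np.sqrt(np.clip(w, 0.0, None))) @ u.T
    inner = s1_half @ sigma2 @ s1_half
    eig = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.sum(np.sqrt(eig)))


def fid_from_features(feats_real, feats_gen):
    """FID from two (n_samples, dim) feature matrices."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu1, sigma1, mu2, sigma2)


def clip_cosine(image_emb, text_emb):
    """Cosine similarity between a CLIP image embedding and a text embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)


# Same covariance, means shifted by 2 along one axis: FID = 2**2 = 4.
mu = np.zeros(3)
print(frechet_distance(mu, np.eye(3), mu + np.array([2.0, 0.0, 0.0]), np.eye(3)))
print(clip_cosine(np.array([1.0, 0.0]), np.array([3.0, 0.0])))  # 1.0
```

            <p>
                A lower FID indicates that two image sets have closer feature statistics, whereas a higher cosine similarity indicates stronger agreement between an image and a text prompt.
            </p>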
            <p> As illustrated in Figure 3, our method removes the target styles more thoroughly than the other methods while better preserving content structure. </p>
            
            <ul>   
               <div class="img" style="text-align:left">
                        <img class="img_responsive" src="new.png" alt="Figure 3: visualization of artistic style unlearning" style="margin:auto;max-width:90%">
                        <figcaption>Figure 3: Visualization of artistic style unlearning across different target styles. Top: Van Gogh; middle: Monet; bottom: Picasso and Rembrandt. Our method effectively removes the target style while preserving content structure, outperforming existing unlearning approaches.</figcaption>
                </div>
            
            </ul>
            <span style="color: black;"><strong>Ablation Studies.</strong></span>
                    <p>
                        We conduct ablation studies to assess the contribution of each loss in SUDM when unlearning the Van Gogh style while preserving the Monet style, removing in turn the HAD loss, the content-preservation loss, and the retain loss. As shown in Table 3, removing the HAD loss significantly reduces unlearning performance: the CLIP similarity to Van Gogh is higher, indicating that the model fails to unlearn the target style effectively. In Figure 5, given the prompt "A serene landscape with a bright yellow sun, reminiscent of Van Gogh's time in Arles", the model without $\mathcal{L}_{\mathrm{content}}$ fails to generate the key object (the sun), producing semantically incomplete results. In contrast, including $\mathcal{L}_{\mathrm{content}}$ faithfully preserves the intended content, confirming that query alignment is crucial for maintaining structural fidelity. Removing the retain loss causes the model to fail to preserve the Monet style.
                    </p>
                    <div class="figure-container">
                        <figure>
                            <img class="img_ablation_query" src="ablate.jpg" alt="Table 3: ablation of the loss components" style="margin:auto;max-width:90%">
                            <figcaption>Table 3: Effect of the HAD loss, content-preservation loss, and retain loss.</figcaption>
                        </figure>
                    </div>
                   
                    
                    <figure style="text-align: center;">
                            <img class="img_ablation_attention" src="Ablate.png" alt="Figure 5: visual ablation of SUDM components" style="margin:auto;max-width:100%">
                            <figcaption>Figure 5: Visual illustration of component effects in SUDM. The top row shows Van Gogh style erasure, and the bottom row shows Monet style preservation. From left to right: (1) original artwork; (2) full model (Van Gogh erased, Monet retained); (3) w/o $\mathcal{L}_{\text{HAD}}$ (Van Gogh not fully erased); (4) w/o $\mathcal{L}_{\text{content}}$ (content distorted, e.g., sun missing); (5) w/o $\mathcal{L}_{\text{retain}}$ (Monet style not preserved).</figcaption>
                    </figure>
                    
            
        </section>

        <section id="controbution">
            <h2>5. Contributions</h2>
            <ul>
                <li>
                   We propose SUDM, a novel framework for style unlearning in diffusion models that leverages hybrid attention distillation, query consistency, and parameter consistency. To the best of our knowledge, it is the first unlearning technique tailored specifically for style removal from diffusion models.
                </li>
                
                <li>
                   SUDM aligns stylized and style-neutral representations to facilitate targeted style removal while preserving semantic content. Unlike alignment in the noise space, where style and content are entangled, alignment in the representation space captures style and content as non-interfering embeddings, enabling more precise style removal.
                </li>
        
                <li>
                   Extensive experiments demonstrate that our method effectively unlearns the target style while preserving generation performance on other concepts.
                </li>
               
            </ul>
        </section>

        <!-- <section id="code">
            <h2>6. Code</h2>
            <p>Please access our code through an anonymous link: <a href="https://anonymous.4open.science/r/PCL-B98C" target="_blank">PCL.</a>
            </p>
        </section> -->
    </div>

    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2.10.2/dist/umd/popper.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>

</body>
</html>
