<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment</title>
<link rel="icon" href="static/umo_files/teaser/umo_2.webp" type="image/webp">

<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
      rel="stylesheet">
<link href="static/css/bulma.min.css" rel="stylesheet">
<link rel="stylesheet" href="static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="static/css/bulma-slider.min.css">
<link href="static/css/fontawesome.all.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">

<link href="static/css/style.css" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Chewy:regular" rel="stylesheet" />
<link href="https://fonts.googleapis.com/css?family=Kaushan+Script:regular" rel="stylesheet" />

</head>

<body>
<div class="column has-text-centered" style="background: #ffffff;width: 100%">
  <br>
  <br>
  <h1 class="title is-1 publication-title">
    <span style="font-size: 60px; color:#d95b0b; font-weight: bold; font-family: Kaushan Script,regular">
      <strong>MultiCrafter</strong></span><br>
    <span style="font-size: 40px; font-weight: bold; font-family: Chewy,regular; background: linear-gradient(to right, #007BFF, #8E44AD); -webkit-background-clip: text; color: transparent;">
      <strong>High-Fidelity Multi-Subject Generation via</strong></span><br>
        <span style="font-size: 40px; color:#363636; font-weight: bold; font-family: Chewy,regular; letter-spacing: 1.5px;">
          <strong>Disentangled Attention and Identity-Aware Preference Alignment</strong></span>

    <br><br>
  </h1>
</div>

<section class="hero is-small">
  <div class="hero-body" style="background: #e9e9e9; width: 100%; padding: 0;">
    <div class="carousel-teaser-container" id="container">
      <div class="carousel-teaser" id="carousel">
        <img src="static/multicrafter_files/teaser/1.png" alt="Image 1" class="teaser-img">
        <img src="static/multicrafter_files/teaser/2.png" alt="Image 2" class="teaser-img">
        <img src="static/multicrafter_files/teaser/3.png" alt="Image 3" class="teaser-img">

        <img src="static/multicrafter_files/teaser/1.png" alt="Image 1" class="teaser-img">
        <img src="static/multicrafter_files/teaser/2.png" alt="Image 2" class="teaser-img">
        <img src="static/multicrafter_files/teaser/3.png" alt="Image 3" class="teaser-img">

        <img src="static/multicrafter_files/teaser/1.png" alt="Image 1" class="teaser-img">
        <img src="static/multicrafter_files/teaser/2.png" alt="Image 2" class="teaser-img">
        <img src="static/multicrafter_files/teaser/3.png" alt="Image 3" class="teaser-img">

        <img src="static/multicrafter_files/teaser/1.png" alt="Image 1" class="teaser-img">
        <img src="static/multicrafter_files/teaser/2.png" alt="Image 2" class="teaser-img">
        <img src="static/multicrafter_files/teaser/3.png" alt="Image 3" class="teaser-img">
      </div>
    </div>
  </div>
  <div class="drag-instruction">
    ✨ Click and drag (or swipe) to browse the images
  </div>
</section>

<div class="content_two">
  <h2 style="text-align:center; color:#363636;"><b>Abstract</b></h2>
  <br>
  <p>Multi-subject image generation aims to synthesize user-provided subjects in a single image while preserving subject fidelity, ensuring prompt consistency, and aligning with human aesthetic preferences. However, existing methods, particularly those built on the In-Context-Learning paradigm, rely on simple reconstruction-based objectives. This reliance leads to severe attribute leakage, which compromises subject fidelity, and to a failure to align with nuanced human preferences. To address this, we propose MultiCrafter, a framework for high-fidelity, preference-aligned generation. First, we find that the root cause of attribute leakage is significant entanglement of attention between different subjects during generation. We therefore introduce explicit positional supervision that separates the attention regions of each subject, effectively mitigating attribute leakage. Second, to enable the model to accurately plan the attention regions of different subjects in diverse scenarios, we employ a Mixture-of-Experts (MoE) architecture that enhances the model's capacity and lets different experts focus on different scenarios. Finally, we design a novel online reinforcement learning framework to align the model with human preferences, featuring a scoring mechanism that accurately assesses multi-subject fidelity and a more stable training strategy tailored to the MoE architecture. Experiments validate that our framework significantly improves subject fidelity while aligning better with human preferences.</p>
</div>

<br>

<div class="content_two">
  <h2 style="text-align:center; color:#363636;"><b>How does it work?</b></h2><br>
  <img class="summary-img" src="static/multicrafter_files/method/1.png" alt="Overview of the MultiCrafter framework" style="width:100%;">
  <br>
  <p>Our framework is built on three core innovations: (Top Left) Identity-Disentangled Attention Regularization uses positional supervision to prevent attribute leakage; (Top Right) the MoE-LoRA architecture boosts model capacity for diverse scenarios; and (Bottom) the Identity-Preserving Preference Alignment framework employs a novel online reinforcement learning strategy, combining a Multi-ID Alignment Reward with the stable GSPO algorithm, to align the model with human preferences.</p>
</div>
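<br>
<div class="content_two">
  <p>As a rough illustration of the attribute-leakage idea described above, the following JavaScript sketch measures how much of each subject's attention falls outside the region assigned to that subject. It is a toy example under stated assumptions, not the regularizer used in MultiCrafter: the function name <code>leakagePenalty</code>, the array layouts of <code>attention</code> and <code>regions</code>, and the averaging over subjects are all hypothetical choices made for illustration.</p>
<pre>
```javascript
// Toy sketch, not the paper's loss: fraction of each subject's attention
// mass that falls outside the region assigned to that subject.
// attention[s][p] is subject s's attention weight at spatial position p;
// regions[s][p] is 1 inside subject s's assigned region, else 0.
// Both layouts are assumptions made for illustration.
function leakagePenalty(attention, regions) {
  const perSubject = attention.map((weights, s) => {
    const total = weights.reduce((sum, w) => sum + w, 0);
    if (total === 0) return 0; // no attention at all, so nothing can leak
    const inside = weights.reduce((sum, w, p) => sum + w * regions[s][p], 0);
    return 1 - inside / total;
  });
  // Average over subjects so the value stays in [0, 1].
  return perSubject.reduce((sum, v) => sum + v, 0) / perSubject.length;
}

// Two subjects on a 4-position "image": left half vs. right half.
const regions = [[1, 1, 0, 0], [0, 0, 1, 1]];
const entangled = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]];
const separated = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]];

console.log(leakagePenalty(entangled, regions)); // 0.5
console.log(leakagePenalty(separated, regions)); // 0
```
</pre>
  <p>In this toy setting, a penalty of 0 means every subject attends only to its own region, and larger values mean more attention mass spills into other areas. A training objective could plausibly include a term of this kind, but the actual formulation is the one given in the paper.</p>
</div>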
<br>

<br>

<div class="content_two">
  <h2 style="text-align:center; color:#363636;"><b>Results of Multi-Human Personalization</b></h2>

  <br>
  <img class="summary-img" src="static/multicrafter_files/comparison/1.png" alt="Comparison results for multi-human personalization" style="width:100%;">
  <br>

  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/2.png" alt="Multi-human personalization results" style="width:100%;">
  <br>

  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/3.png" alt="Multi-human personalization results" style="width:100%;">
  <br>
</div>

<br>
<div class="content_two">
  <h2 style="text-align:center; color:#363636;"><b>Results of Multi-Object Personalization</b></h2>
  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/1.png" alt="Multi-object personalization results" style="width:100%;">
  <br>
</div>

<br>
<div class="content_two">
  <h2 style="text-align:center; color:#363636;"><b>Results of Single-Subject Personalization</b></h2>
  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/4.png" alt="Single-subject personalization results" style="width:100%;">
  <br>

  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/5.png" alt="Single-subject personalization results" style="width:100%;">
  <br>

  <br>
  <img class="summary-img" src="static/multicrafter_files/slider/6.png" alt="Single-subject personalization results" style="width:100%;">
  <br>
</div>

<br>


<script>
window.addEventListener('load', function() {
  const container = document.getElementById('container');
  const carousel = document.getElementById('carousel');
  const images = document.querySelectorAll('.teaser-img');

  // Core state variables
  let isDragging = false;
  let startX = 0;
  let currentX = 0;
  let scrollSpeed = 1; // Auto-scroll speed in pixels per animation frame
  let loopWidth = 0;   // Width of the repeating span used for wrap-around (set in init)
  let isMouseOver = false;

  // Initialization
  function init() {
    if (images.length === 0) return;

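    // Calculate the wrap-around width: 6 images (two copies of the 3-image set),
    // each counted with 10px of spacing (this value must match the stylesheet)
    const firstImg = images[0];
    const imgWidth = firstImg.offsetWidth || 600; // fall back to 600px if not yet laid out
    loopWidth = (imgWidth + 10) * 6;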

    // Start scrolling
    requestAnimationFrame(autoScroll);
  }

  // Set position
  function setPosition(x) {
    currentX = x;
    carousel.style.transform = `translateX(${currentX}px)`;
  }

  // Auto scroll logic
  function autoScroll() {
    if (!isMouseOver && !isDragging && loopWidth > 0) {
      currentX -= scrollSpeed;

      if (currentX <= -loopWidth) {
        currentX += loopWidth;
      }

      setPosition(currentX);
    }
    requestAnimationFrame(autoScroll);
  }

  // Drag start
  carousel.addEventListener('mousedown', (e) => {
    e.preventDefault();
    isDragging = true;
    startX = e.clientX - currentX;
    carousel.classList.add('dragging');
    // Temporarily disable hover effects during dragging to prevent conflicts
    images.forEach(img => {
      img.style.pointerEvents = 'none';
    });
  });

  // Drag move
  document.addEventListener('mousemove', (e) => {
    if (!isDragging) return;
    const newX = e.clientX - startX;
    setPosition(newX);
  });

  // Drag end
  document.addEventListener('mouseup', () => {
    if (!isDragging) return;

    isDragging = false;
    carousel.classList.remove('dragging');
    // Restore hover effects
    images.forEach(img => {
      img.style.pointerEvents = 'auto';
    });

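    // Boundary correction: if the drag ended outside [-2 * loopWidth, 0],
    // shift back by one loopWidth; the image set repeats, so the same images are shown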
    if (currentX > 0) {
      carousel.style.transition = 'transform 0.3s ease-out';
      setPosition(currentX - loopWidth);
      setTimeout(() => {
        carousel.style.transition = '';
      }, 300);
    } else if (currentX < -loopWidth * 2) {
      carousel.style.transition = 'transform 0.3s ease-out';
      setPosition(currentX + loopWidth);
      setTimeout(() => {
        carousel.style.transition = '';
      }, 300);
    }
  });

  // Hover control
  container.addEventListener('mouseenter', () => {
    isMouseOver = true;
  });

  container.addEventListener('mouseleave', () => {
    isMouseOver = false;
    isDragging = false;
    carousel.classList.remove('dragging');
    carousel.style.transition = '';
    // Ensure hover effects are restored when leaving
    images.forEach(img => {
      img.style.pointerEvents = 'auto';
    });
  });

  // Touch device support
  carousel.addEventListener('touchstart', (e) => {
    isDragging = true;
    startX = e.touches[0].clientX - currentX;
    carousel.classList.add('dragging');
  }, { passive: false });

  document.addEventListener('touchmove', (e) => {
    if (!isDragging) return;
    e.preventDefault();
    const newX = e.touches[0].clientX - startX;
    setPosition(newX);
  }, { passive: false });

  document.addEventListener('touchend', () => {
    if (!isDragging) return;

    isDragging = false;
    carousel.classList.remove('dragging');

    if (currentX > 0) {
      carousel.style.transition = 'transform 0.3s ease-out';
      setPosition(currentX - loopWidth);
      setTimeout(() => carousel.style.transition = '', 300);
    } else if (currentX < -loopWidth * 2) {
      carousel.style.transition = 'transform 0.3s ease-out';
      setPosition(currentX + loopWidth);
      setTimeout(() => carousel.style.transition = '', 300);
    }
  });

  // Initialize on load
  init();
});
</script>
</body>
</html>