<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  
  <!-- Primary Meta Tags -->
  <meta name="title" content="AlignVid - Anonymous">
  <meta name="description" content="BRIEF_DESCRIPTION_OF_YOUR_RESEARCH_CONTRIBUTION_AND_FINDINGS">
  <meta name="keywords" content="KEYWORD1, KEYWORD2, KEYWORD3, machine learning, computer vision, AI">
  <meta name="author" content="Anonymous">
  <meta name="robots" content="index, follow">
  <meta name="language" content="English">
  
  <!-- Open Graph / Facebook -->
  <meta property="og:type" content="article">
  <meta property="og:site_name" content="INSTITUTION_OR_LAB_NAME">
  <meta property="og:title" content="AlignVid">
  <meta property="og:description" content="BRIEF_DESCRIPTION_OF_YOUR_RESEARCH_CONTRIBUTION_AND_FINDINGS">
  <meta property="og:url" content="https://YOUR_DOMAIN.com/YOUR_PROJECT_PAGE">
  <meta property="og:image" content="https://YOUR_DOMAIN.com/static/images/social_preview.png">
  <meta property="og:image:width" content="1200">
  <meta property="og:image:height" content="630">
  <meta property="og:image:alt" content="AlignVid - Research Preview">
  <meta property="article:published_time" content="2024-01-01T00:00:00.000Z">
  <meta property="article:author" content="Anonymous">
  <meta property="article:section" content="Research">
  <meta property="article:tag" content="KEYWORD1">
  <meta property="article:tag" content="KEYWORD2">

  <!-- Twitter -->
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:site" content="@YOUR_TWITTER_HANDLE">
  <meta name="twitter:creator" content="@anonymous">
  <meta name="twitter:title" content="AlignVid">
  <meta name="twitter:description" content="BRIEF_DESCRIPTION_OF_YOUR_RESEARCH_CONTRIBUTION_AND_FINDINGS">
  <meta name="twitter:image" content="https://YOUR_DOMAIN.com/static/images/social_preview.png">
  <meta name="twitter:image:alt" content="AlignVid - Research Preview">

  <!-- Academic/Research Specific -->
  <meta name="citation_title" content="AlignVid">
  <meta name="citation_author" content="Anonymous">
  <meta name="citation_publication_date" content="2024">
  <meta name="citation_conference_title" content="Under Review">
  <meta name="citation_pdf_url" content="https://YOUR_DOMAIN.com/static/pdfs/paper.pdf">
  
  <!-- Additional SEO -->
  <meta name="theme-color" content="#2563eb">
  <meta name="msapplication-TileColor" content="#2563eb">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="default">
  
  <!-- Preconnect for performance -->
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link rel="preconnect" href="https://ajax.googleapis.com">
  <link rel="preconnect" href="https://documentcloud.adobe.com">
  <link rel="preconnect" href="https://cdn.jsdelivr.net">


  <title>AlignVid - Anonymous | Academic Research</title>
  
  <!-- Favicon and App Icons -->
  <link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
  <link rel="apple-touch-icon" href="static/images/favicon.ico">
  
  <!-- Critical CSS - Load synchronously -->
  <link rel="stylesheet" href="static/css/bulma.min.css">
  <link rel="stylesheet" href="static/css/index.css">
  
  <!-- Non-critical CSS - Load asynchronously -->
  <link rel="preload" href="static/css/bulma-carousel.min.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
  <link rel="preload" href="static/css/bulma-slider.min.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
  <link rel="preload" href="static/css/fontawesome.all.min.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
  <link rel="preload" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
  
  <!-- Fallback for browsers that don't support preload -->
  <noscript>
    <link rel="stylesheet" href="static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="static/css/fontawesome.all.min.css">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  </noscript>
  
  <!-- Fonts - Optimized loading -->
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800&display=swap" rel="stylesheet">
  
  <!-- Defer non-critical JavaScript -->
  <script defer src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
  <script defer src="static/js/fontawesome.all.min.js"></script>
  <script defer src="static/js/bulma-carousel.min.js"></script>
  <script defer src="static/js/bulma-slider.min.js"></script>
  <script defer src="static/js/index.js"></script>
  <script defer src="static/js/demos.js"></script>
  
  <!-- Structured Data for Academic Papers -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "AlignVid",
    "description": "BRIEF_DESCRIPTION_OF_YOUR_RESEARCH_CONTRIBUTION_AND_FINDINGS",
    "author": [
      {
        "@type": "Person",
        "name": "Anonymous",
        "affiliation": {
          "@type": "Organization",
          "name": ""
        }
      }
    ],
    "datePublished": "2024-01-01",
    "publisher": {
      "@type": "Organization",
      "name": "Under Review"
    },
    "url": "https://YOUR_DOMAIN.com/YOUR_PROJECT_PAGE",
    "image": "https://YOUR_DOMAIN.com/static/images/social_preview.png",
    "keywords": ["KEYWORD1", "KEYWORD2", "KEYWORD3", "machine learning", "computer vision"],
    "abstract": "FULL_ABSTRACT_TEXT_HERE",
    "citation": "BIBTEX_CITATION_HERE",
    "isAccessibleForFree": true,
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "mainEntity": {
      "@type": "WebPage",
      "@id": "https://YOUR_DOMAIN.com/YOUR_PROJECT_PAGE"
    },
    "about": [
      {"@type": "Thing", "name": "RESEARCH_AREA_1"},
      {"@type": "Thing", "name": "RESEARCH_AREA_2"}
    ]
  }
  </script>
  
  <!-- Website/Organization Structured Data -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "",
    "url": "",
    "logo": "https://YOUR_DOMAIN.com/static/images/favicon.ico",
    "sameAs": []
  }
  </script>
</head>
<body>


  <!-- Scroll to Top Button -->
  <button class="scroll-to-top" onclick="scrollToTop()" title="Scroll to top" aria-label="Scroll to top">
    <i class="fas fa-chevron-up"></i>
  </button>

  <main id="main-content">
  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-2 publication-title">AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation</h1>
            <div class="is-size-4 publication-authors">
              <span class="author-block">Anonymous</span>
            </div>

            <div class="is-size-4 publication-authors">
              <span class="author-block">ICLR Under Review</span>
            </div>

            <p class="is-size-6 has-text-grey">The method is deployed in commercial models and accessible to users as an additional option.<br>We will release the code and benchmark.</p>
          </div>
        </div>
      </div>
    </div>
  </div>
</section>

<!-- Paper abstract -->
<section class="section hero is-light">
  <div class="container is-max-desktop">
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">Abstract</h2>
        <div class="content has-text-justified">
          <p>
            Text-guided image-to-video (TI2V) generation has recently achieved remarkable progress, particularly in maintaining subject consistency and temporal coherence. However, existing methods still struggle to adhere to fine-grained prompt semantics, especially when prompts entail substantial transformations of the input image (e.g., object addition, deletion, or modification), a shortcoming we term semantic negligence. In a pilot study, we find that applying a Gaussian blur to the input image improves semantic adherence. Analyzing attention maps, we observe clearer foreground–background separation. From an energy perspective, this corresponds to a lower-entropy attention distribution. Motivated by this, we introduce AlignVid, a training-free framework with two components: (i) Attention Scaling Modulation (ASM), which directly reweights attention via lightweight Q/K scaling, and (ii) Guidance Scheduling (GS), which applies ASM selectively across transformer blocks and denoising steps to reduce visual quality degradation. This minimal intervention improves prompt adherence while limiting aesthetic degradation. In addition, we introduce OmitI2V to evaluate semantic negligence in TI2V generation, comprising 367 human-annotated samples that span addition, deletion, and modification scenarios. Extensive experiments demonstrate that AlignVid can enhance semantic fidelity. Code and benchmark will be released.
          </p>
        </div>
      </div>
    </div>
  </div>
</section>
<!-- End paper abstract -->

<!-- Pseudocode (Image) -->
<section class="section">
  <div class="container is-max-desktop">
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">Pseudocode</h2>
        <figure>
          <img src="static/images/p_code.png?v=1" alt="Pseudocode" loading="lazy" onerror="this.onerror=null; this.src='/static/images/p_code.png?v=1';"/>
        </figure>
      </div>
    </div>
  </div>
</section>
<!-- End Pseudocode (Image) -->


<!-- Video Comparisons -->
<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3 has-text-centered">Video Comparisons</h2>
    <div id="video-comparisons"></div>
  </div>
</section>
<!-- End Video Comparisons -->

</body>
</html>
