Implicit Preference Alignment for Human Image Animation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 regular, Readers: Everyone, License: CC BY 4.0
Abstract: Human image animation has advanced significantly, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. Reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, but it requires constructing strict preference pairs. Curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism to explicitly steer the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization and enhances hand generation quality, while significantly lowering the barrier to constructing preference data. Code is released at https://github.com/mdswyz/IPA
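The abstract does not state the training objective, so the following is only a toy illustration of the two ideas it names: fitting self-generated high-quality samples while penalizing drift from a frozen pretrained model, with extra weight on hand regions. It is not the paper's loss. Every name here (`masked_mean`, `toy_alignment_loss`, `beta`, `hand_weight`) is hypothetical, and the squared-error terms are a stand-in chosen for simplicity.

```python
def masked_mean(values, weights):
    """Weighted average of values; returns 0.0 if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


def toy_alignment_loss(pred, ref_pred, target, hand_mask,
                       beta=0.1, hand_weight=4.0):
    """Toy stand-in, NOT the paper's objective.

    pred      : current model outputs (flat list of floats)
    ref_pred  : outputs of the frozen pretrained model on the same input
    target    : a self-generated sample judged to be high quality
    hand_mask : 1.0 where a pixel belongs to a hand region, else 0.0
    beta      : assumed strength of the stay-close-to-prior penalty
    hand_weight : assumed extra weight given to hand pixels
    """
    # Hand pixels get weight `hand_weight`, all other pixels weight 1.
    weights = [1.0 + (hand_weight - 1.0) * m for m in hand_mask]
    # Fit term: pull the model toward the preferred sample, hands first.
    fit = masked_mean([(p - t) ** 2 for p, t in zip(pred, target)], weights)
    # Prior term: penalize deviation from the pretrained model everywhere.
    drift = masked_mean([(p - r) ** 2 for p, r in zip(pred, ref_pred)],
                        [1.0] * len(pred))
    return fit + beta * drift


if __name__ == "__main__":
    pred = [1.0, 0.0]
    target = [0.0, 0.0]
    # The same error costs more when it falls on a hand pixel.
    print(toy_alignment_loss(pred, pred, target, hand_mask=[1.0, 0.0]))
    print(toy_alignment_loss(pred, pred, target, hand_mask=[0.0, 1.0]))
```

In this sketch no "rejected" sample appears anywhere, which mirrors the abstract's claim that paired preference data is not needed; the frozen model's output plays the role of the anchor instead.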
Lay Summary: Artificial intelligence has become quite impressive at animating images of people, but generating realistic and natural hand movements remains a major challenge because hands are incredibly complex and flexible. Typically, teaching an AI to improve at this task involves showing it many side-by-side examples of "good" versus "bad" hand animations so it can learn human preferences. However, manually creating these comparison examples for moving hands is exceptionally difficult, time-consuming, and expensive. To solve this, we developed a new training method that entirely removes the need for these costly side-by-side comparisons. Instead, our approach encourages the AI model to independently recognize and reproduce its own highest-quality animations, while keeping it from losing its baseline knowledge. We also designed a specific mechanism that forces the AI to focus its learning directly on the hand areas. Our results show that this method successfully creates much better, more realistic hand animations while significantly reducing the cost and effort required to train the AI.
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/mdswyz/IPA
Primary Area: Applications->Computer Vision
Keywords: Human Image Animation, Implicit Preference Alignment, Hand-Aware Local Optimization
Originally Submitted PDF: pdf
Submission Number: 5256