DNAF: Diffusion with Noise-Aware Feature for Pose-Guided Person Image Synthesis

Liyan Guo, Kaiyu Song, Mengying Xu, Hanjiang Lai

Published: 2024, Last Modified: 13 Nov 2024, ICME 2024, CC BY-SA 4.0

Abstract: Pose-guided person image synthesis aims to generate an image of a person that follows a given pose skeleton while preserving the appearance of a source image. Diffusion models, a popular class of generative models, have shown promise for this task. However, two gaps hinder the fusion of pose information and appearance: 1) directly injecting pixel-level pose information into semantic features causes a representation gap; 2) the timestep-dependent nature of the diffusion model introduces a noise-induced gap. To alleviate these gaps, we propose Diffusion with Noise-Aware Feature (DNAF). Concretely, we use a pose adapter based on T2I-Adapter to map pose information from the pixel level to the feature level. We then propose a lightweight trainable layer that adaptively infuses the multi-scale constant features. Finally, we construct noise-aware features to guide the diffusion process more effectively. Experimental results show that DNAF achieves competitive results on the DeepFashion and Market-1501 datasets.
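The abstract does not give exact formulas, so the following is only a minimal, stdlib-only Python sketch of the two ideas it names, under stated assumptions. It illustrates why a timestep-independent ("constant") pose feature is mismatched with the noisy diffusion state, and one hypothetical way to bridge that gap. Assumptions not taken from the paper: a linear DDPM beta schedule (1e-4 to 0.02 over 1000 steps), a "lightweight trainable layer" reduced to one learnable scalar per feature scale (`fuse_multiscale`), and a "noise-aware" feature built by noising the constant feature to the level of timestep t (`noise_aware`). All function names are hypothetical, and this is not DNAF's actual implementation.

```python
import math
import random
from typing import List


def alpha_bar(t: int, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product prod_{s<=t} (1 - beta_s) under an assumed linear DDPM schedule."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod


def fuse_multiscale(features: List[List[float]], scale_weights: List[float]) -> List[List[float]]:
    """Hypothetical lightweight layer: one trainable scalar per scale rescales the
    constant (timestep-independent) pose feature at that scale."""
    return [[w * v for v in feat] for feat, w in zip(features, scale_weights)]


def noise_aware(feature: List[float], t: int, rng: random.Random) -> List[float]:
    """Assumed construction (not the paper's): bring a constant feature to the
    noise level of x_t via sqrt(abar_t) * f + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in feature]


# Toy usage: two "scales" of pose features, each a flat list of floats.
rng = random.Random(0)
pose_feats = [[1.0, 2.0], [0.5, -0.5, 1.5]]
fused = fuse_multiscale(pose_feats, scale_weights=[0.8, 1.2])
early = noise_aware(fused[0], t=0, rng=rng)   # nearly identical to fused[0]
late = noise_aware(fused[0], t=999, rng=rng)  # dominated by Gaussian noise
```

At t = 0 the perturbed feature is almost the clean feature, while near t = 999 almost no signal survives; a constant pose feature would ignore this difference, which is one way to read the "noise-induced gap" described above.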