Abstract: Image-based virtual try-on aims to fit an in-shop garment onto a reference person image. A key step toward this goal is garment warping, which aligns the target garment with the corresponding body parts of the reference person and deforms it plausibly. Previous methods typically adopt unweighted appearance flow estimation, which makes it inherently difficult to learn meaningful sampling positions and produces unrealistic warping when the reference and the target differ greatly in spatial layout. To overcome this limitation, we propose a novel weighted appearance flow estimation strategy. First, we extract a fused latent vector of the reference and the target with a Dual Branch Bottleneck Transformer, which allows the latent vector to encode global context. Then, we improve the realism of the appearance flow through sparse spatial sampling, which strengthens the exchange of local information and constrains the warping. Experimental results on a popular virtual try-on benchmark show that our method outperforms the current state-of-the-art method in both quantitative and qualitative evaluations.
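The operation shared by appearance-flow-based try-on methods is warping the in-shop garment with a dense flow field of per-pixel sampling offsets. The sketch below illustrates only this generic warping step, assuming a pixel-offset flow convention and toy tensor shapes; the function name and shapes are illustrative, and it does not reproduce the paper's weighted estimation, Dual Branch Bottleneck Transformer, or sparse spatial sampling.

```python
import torch
import torch.nn.functional as F


def warp_with_appearance_flow(garment: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a garment image with a dense appearance flow field.

    garment: (N, C, H, W) in-shop garment image.
    flow:    (N, 2, H, W) per-pixel (x, y) offsets in pixels from a flow estimator.
    Returns the garment resampled at the displaced locations.
    """
    n, _, h, w = garment.shape

    # Base sampling grid in normalized [-1, 1] coordinates, ordered (x, y).
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=garment.device),
        torch.linspace(-1.0, 1.0, w, device=garment.device),
        indexing="ij",
    )
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)  # (N, H, W, 2)

    # Convert pixel offsets to normalized offsets and displace the base grid.
    norm_flow = torch.stack(
        (2.0 * flow[:, 0] / max(w - 1, 1), 2.0 * flow[:, 1] / max(h - 1, 1)),
        dim=-1,
    )  # (N, H, W, 2)
    sample_grid = base_grid + norm_flow

    # Bilinearly sample the garment at the displaced locations.
    return F.grid_sample(
        garment, sample_grid, mode="bilinear", padding_mode="border", align_corners=True
    )


if __name__ == "__main__":
    garment = torch.rand(1, 3, 256, 192)  # toy in-shop garment
    flow = torch.zeros(1, 2, 256, 192)    # zero flow -> identity warp
    warped = warp_with_appearance_flow(garment, flow)
    print(warped.shape)  # torch.Size([1, 3, 256, 192])
```

With zero flow the warp is the identity; a learned estimator would predict non-zero offsets so that the resampled garment aligns with the person's pose and body shape.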