Abstract: Virtual fitting, in which the clothing in a person's image is replaced with arbitrary clothing, is expected to be applied to shopping sites and videoconferencing. For real-time virtual fitting, image-based methods using knowledge distillation can generate high-quality fitting images from only an image of arbitrary clothing and an image of a person, without requiring additional data such as pose information. However, few studies perform fast, stable virtual fitting from arbitrary clothing images on real person images while accounting for temporal consistency, as required in settings such as videoconferencing. The purpose of this demo is therefore to perform robust, temporally consistent virtual fitting for videoconferencing. We first built a virtual fitting system and evaluated how well an existing fast image-based fitting method performs on webcam video. The results showed that existing methods, which neither adapt their training data to this setting nor consider temporal consistency, are unstable on videoconference-like input. We therefore propose training a model on a dataset adjusted to resemble videoconference footage and adding a temporal consistency loss. Qualitative evaluation confirms that the proposed model exhibits less flicker than the baseline. Figure 1 shows an example of our try-on system running on Zoom.
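To make the temporal consistency loss concrete, the sketch below shows one common formulation: an L1 penalty on frame-to-frame change in the generated output, down-weighted where the input itself is moving so that legitimate motion is not punished. This is a minimal illustration assuming a PyTorch setup; the function name, the motion-based weighting, and the `sigma` parameter are our assumptions, not necessarily the exact formulation used in the paper.

```python
# Hypothetical sketch of a temporal consistency loss for video try-on.
# The weighting scheme is an illustrative assumption, not the paper's exact loss.
import torch


def temporal_consistency_loss(out_prev: torch.Tensor,
                              out_curr: torch.Tensor,
                              in_prev: torch.Tensor,
                              in_curr: torch.Tensor,
                              sigma: float = 0.1) -> torch.Tensor:
    """Penalize frame-to-frame flicker in generated try-on frames.

    Regions where the *input* frames barely change should also stay
    stable in the *output*; regions with real motion are down-weighted.
    All tensors have shape (batch, channels, height, width).
    """
    # Per-pixel change in the input video (mean over RGB channels).
    input_change = (in_curr - in_prev).abs().mean(dim=1, keepdim=True)
    # Weight is ~1 where the input is static, ~0 where it moves.
    weight = torch.exp(-input_change / sigma)
    # L1 difference between consecutive generated frames, weighted.
    output_change = (out_curr - out_prev).abs()
    return (weight * output_change).mean()


# Illustrative use inside a training step, where lambda_tc balances
# this term against the method's other losses:
# loss = fitting_loss + lambda_tc * temporal_consistency_loss(
#     out_prev, out_curr, in_prev, in_curr)
```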