Abstract: A good garment try-on model should learn the transfer between different types of garments while satisfying two requirements: 1) high fidelity and 2) fast inference. Existing methods address only one of these two issues, suffering from either limited processing speed or low generation quality. We directly adopt a lightweight encoder-decoder, which ensures fast inference. To tackle the lower image quality typically produced by lighter models, we present GarFast, a simplified, parser-free framework that optimizes the same lightweight network through a two-stage transformation of real data roles (from input to supervision), thereby greatly promoting model convergence. Specifically, we first propose a correction strategy to mitigate the convergence difficulty caused by the lack of ground truth in the first stage. Second, we propose a fine-grained domain-consistency constraint to ensure that the results generated in the unsupervised first stage are highly realistic clothed-human images. Finally, we propose a skin-variant refinement loss and a skinMix regularization to amplify texture differences and enhance the realism of skin-variant regions, thereby improving the quality of the generated skin. Extensive experiments demonstrate that our method achieves high resolution, near real-time performance, and superior reconstruction quality compared to state-of-the-art approaches, with processing times of less than 0.03 seconds on an NVIDIA A100.
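The abstract only sketches the two-stage role transformation at a high level, so the following is a minimal, hypothetical PyTorch-style sketch of how such a training scheme could look: in stage 1 the real clothed-person image acts as the model *input* (no ground truth exists, so a domain-consistency critic supplies the learning signal), while in stage 2 the stage-1 output is re-dressed in the original garment and the real image acts as *supervision*. All function names, arguments, and loss choices here are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def stage1_step(model, person_img, target_garment, domain_critic):
    """Unsupervised stage: the real image is the INPUT.

    With no ground-truth try-on result available, a fine-grained
    domain-consistency critic (hypothetical here) pushes the output
    toward the distribution of realistic clothed-human images.
    """
    fake_tryon = model(person_img, target_garment)
    # Critic score stands in for the missing ground truth.
    consistency_loss = domain_critic(fake_tryon).mean()
    return fake_tryon, consistency_loss

def stage2_step(model, fake_tryon, original_garment, person_img):
    """Supervised stage: the real image becomes the SUPERVISION.

    Dressing the stage-1 result back in the original garment lets the
    real photo serve as a pixel-level reconstruction target.
    """
    recon = model(fake_tryon.detach(), original_garment)
    recon_loss = F.l1_loss(recon, person_img)
    return recon_loss
```

Because the same lightweight network is optimized in both stages, the reconstruction gradient from stage 2 provides the direct supervision that the unsupervised stage 1 lacks, which is consistent with the abstract's claim that the role transformation promotes convergence.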