Abstract: 3D virtual try-on from a single image can provide an excellent shopping experience for Internet users and has enormous business potential. Existing methods reconstruct a clothed 3D human body from virtual try-on images by extracting depth information from the input image. However, the generated results are often unstable: high-frequency detail features over larger spatial contexts are lost during the downsampling performed for depth prediction, and the generator suffers from vanishing gradients when predicting occluded regions in high-resolution images. To address these problems, we propose a multi-resolution parallel approach that captures low-frequency information while retaining as many high-frequency depth features as possible during depth prediction; at the same time, we employ a multi-scale generator and discriminator to infer feature maps of occluded regions more accurately and thereby generate a fine-grained clothed 3D human body. Our method not only yields finer detail in the final 3D body generated for virtual try-on, but also improves the user's try-on experience over previous studies, as evidenced by stronger quantitative and qualitative evaluations.
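As an illustrative sketch only (not the paper's released implementation), the multi-resolution parallel idea can be expressed as parallel convolutional branches that exchange information across resolutions, so that the low-resolution branch supplies low-frequency context while the high-resolution branch preserves high-frequency detail. The module name, channel widths, and the use of PyTorch below are all assumptions made for illustration.

```python
# Hypothetical sketch of a two-branch multi-resolution parallel block.
# Branch names, channel widths, and the fusion scheme are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiResolutionBlock(nn.Module):
    """Keeps a high-resolution branch (high-frequency detail) and a
    low-resolution branch (low-frequency context) in parallel, then
    fuses them so downsampling never fully discards fine detail."""

    def __init__(self, hi_ch=32, lo_ch=64):
        super().__init__()
        self.hi_conv = nn.Conv2d(hi_ch, hi_ch, 3, padding=1)
        self.lo_conv = nn.Conv2d(lo_ch, lo_ch, 3, padding=1)
        # Cross-resolution exchange: down-project hi -> lo, up-project lo -> hi.
        self.hi_to_lo = nn.Conv2d(hi_ch, lo_ch, 3, stride=2, padding=1)
        self.lo_to_hi = nn.Conv2d(lo_ch, hi_ch, 1)

    def forward(self, hi, lo):
        hi_out = F.relu(self.hi_conv(hi))
        lo_out = F.relu(self.lo_conv(lo))
        # Fuse: each branch receives the other branch's information.
        lo_fused = lo_out + self.hi_to_lo(hi_out)
        hi_fused = hi_out + F.interpolate(
            self.lo_to_hi(lo_out), size=hi.shape[-2:],
            mode="bilinear", align_corners=False)
        return hi_fused, lo_fused


# Usage: a full-resolution feature map and its half-resolution counterpart.
hi = torch.randn(1, 32, 128, 128)
lo = torch.randn(1, 64, 64, 64)
hi_out, lo_out = MultiResolutionBlock()(hi, lo)
print(hi_out.shape, lo_out.shape)  # [1, 32, 128, 128], [1, 64, 64, 64]
```

Stacking such blocks keeps both resolutions alive throughout depth prediction, in contrast to a plain encoder that downsamples once and must recover detail later.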