BlingDiff: High-Fidelity Virtual Jewelry Try-On with Detail-Optimized Diffusion

Yunfang Niu, Lingxiang Wu, Dong Yi, Lu Zhou, Jinqiao Wang

Published: 27 Oct 2025, Last Modified: 09 Nov 2025. License: CC BY-SA 4.0
Abstract: Current virtual try-on technology is applied predominantly to garments, whereas virtual jewelry try-on is a more fine-grained and challenging task. It requires accurate reproduction of jewelry details, position, and size while preserving the natural texture of the surrounding skin. To meet these requirements, we propose BlingDiff, a novel and flexible framework for virtual jewelry try-on that supports various types of jewelry with adjustable position and size. First, we introduce a dual-model aggregation inpainting approach to obtain clean and natural agnostic images. We then propose an adaptive virtual jewelry try-on method based on a parallel diffusion transformer that seamlessly integrates person and jewelry images to synthesize high-fidelity results. Optionally, positional bounding boxes can be incorporated to provide additional spatial guidance. Complementing this method, we introduce a detail-oriented attention (DOA) module that fuses features from the person and the jewelry, effectively steering the model's focus toward jewelry-specific regions. Moreover, a shape-aware loss (SAL) refines the accuracy of the synthesized jewelry's texture and form, while a perturbed guidance technique (PGT) further improves the efficiency of both training and inference. Finally, we conduct experiments on our proposed dataset. The results indicate that our method outperforms baseline models in both qualitative and quantitative evaluations.
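To make the architectural ideas concrete, below is a minimal PyTorch sketch of how a DOA-style cross-attention block and a mask-weighted shape-aware loss could look. The abstract does not specify the internals, so the class name `DetailOrientedAttention`, the function `shape_aware_loss`, the `jewelry_mask` biasing scheme, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DetailOrientedAttention(nn.Module):
    """Sketch of a DOA-style block: person tokens cross-attend to jewelry
    tokens, with an optional mask that biases attention toward
    jewelry-specific regions (an assumed mechanism)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, person_tokens, jewelry_tokens, jewelry_mask=None):
        # person_tokens: (B, Np, C); jewelry_tokens: (B, Nj, C)
        # jewelry_mask: (B, Nj) in [0, 1], 1 = token lies on the jewelry
        B, Np, C = person_tokens.shape
        H = self.num_heads
        q = self.to_q(person_tokens).view(B, Np, H, C // H).transpose(1, 2)
        k, v = self.to_kv(jewelry_tokens).chunk(2, dim=-1)
        k = k.view(B, -1, H, C // H).transpose(1, 2)
        v = v.view(B, -1, H, C // H).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, Np, Nj)
        if jewelry_mask is not None:
            # Down-weight attention to tokens outside the jewelry region.
            attn = attn + jewelry_mask[:, None, None, :].log().clamp(min=-1e4)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, Np, C)
        return person_tokens + self.proj(out)


def shape_aware_loss(pred, target, jewelry_mask, weight: float = 5.0):
    """One plausible reading of a shape-aware loss (an assumption):
    an L1 reconstruction term re-weighted to emphasize jewelry pixels."""
    per_pixel = F.l1_loss(pred, target, reduction="none")
    weights = 1.0 + weight * jewelry_mask
    return (per_pixel * weights).mean()
```

In this reading, the residual cross-attention keeps the person features intact while injecting jewelry detail, and the loss weighting pushes gradient signal toward the jewelry region; the actual BlingDiff modules may differ in structure and loss formulation.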