Abstract: Deep learning-based methods have made significant progress in natural image matting. However, mainstream approaches such as CNNs and ViTs focus mainly on capturing global image features and lack specialized treatment of edges. They struggle to distinguish foregrounds from backgrounds with similar colors and textures, which blurs the edge regions between foreground and background. To address these problems, we propose the Edge-Reserved Knowledge Distillation Model (ERKD), which preserves fine edge features while distilling multimodal semantic information from large models. To acquire multi-scale edge features, we design an Edge-Reserved Module and a Multimodal Feature Fusion Module. To further enhance the capture of edge features, we introduce CLIP for feature-level knowledge distillation. Comprehensive evaluations on the Composition-1k and Distinction-646 datasets show that our method outperforms existing techniques.
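The abstract mentions feature-level knowledge distillation from a CLIP teacher but does not specify the objective. A minimal sketch, assuming an MSE loss on L2-normalized feature maps (the function name and toy dimensions below are illustrative, not from the paper):

```python
import numpy as np

def feature_distillation_loss(student_feats: np.ndarray,
                              teacher_feats: np.ndarray) -> float:
    """MSE between L2-normalized student and teacher features.

    Hypothetical sketch of a feature-level distillation term: both
    feature sets are normalized per token, then compared elementwise.
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=-1, keepdims=True) + 1e-8)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=-1, keepdims=True) + 1e-8)
    return float(np.mean((s - t) ** 2))

# Toy example: 4 tokens with 512-dim features (a CLIP-like width).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 512))          # frozen teacher features
student = teacher + 0.1 * rng.standard_normal((4, 512))  # noisy student
loss = feature_distillation_loss(student, teacher)
```

In practice the student's matting-network features would typically be projected to the teacher's width before this comparison; that projection is omitted here for brevity.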