Abstract: Few-shot semantic segmentation is a challenging task that aims to segment novel classes in query images given only a few annotated support samples. Most existing prototype-based approaches extract global or local prototypes by global average pooling (GAP) or clustering to represent all object information, and then use this prototype information to guide query image segmentation. However, these frameworks fail to fully mine object details and ignore information from the query images themselves. To address these issues, we propose a Dual-Guided Frequency Prototype Network (DGFPNet). Specifically, to mine global and local object information, a Frequency Prototype Generation Module (FPGM) is first proposed to extract more comprehensive frequency prototypes by multi-frequency pooling (MFP) in the DCT domain. Then, guided by both support and query information, a Dual-Guided Selection Module (DGSM) produces a query attention mask and selects the more effective prototypes. Based on the query attention mask and support information, generalized object information is integrated into the feature via the proposed Feature Generalization Module (FGM). Finally, we propose a Multi-Dimension Feature Enrichment Decoder Module (MDFEDM) to capture multi-dimension object information and handle hard pixels, refining the final segmentation results. Extensive experiments on PASCAL-$5^i$ and COCO-$20^i$ show that our model achieves new state-of-the-art performance.
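To make the multi-frequency pooling idea concrete, the following is a minimal sketch (not the paper's implementation) of DCT-domain pooling: each prototype is the projection of a masked feature map onto one 2D DCT-II basis function, so the lowest frequency (0, 0) reduces to plain GAP while higher frequencies capture spatial detail that GAP discards. The function names and frequency choices here are illustrative assumptions.

```python
import numpy as np

def dct_basis(h, w, u, v):
    # 2D DCT-II basis function for spatial frequency (u, v)
    ys = np.cos(np.pi * (np.arange(h) + 0.5) * u / h)
    xs = np.cos(np.pi * (np.arange(w) + 0.5) * v / w)
    return np.outer(ys, xs)  # (h, w)

def multi_frequency_pooling(feat, freqs):
    """feat: (C, H, W) feature map (assumed already masked to the object);
    freqs: list of (u, v) pairs. Returns one C-dim prototype per frequency."""
    C, H, W = feat.shape
    protos = []
    for u, v in freqs:
        basis = dct_basis(H, W, u, v)                      # (H, W)
        protos.append((feat * basis).sum(axis=(1, 2)) / (H * W))
    return np.stack(protos)                                # (len(freqs), C)

feat = np.random.rand(256, 32, 32)
# (0, 0) has a constant basis, so the first prototype equals GAP
protos = multi_frequency_pooling(feat, [(0, 0), (0, 1), (1, 0), (1, 1)])
```

In this sketch, stacking several low-frequency components yields a small bank of prototypes per class instead of the single GAP vector, which is the kind of richer representation the FPGM is described as producing.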
External IDs: dblp:journals/tmm/WenHMYZ24