Abstract: Few-shot segmentation (FSS) aims to segment objects in a
given query image with only a few labelled support images. The limited
support information makes it an extremely challenging task. Most previous best-performing methods adopt prototypical learning or affinity
learning. Nevertheless, these methods either fail to fully utilize support pixels
to facilitate segmentation and lose spatial information, or are not
robust to noisy pixels and are computationally expensive. In this work, we
propose a novel end-to-end adaptive agent transformer (AAFormer) that
integrates prototypical and affinity learning to exploit their complementarity
via a transformer encoder-decoder architecture consisting of a representation encoder, an agent learning decoder, and an agent
matching decoder. The proposed AAFormer enjoys several merits. First,
to learn agent tokens effectively without any explicit supervision, and to make
them capable of adaptively dividing different objects into diverse parts,
we tailor the agent learning decoder to three characteristics: context awareness, spatial awareness, and
diversity. Second, the proposed agent matching decoder decomposes
the direct pixel-level matching matrix into two more
computationally friendly matrices to suppress noisy pixels. Extensive
experimental results on two standard benchmarks demonstrate that our
AAFormer performs favorably against state-of-the-art FSS methods.
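The complexity argument behind the agent matching decoder can be illustrated with a minimal NumPy sketch. It makes the following assumptions:

- Affinities are scaled dot products with row-wise softmax normalization.
- Query and support features are flattened to (pixels, channels).
- A small number K of agent tokens is given.

The function names, shapes, and normalization here are illustrative assumptions rather than the AAFormer implementation. The sketch replaces the direct (Nq x Ns) query-to-support matrix with a (Nq x K) query-to-agent matrix and a (K x Ns) agent-to-support matrix. Support labels can then be transferred to the query without ever forming the full pixel-to-pixel matrix.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def direct_matching(query, support):
    # Hypothetical baseline: dense pixel-to-pixel affinity.
    # query: (Nq, C), support: (Ns, C) -> (Nq, Ns), rows sum to 1.
    scale = np.sqrt(query.shape[1])
    return softmax(query @ support.T / scale, axis=1)


def agent_matching(query, support, agents):
    # Hypothetical factorization through K agent tokens (K << Ns).
    # agents: (K, C). Returns two row-stochastic matrices whose product
    # approximates a query-to-support matching of shape (Nq, Ns).
    scale = np.sqrt(query.shape[1])
    q2a = softmax(query @ agents.T / scale, axis=1)    # (Nq, K)
    a2s = softmax(agents @ support.T / scale, axis=1)  # (K, Ns)
    return q2a, a2s


def transfer_labels(q2a, a2s, support_mask):
    # support_mask: (Ns,) binary foreground mask of the support image.
    # Multiplying right-to-left costs O(K*Ns + Nq*K) instead of O(Nq*Ns)
    # and never materializes the dense Nq x Ns matrix.
    agent_fg = a2s @ support_mask  # (K,) foreground score per agent
    return q2a @ agent_fg          # (Nq,) foreground score per query pixel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((50, 8))    # 50 query pixels, 8 channels
    s = rng.standard_normal((60, 8))    # 60 support pixels
    a = rng.standard_normal((4, 8))     # 4 agent tokens
    mask = (rng.random(60) > 0.5).astype(float)
    q2a, a2s = agent_matching(q, s, a)
    print(transfer_labels(q2a, a2s, mask).shape)
```

Because both factors are row-stochastic, their product is also row-stochastic. Each query pixel's score is therefore a convex combination of support labels and stays in [0, 1]. The low-rank bottleneck through K agents is one plausible reading of why such a decomposition is cheaper and less sensitive to individual noisy support pixels.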