ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification
Abstract: Extracting robust discriminative features is a critical
challenge in person re-identification (ReID). While Transformer-
based methods have successfully addressed some limitations
of convolutional neural networks (CNNs), such as their local
processing nature and information loss resulting from convolution
and downsampling operations, they still suffer from limited scalability, since their memory and computational requirements grow quadratically with the length of the input sequence. To overcome
this, we propose a pure Mamba-based person ReID framework
named ReIDMamba. Specifically, we have designed a Mamba-based strong baseline that effectively exploits fine-grained, discriminative global features by introducing multiple class tokens. To further enhance robust feature learning within Mamba, we
have carefully designed two novel techniques. First, the multi-granularity feature extractor (MGFE) module adopts a multi-branch architecture with class token fusion to form multi-granularity features, enhancing both discriminative ability and fine-grained coverage. Second, ranking-aware triplet regularization (RATR) reduces redundancy among the features from the multiple branches, enhancing the diversity of the multi-granularity features through both intra-class and inter-class diversity constraints and thus ensuring robust person features. To our knowledge, this is the first work to apply a purely Mamba-based approach to person ReID. Our ReIDMamba model
requires only one-third the parameters of TransReID, with lower GPU memory usage and higher inference throughput. Experimental results on five person ReID benchmarks demonstrate that ReIDMamba achieves state-of-the-art performance.
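The abstract does not give RATR's exact formulation. As a rough, non-authoritative sketch of the general idea, the PyTorch snippet below pairs a batch-hard triplet loss (a common ranking objective in ReID training) with a cross-branch similarity penalty that discourages redundant branch features; the function name, the fusion-by-averaging layout, and the `div_weight` weighting are all hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def branch_diversity_triplet_reg(branch_feats, labels, margin=0.3, div_weight=0.1):
    """Hypothetical sketch: batch-hard triplet loss on a fused feature plus a
    cross-branch redundancy penalty. Not the paper's exact RATR formulation."""
    feats = [F.normalize(f, dim=1) for f in branch_feats]   # per-branch (N, D) embeddings
    fused = F.normalize(torch.stack(feats).mean(dim=0), dim=1)

    # Batch-hard triplet loss (a standard ranking objective in ReID).
    dist = torch.cdist(fused, fused)                         # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)        # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = dist.masked_fill(~same | eye, -1).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    triplet = F.relu(hardest_pos - hardest_neg + margin).mean()

    # Redundancy penalty: discourage different branches from producing
    # near-identical embeddings for the same image.
    div, n_pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            div = div + F.relu((feats[i] * feats[j]).sum(dim=1)).mean()
            n_pairs += 1

    return triplet + div_weight * div / max(n_pairs, 1)
```

In practice, a regularizer of this kind would be added to the usual identity-classification and metric losses during training rather than used on its own.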