ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification

Published: 17 Oct 2025 · Last Modified: 11 Nov 2025 · IEEE Transactions on Multimedia · CC BY-NC-ND 4.0
Abstract: Extracting robust, discriminative features is a central challenge in person re-identification (ReID). While Transformer-based methods have addressed some limitations of convolutional neural networks (CNNs), such as their local receptive fields and the information loss caused by convolution and downsampling operations, they still face a scalability issue: memory and computation grow quadratically with input sequence length. To overcome this, we propose a pure Mamba-based person ReID framework named ReIDMamba. Specifically, we design a Mamba-based strong baseline that effectively extracts fine-grained, discriminative global features by introducing multiple class tokens. To further enhance robust feature learning within Mamba, we carefully design two novel techniques. First, the multi-granularity feature extractor (MGFE) module, built on a multi-branch architecture with class token fusion, forms multi-granularity features that improve both discrimination ability and fine-grained coverage. Second, ranking-aware triplet regularization (RATR) reduces redundancy among the features from the multiple branches and enhances the diversity of the multi-granularity features by incorporating both intra-class and inter-class diversity constraints, thereby ensuring the robustness of person features. To our knowledge, this is the first work to integrate a purely Mamba-driven approach into ReID research. Our ReIDMamba model has only one-third the parameters of TransReID, along with lower GPU memory usage and higher inference throughput. Experimental results demonstrate ReIDMamba's promising performance, achieving state-of-the-art results on five person ReID benchmarks.
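The abstract does not spell out the RATR formulation, only its intent: reduce cross-branch redundancy while keeping identities well separated via intra-class and inter-class constraints. The PyTorch sketch below is one plausible reading of that idea, not the paper's definition; the function name, the margin value, and the exact loss form are all assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def ratr_sketch(branch_feats, labels, margin=0.3):
    """Hypothetical RATR-style regularizer (a sketch, not the paper's loss).

    branch_feats: list of K tensors, each (B, D), one per MGFE branch.
    labels:       (B,) integer identity labels for the batch.
    """
    feats = [F.normalize(f, dim=1) for f in branch_feats]  # cosine geometry
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)   # (B, B) identity mask
    loss, n_pairs = feats[0].new_zeros(()), 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            sim = feats[i] @ feats[j].t()   # cross-branch similarities, (B, B)
            pos = sim[same_id].mean()       # same identity, different branches
            neg = sim[~same_id].mean()      # different identities
            # Intra-class diversity: penalize high cross-branch similarity for
            # the same person, so branches do not produce redundant features.
            # Inter-class (ranking) constraint: keep same-identity pairs more
            # similar than different-identity pairs by a triplet-style margin.
            loss = loss + pos + F.relu(neg - pos + margin)
            n_pairs += 1
    return loss / max(n_pairs, 1)

# Example: two branch outputs for a batch of 8 images, 4 identities, 256-D.
f1, f2 = torch.randn(8, 256), torch.randn(8, 256)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(ratr_sketch([f1, f2], ids))
```

In this reading, the two terms pull in the directions the abstract names: the `pos` penalty drives branch-level diversity within each identity, while the margin term preserves the inter-class ranking that the triplet formulation implies.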