Abstract: Visible-Infrared Person Re-identification (VI-ReID) is critical
for round-the-clock surveillance systems yet is hindered by significant
modality discrepancies. Existing methods often fail to fully exploit
frequency-domain information, focusing predominantly on spatial-domain
feature learning or limited frequency decompositions. To address this, we
propose the Multi-Frequency Embedding Network (MFENet), a feature-level
method that operates in the frequency domain through multi-frequency
decomposition to learn discriminative and modality-invariant features.
Specifically, the HiLo-Frequency Modulation (HiLo-FM) module efficiently
extracts low-frequency features via frequency-domain filtering and
high-frequency details through lightweight multiscale convolutions,
followed by attention-based fusion. The Frequency-Aware Diversity Enhancer
(FADE) module further enriches feature discriminability by weighting
multi-frequency components and learning diverse features through
multi-branch architectures. In addition, we introduce two novel loss
functions: the Cross-Modality Soft Retrieval (CMSR) loss prioritizes
cross-modality consistency over intra-modality similarity, while the
Cross-Modality Ranking Regularization (CMRR) loss enhances feature
diversity through differentiable rank-correlation optimization. Extensive
experiments demonstrate state-of-the-art performance: our method achieves
61.06% Rank-1 accuracy and 67.75% mAP in the challenging IR-to-VIS mode on
LLCM, the largest VI-ReID benchmark, surpassing existing methods by
significant margins without resorting to reranking or additional labeled
data.
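To make the frequency-decomposition idea concrete, the sketch below splits a
2-D feature map into low- and high-frequency components with a circular
low-pass mask in the FFT domain. This is a hypothetical illustration of
generic frequency-domain filtering, not the paper's actual HiLo-FM module;
the function name, the `cutoff` parameter, and the residual-based
high-frequency split are all assumptions for this example.

```python
import numpy as np

def hilo_decompose(feat, cutoff=0.25):
    """Split a 2-D array into low- and high-frequency parts.

    Hypothetical sketch (not the paper's HiLo-FM): a centered circular
    low-pass mask in the 2-D FFT domain keeps the low frequencies; the
    residual is treated as the high-frequency detail.
    """
    h, w = feat.shape
    # Shift the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h / 2.0, w / 2.0
    radius = cutoff * min(h, w)
    # Boolean circular mask: True inside the low-frequency region.
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = feat - low  # residual carries the high-frequency detail
    return low, high

feat = np.random.default_rng(0).standard_normal((32, 32))
low, high = hilo_decompose(feat)
# By construction, the two components reconstruct the input exactly.
assert np.allclose(low + high, feat)
```

Because the high-frequency part is defined as the residual, the decomposition
is lossless; a learned module would instead process each branch separately
before any attention-based fusion.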