Abstract: Visible-Infrared Person Re-identification (VI-ReID) is critical
for round-the-clock surveillance systems yet is hindered by significant
modality discrepancies. Existing methods often fail to fully exploit
frequency-domain information, focusing predominantly on spatial-domain
feature learning or limited frequency decompositions. To address this, we
propose the Multi-Frequency Embedding Network (MFENet), a feature-level
method that operates in the frequency domain through multi-frequency
decomposition to learn discriminative and modality-invariant features.
Specifically, the HiLo-Frequency Modulation (HiLo-FM) module efficiently
extracts low-frequency features via frequency-domain filtering and
high-frequency details through lightweight multiscale convolutions,
followed by attention-based fusion. The Frequency-Aware Diversity Enhancer
(FADE) module further enriches feature discriminability by weighting
multi-frequency components and learning diverse features through
multi-branch architectures. In addition, we introduce two novel loss
functions: the Cross-Modality Soft Retrieval (CMSR) loss prioritizes
cross-modality consistency over intra-modality similarity, while the
Cross-Modality Ranking Regularization (CMRR) loss enhances feature
diversity through differentiable rank-correlation optimization. Extensive
experiments demonstrate state-of-the-art performance: our method achieves
61.06% Rank-1 accuracy and 67.75% mAP in the challenging IR-to-VIS mode on
LLCM, the largest VI-ReID benchmark, surpassing existing methods by
significant margins without resorting to reranking or additional labeled
data.
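To make the frequency-decomposition idea concrete, the sketch below splits a
2-D feature map into low- and high-frequency components with a circular
low-pass mask in the FFT domain. This is a hypothetical illustration of
generic frequency-domain filtering, not the paper's actual HiLo-FM module;
the function name, the `cutoff` parameter, and the residual-based
high-frequency split are all assumptions for this example.

```python
import numpy as np

def hilo_decompose(feat, cutoff=0.25):
    """Split a 2-D array into low- and high-frequency parts.

    Hypothetical sketch (not the paper's HiLo-FM): a centered circular
    low-pass mask in the 2-D FFT domain keeps the low frequencies; the
    residual is treated as the high-frequency detail.
    """
    h, w = feat.shape
    # Shift the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h / 2.0, w / 2.0
    radius = cutoff * min(h, w)
    # Boolean circular mask: True inside the low-frequency region.
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = feat - low  # residual carries the high-frequency detail
    return low, high

feat = np.random.default_rng(0).standard_normal((32, 32))
low, high = hilo_decompose(feat)
# By construction, the two components reconstruct the input exactly.
assert np.allclose(low + high, feat)
```

Because the high-frequency part is defined as the residual, the decomposition
is lossless; a learned module would instead process each branch separately
before any attention-based fusion.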