Abstract: Highlights
• We introduce a Cascaded Cross-modal Alignment framework for VI-ReID.
• We design a Channel-Spatial Recombination strategy to reduce discrepancies in inputs.
• We propose a frequency-level Low Frequency Masking module to enhance global details.
• We propose a Prototype-based Semantic Refinement module for fine-grained refinement.
• Comprehensive experiments demonstrate the effectiveness of the proposed methods.