Abstract: Highlights
• Combining contrastive learning, attribute masking, and clustering.
• Depthwise CNN for feature extraction and iTransformer for attribute-weighted fusion.
• Random attribute masking in pre-training refines features and avoids attribute selection.
• Achieving interpretability analysis based on the attention mechanism.
• Ablation and comparative studies validate the importance of each component and the accuracy of the results.
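
The random attribute masking named in the highlights can be illustrated with a minimal sketch. The highlights do not specify the masking scheme, so everything here is an assumption made for illustration: the function name `mask_attributes`, the `mask_ratio` and `mask_value` parameters, and the choice to replace masked attributes with a constant.

```python
import random


def mask_attributes(sample, mask_ratio=0.3, mask_value=0.0, rng=None):
    """Return a masked copy of `sample` and the sorted indices that were masked.

    Hypothetical helper: `mask_ratio` and `mask_value` are illustrative
    assumptions, not settings taken from the paper.
    """
    rng = rng or random.Random()
    n = len(sample)
    # Mask at least one attribute whenever the sample is non-empty.
    k = max(1, round(n * mask_ratio)) if n else 0
    masked_idx = sorted(rng.sample(range(n), k))
    masked = list(sample)  # copy so the input is left untouched
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx


# Usage: mask 40% of a five-attribute sample with a seeded generator.
attributes = [1.0, 2.0, 3.0, 4.0, 5.0]
masked, idx = mask_attributes(attributes, mask_ratio=0.4, rng=random.Random(0))
print(masked, idx)
```

Because a different random subset is hidden on each call, a pre-training objective built on such masked views has to rely on every attribute at some point, which is one plausible reading of how masking could remove the need for a separate attribute-selection step.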