CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion

Yuhang Xue, Ning Chen, Yixin Luo, Hongqing Zhu, Zhiying Zhu

Published: 2024, Last Modified: 11 Apr 2025. Speech Commun. 2024. CC BY-SA 4.0

Abstract: Highlights
• Self-supervised learning (SSL) features are adopted to ensure the model’s generalization.
• A contrastive learning strategy enhances the representation ability of the features.
• The Mel-spectrogram is introduced to compensate for the speaker information lacking in SSL features.
• The proposed model performs excellently in both objective and subjective evaluations.