## ClsVC: Learning Speech Representations with two different classification tasks

ClsVC is an any-to-any, non-parallel voice conversion framework.

If you find this work useful and use it in your research, please consider citing our paper.




### Audio Demo

Due to company regulations, the audio demo will be released once the paper is accepted.

### Dependencies
- Python 3
- NumPy
- PyTorch >= v0.4.1
- TensorFlow >= v1.3 (only for tensorboard)
- librosa
- tqdm
- wavenet_vocoder (```pip install wavenet_vocoder```); for more information, see https://github.com/r9y9/wavenet_vocoder
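Before running any of the scripts, it can help to confirm that the packages listed above are importable and that PyTorch meets the stated minimum version. The snippet below is a minimal, hypothetical sketch (the helper names `parse_version`, `meets_minimum` and `check_imports` are illustrative and not part of the ClsVC repository); it compares only the leading numeric part of a version string, so pre-release tags are ignored.

```python
# Hypothetical environment check; not part of the ClsVC repository.
import importlib
import re


def parse_version(version):
    """Turn a version string such as '1.13.1+cu117' into a tuple of ints."""
    # Keep only the leading dotted numeric part, dropping suffixes like '+cu117'.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if version `installed` is at least version `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_imports(modules=("numpy", "torch", "librosa", "tqdm", "wavenet_vocoder")):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status
```

For example, `meets_minimum(torch.__version__, "0.4.1")` checks the PyTorch requirement above, and `check_imports()` reports which of the listed packages are still missing.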

### Pre-trained models



### 1. Mel-Spectrograms to waveform

Download the pre-trained WaveNet vocoder model, then run ```python vocoder_synthesis.py```.

Please note that the training metadata and the testing metadata have different formats.


### 2. Train the model

We have included a small set of training audio files in the wav folder. However, this data is very small and is intended for code verification purposes only. Please prepare your own dataset for training.

1. Resample the wav files to 16 kHz: ```python resample_wav.py```

2. Generate spectrogram data from the wav files: ```python make_spect.py```

3. Run the main training script: ```python main.py```
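Steps 1 and 2 can be illustrated with the minimal NumPy and standard-library sketch below. It is not the repository's `resample_wav.py` or `make_spect.py`, and every numeric setting in it (1024-point FFT, hop length 256, 80 mel bands between 90 and 7600 Hz, a -100 dB floor and a 16 dB reference level) is an assumption chosen as a common configuration for 16 kHz speech with a WaveNet vocoder; check them against `make_spect.py`, because the features must match what the vocoder was trained on. The resampler is plain linear interpolation without an anti-aliasing filter, a simplification compared with a proper resampler such as `librosa.resample`, and the helpers handle mono 16-bit PCM only.

```python
# Hypothetical preprocessing sketch; not the repository's resample_wav.py or make_spect.py.
# All numeric settings below are assumptions, not values taken from ClsVC.
import wave

import numpy as np

TARGET_SR = 16000                          # step 1: target sampling rate stated in this README
N_FFT, HOP, N_MELS = 1024, 256, 80         # assumed STFT size, hop length and mel band count
FMIN, FMAX = 90.0, 7600.0                  # assumed mel frequency range in Hz
MIN_LEVEL_DB, REF_LEVEL_DB = -100.0, 16.0  # assumed dB floor and reference level


def read_wav(path):
    """Read mono 16-bit PCM into floats in [-1, 1) plus the sampling rate."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        sr = f.getframerate()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype="<i2")
    return pcm.astype(np.float64) / 32768.0, sr


def write_wav(path, samples, sr):
    """Write floats in [-1, 1] as mono 16-bit PCM."""
    pcm = np.clip(np.round(np.asarray(samples) * 32768.0), -32768, 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())


def resample_linear(samples, src_sr, dst_sr=TARGET_SR):
    """Step 1: resample by linear interpolation (no anti-aliasing low-pass filter)."""
    x = np.asarray(samples, dtype=np.float64)
    if src_sr == dst_sr or x.size == 0:
        return x
    n_out = int(round(x.size * dst_sr / src_sr))
    # Position of each output sample, measured in input-sample units.
    positions = np.arange(n_out) * (src_sr / dst_sr)
    return np.interp(positions, np.arange(x.size), x)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr=TARGET_SR):
    """Triangular HTK-style mel filters, shape (N_MELS, N_FFT // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, N_FFT // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(FMIN), hz_to_mel(FMAX), N_MELS + 2))
    fb = np.zeros((N_MELS, fft_freqs.size))
    for i in range(N_MELS):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=TARGET_SR):
    """Step 2: (frames, N_MELS) log-mel spectrogram scaled to [0, 1]."""
    wav = np.asarray(wav, dtype=np.float64)
    # Reflect-pad so that frames are centred on multiples of HOP.
    padded = np.pad(wav, N_FFT // 2, mode="reflect")
    n_frames = 1 + (padded.size - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(N_FFT)[None, :]
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, N_FFT // 2 + 1)
    mel = magnitude @ mel_filterbank(sr).T            # (frames, N_MELS)
    mel_db = 20.0 * np.log10(np.maximum(1e-5, mel)) - REF_LEVEL_DB
    # Map [MIN_LEVEL_DB, 0] dB onto [0, 1] and clip everything outside that range.
    return np.clip((mel_db - MIN_LEVEL_DB) / -MIN_LEVEL_DB, 0.0, 1.0)
```

A typical call chain is `x, sr = read_wav(path)`, then `log_mel_spectrogram(resample_linear(x, sr))`. Normalising to a fixed [0, 1] range rather than per utterance keeps levels comparable across speakers, which is one common design choice; the repository may do this differently.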

Training has converged once the reconstruction loss reaches about 0.001.
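One way to apply this convergence criterion automatically is to track a moving average of the logged loss instead of a single noisy batch value. The sketch below is hypothetical: it assumes the reconstruction loss is a mean squared error between the input and reconstructed spectrograms (this README does not say which loss ClsVC uses), and the `ConvergenceMonitor` class and its 100-step window are illustrative choices rather than anything in `main.py`.

```python
# Hypothetical training helper; assumes the reconstruction loss is a mean squared error.
from collections import deque


def mse(reconstruction, target):
    """Mean squared error between two equal-length sequences of spectrogram values."""
    if len(reconstruction) != len(target) or len(target) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(reconstruction, target)) / len(target)


class ConvergenceMonitor:
    """Flag convergence when the moving-average loss drops to a threshold."""

    def __init__(self, threshold=1e-3, window=100):
        self.threshold = threshold          # about 0.001, per this README
        self.losses = deque(maxlen=window)  # most recent `window` loss values

    def update(self, loss):
        """Record one loss value and return whether training looks converged."""
        self.losses.append(float(loss))
        # Require a full window so a single lucky batch does not count.
        if len(self.losses) < self.losses.maxlen:
            return False
        return sum(self.losses) / len(self.losses) <= self.threshold
```

Inside a PyTorch training loop this could be used as `if monitor.update(loss.item()): break`.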


### 3. Inference

To perform voice conversion with a pre-trained ClsVC model, run ```python mel_conversion.py```.

