# RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

### About
This folder contains the implementation code for the RFWave model.

### TL;DR
RFWave, a frame-level multi-band Rectified Flow model, achieves high-fidelity audio waveform reconstruction from Mel-spectrograms or discrete tokens, with generation speeds up to 160 times faster than real-time on a GPU.

## Usage

### Setup
1. Install the requirements.
```
sudo apt-get update
sudo apt-get install sox libsox-fmt-all libsox-dev
conda create -n rfwave python=3.10
conda activate rfwave
pip install -r requirements.txt
```
2. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/).
3. Update the wav paths in the filelists, substituting the path to the extracted dataset for `ljs_dataset_folder`: `sed -i -- 's,LJSPEECH_PATH,ljs_dataset_folder,g' LJSpeech/*.filelist`
4. Update the `filelist_path` in `configs/*.yaml`.
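
If `sed` is not available, step 3 can be done in Python instead. The following is a minimal sketch, assuming the filelists are plain-text files containing the literal placeholder `LJSPEECH_PATH`, as the `sed` command implies. The helper name `replace_placeholder` is hypothetical and not part of this repository.

```python
from pathlib import Path


def replace_placeholder(filelist_dir, dataset_dir, placeholder="LJSPEECH_PATH"):
    """Replace `placeholder` with `dataset_dir` in every *.filelist file.

    Hypothetical helper that mirrors the `sed` command in step 3.
    Returns the number of files that were modified.
    """
    changed = 0
    for path in sorted(Path(filelist_dir).glob("*.filelist")):
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            # Rewrite the file in place with the real dataset location.
            path.write_text(text.replace(placeholder, str(dataset_dir)),
                            encoding="utf-8")
            changed += 1
    return changed
```

For example, `replace_placeholder("LJSpeech", "/data/LJSpeech-1.1")` would rewrite the filelists under `LJSpeech/`; the dataset path shown here is only an illustration. Running it a second time changes nothing, because the placeholder is already gone.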

### Vocoder
1. Train a vocoder with `python3 train.py -c configs/rfwave.yaml`
2. Test a trained vocoder with `inference_voc.py`

### Encodec Decoder
1. Train an Encodec Decoder with `python3 train.py -c configs/rfwave-encodec.yaml`

### Text to Speech
1. Download the [alignment](https://drive.google.com/file/d/1WfErAxKqMluQU3vupWS6VB6NdehXwCKM/view) from the [SyntaSpeech repo](https://github.com/yerfor/SyntaSpeech)
2. Convert the alignments and build a phoneset with `scripts/ljspeech_synta.py`
3. Modify the `filelist_path` and `phoneset` path in `configs/rfwave-dur.yaml` and `configs/rfwave-tts-ctx.yaml`
4. Train a duration model with `python3 train.py -c configs/rfwave-dur.yaml`
5. Train an acoustic model with `python3 train.py -c configs/rfwave-tts-ctx.yaml`
6. Test the trained model with `inference_tts.py`
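
Before training, it can help to confirm that the entries from step 3 were actually edited. The sketch below is a plain text search that lists every line mentioning `filelist_path` or `phoneset` in the given config files. It makes no assumption about how the YAML is structured, and the helper name `find_config_keys` is hypothetical, not part of this repository.

```python
from pathlib import Path


def find_config_keys(config_paths, keys=("filelist_path", "phoneset")):
    """Return (file, line_number, line) for each line that mentions one of `keys`.

    Hypothetical helper: a plain substring search, not a YAML parser.
    """
    hits = []
    for cfg in config_paths:
        lines = Path(cfg).read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if any(key in line for key in keys):
                hits.append((str(cfg), lineno, line.strip()))
    return hits
```

Calling `find_config_keys(["configs/rfwave-dur.yaml", "configs/rfwave-tts-ctx.yaml"])` from the repository root returns the matching lines so they can be checked by eye.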

## Thanks

This repository uses code from [Vocos](https://github.com/gemelo-ai/vocos) and [audiocraft](https://github.com/facebookresearch/audiocraft).

## License
This project is licensed under the MIT License.
