# Bridge-TTS


## Installation

First, install the required Python packages:

```bash
pip install -r requirements.txt
```

Next, build the `monotonic_align` Cython extension:

```bash
cd model/monotonic_align && python setup.py build_ext --inplace && cd ../..
```

**Note**: the code has been tested with Python 3.6.9.
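
Judging by its name and the Glow-TTS reference below, the `monotonic_align` extension presumably implements Monotonic Alignment Search (MAS). For intuition, here is a minimal pure-NumPy sketch of MAS as described in Glow-TTS. It is not this repository's Cython code, and the function name `maximum_path_np` is hypothetical. Given a `[t_x, t_y]` matrix of log-likelihoods between text tokens and mel frames, it uses dynamic programming to find the highest-scoring alignment that is monotonic and skips no token, then backtracks to recover it.

```python
import numpy as np


def maximum_path_np(value: np.ndarray) -> np.ndarray:
    """Monotonic Alignment Search over a [t_x, t_y] log-likelihood matrix.

    Returns a 0/1 matrix of the same shape in which every mel frame (column)
    is assigned to exactly one text token (row), tokens are visited in order,
    and every token receives at least one frame.
    """
    t_x, t_y = value.shape
    if t_y < t_x:
        raise ValueError("need at least one frame per token (t_y >= t_x)")

    # q[i, j]: best cumulative score for frames 0..j with frame j on token i.
    q = np.full((t_x, t_y), -np.inf)
    q[0, 0] = value[0, 0]
    for j in range(1, t_y):
        for i in range(min(j + 1, t_x)):
            stay = q[i, j - 1]                               # same token as previous frame
            advance = q[i - 1, j - 1] if i > 0 else -np.inf  # move on to the next token
            q[i, j] = max(stay, advance) + value[i, j]

    # Backtrack from the last token at the last frame.
    path = np.zeros((t_x, t_y), dtype=np.int32)
    i = t_x - 1
    for j in range(t_y - 1, -1, -1):
        path[i, j] = 1
        # Step back one token if forced (i == j) or if that predecessor scored higher.
        if i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path
```

Summing the path over the frame axis, e.g. `path.sum(axis=1)`, gives per-token durations, which is how MAS yields duration targets without a separately trained aligner. The nested Python loops make this O(t_x · t_y) and slow for real utterances, which is the usual motivation for a compiled version.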

## Dataset and Checkpoint Preparation
Download [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/) and update the dataset path in `./resources/ljspeech` so that it points to your local copy.
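
The exact contents of `./resources/ljspeech` are not documented here. The sketch below assumes Grad-TTS-style filelists with one `wav_path|transcript` entry per line; the function name `relocate_filelist` is hypothetical. It replaces the directory part of each audio path with your local `LJSpeech-1.1/wavs` folder and leaves the file name and everything after the first `|` unchanged.

```python
import os
from typing import List


def relocate_filelist(lines: List[str], wav_dir: str) -> List[str]:
    """Point every `wav_path|...` entry at `wav_dir`, keeping the file name.

    Assumption: lines follow the Grad-TTS filelist convention, with the audio
    path as the first '|'-separated field. Blank lines are dropped.
    """
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if "|" not in line:
            raise ValueError("unexpected filelist line: {!r}".format(line))
        wav_path, rest = line.split("|", 1)  # keep transcript (and any extra fields) intact
        new_path = os.path.join(wav_dir, os.path.basename(wav_path))
        out.append(new_path + "|" + rest)
    return out
```

For example, you could read a filelist with `open(path).readlines()`, pass the lines through `relocate_filelist`, and write the result back joined with `"\n"`. Check the real file layout first; if it uses a different delimiter or column order, adjust the split accordingly.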

Download the HiFi-GAN checkpoints provided by Grad-TTS [here](https://drive.google.com/drive/folders/1grsfccJbmEuSBGQExQKr3cVxNV0xEOZ7?usp=sharing) and put all of them in `./checkpts`.

## Training and Inference
See the scripts in `./scripts`.


## References

* This project is built on Grad-TTS; official GitHub repository: [link](https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS).
* HiFi-GAN is used as the vocoder; official GitHub repository: [link](https://github.com/jik876/hifi-gan).
* The Monotonic Alignment Search algorithm is used for unsupervised duration modelling; official GitHub repository: [link](https://github.com/jaywalnut310/glow-tts).
* Phonemization uses CMUdict; official GitHub repository: [link](https://github.com/cmusphinx/cmudict).
