# Speech-T

## Quick Start

1. Prepare dataset
    ```bash
    mkdir -p data/raw/
    cd data/raw/
    wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
    tar -xjf LJSpeech-1.1.tar.bz2
    cd ../../
    python datasets/tts/lj/prepare.py
    ```
2. Forced alignment
    ```bash
    # Download MFA first: https://montreal-forced-aligner.readthedocs.io/en/stable/aligning.html
    # and unzip it to ./montreal-forced-aligner
    ./montreal-forced-aligner/bin/mfa_train_and_align data/raw/LJSpeech-1.1/mfa_input data/raw/LJSpeech-1.1/dict_mfa.txt data/raw/LJSpeech-1.1/mfa_outputs -t ./montreal-forced-aligner/tmp -j 24
    ```

3. Download pre-trained vocoder
    ```bash
    mkdir wavegan_pretrained
    ```
    Download `checkpoint-1000000steps.pkl`, `config.yaml`, and `stats.h5` from https://drive.google.com/open?id=1XRn3s_wzPF2fdfGshLwuvNHrbgD0hqVS to `wavegan_pretrained/`.

4. Build binary data

    ```bash
    # speech_transducer
    PYTHONPATH=. python datasets/tts/lj/gen_speech_transducer.py --config configs/tts/lj/speech_transducer.yaml
    ```

5. Train
    ```bash
    CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python tasks/speech_transducer.py --config configs/tts/lj/speech_transducer.yaml --exp_name speech_transducer_exp1 --reset
    ```
   
6. Inference
    ```bash
    CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python tasks/speech_transducer.py --config configs/tts/lj/speech_transducer.yaml --exp_name speech_transducer_exp1 --infer
    ```
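
Because the vocoder files in step 3 are downloaded by hand, it can help to confirm they are in place before training or inference. The snippet below is a minimal sketch of such a check; `check_files` is a hypothetical helper written for this README, not a script shipped with the repository, and it only tests that the three files named in step 3 exist.

```bash
# Hypothetical helper (not part of the repository):
# print every expected file that is missing from a directory
# and return non-zero if at least one is missing.
check_files() {
    local dir="$1"; shift
    local missing=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Check the three vocoder files listed in step 3.
if check_files wavegan_pretrained checkpoint-1000000steps.pkl config.yaml stats.h5; then
    echo "vocoder files OK"
fi
```

Run it from the repository root. The same function could be pointed at other directories, for example to confirm that `data/raw/LJSpeech-1.1` was extracted, by passing a different directory and file list.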
