# PADS-TAL: Padding-Annealed Diffusion Sampling in Text-Aware Latent Space for Robust and Diverse Text-to-Music Generation
This repository generates structured music using the proposed Text-Aware Latent space (TAL) and Padding-Annealed Diffusion Sampling (PADS).

The architecture has the following features:
- **Stable Music Generation**
    - Full 47-second songs
    - Structured song form
    - Good quality
- **Robust and Diverse**
    - 15 Genres : Metal, Rock, HipHop, RnB, Blues, Jazz, Elec, Pop, Latin, World, Folk, Country, Classic, New Age, Easy
    - Any style you want

## Tested Environment
**A100 GPU : Ubuntu(22.04), CUDA(11.8.0), CuDNN(8.9.0)**

**H100 GPU : Ubuntu(22.04), CUDA(11.8.0), CuDNN(8.9.0)**

**H100 GPU : Ubuntu(22.04), CUDA(12.4.1), CuDNN(9.1.0)**

## Installation

1. **Install with apt-get**
```bash
apt-get update -y && apt-get install -y vim git feh tmux tzdata python3 python3-pip build-essential tree wget curl libgl1-mesa-glx libglib2.0-0 unzip pdsh openssh-server net-tools htop libaio-dev ffmpeg
```


2. **Install with pip3**
```bash
pip3 install --upgrade pip
pip3 install torch==2.7.1 torchaudio==2.7.1 torchvision==0.22.1
pip3 install -r requirements.txt
git clone https://github.com/Stability-AI/stable-audio-metrics.git pads_tal/tools/stable-audio-metrics
```


3. **Install optimization packages (Optional)**
```bash
pip3 install flash-attn==2.5.9.post1 deepspeed
```
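
Since these packages are optional, it can help to confirm which of them the current `python3` can actually find before enabling them. The following is a minimal sketch; `has_module` and `report_modules` are hypothetical helpers, not part of this repository. It only checks that a module is discoverable, not that it was built against the installed CUDA version.

```bash
# has_module: hypothetical helper. Returns 0 if python3 can locate the module.
# It uses importlib.util.find_spec, so the module itself is not imported.
has_module() {
    python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$1" 2>/dev/null
}

# report_modules: hypothetical helper. Prints one status line per module name.
report_modules() {
    local name
    for name in "$@"; do
        if has_module "$name"; then
            echo "$name: available"
        else
            echo "$name: missing"
        fi
    done
}
```

For the packages above, the call would be `report_modules flash_attn deepspeed` (the import name of `flash-attn` is `flash_attn`).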


## How to use 

1. **Setting Guide**
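- Edit the configuration files in **config/** (for example, `config/sample_dataset.json` and the model configs in `config/tal_pads/`) before training or generating.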
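
After editing a config by hand, a quick syntax check can catch a stray comma or missing brace before a long job starts. This is a sketch that assumes the config files are plain JSON, as their `.json` extension suggests; `check_json` and `check_configs` are hypothetical helpers, not part of this repository.

```bash
# check_json: hypothetical helper. Returns 0 if the file parses as JSON.
# A missing or unreadable file also counts as a failure.
check_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# check_configs: hypothetical helper. Reports every file passed as an argument
# and returns non-zero if at least one of them failed to parse.
check_configs() {
    local status=0 file
    for file in "$@"; do
        if check_json "$file"; then
            echo "ok: $file"
        else
            echo "invalid: $file"
            status=1
        fi
    done
    return "$status"
}
```

A typical call would be `check_configs ./config/sample_dataset.json ./config/tal_pads/tal_vae.json`. It only validates syntax and does not check that the keys are ones the training code expects.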

2. **How to train**

**Train TAL-VAE**
```bash
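# CUDA_VISIBLE_DEVICES must expose at least as many GPUs as --num-gpus requests (8 here).
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 ./pads_tal/train.py \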
    --dataset-config ./config/sample_dataset.json \
    --model-config ./config/tal_pads/tal_vae.json \
    --pretrained-ckpt-path <your_pretrained_vae_ckpt_path> \
    --batch-size 4 \
    --num-gpus 8 \
    --num-workers 4 \
    --wandb-skip \
    --name mvae_train_1
```

**Train TAL-DM**
```bash
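# CUDA_VISIBLE_DEVICES must expose at least as many GPUs as --num-gpus requests (8 here).
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 ./pads_tal/train.py \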
    --dataset-config ./config/sample_dataset.json \
    --model-config ./config/tal_pads/tal_dm.json \
    --pretransform-ckpt-path <your_pretrained_vae_ckpt_path> \
    --batch-size 4 \
    --num-gpus 8 \
    --num-workers 4 \
    --wandb-skip \
    --name mvae_dm_train_1
```
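
Both training commands combine `CUDA_VISIBLE_DEVICES` with `--num-gpus`. A process only sees the devices listed in `CUDA_VISIBLE_DEVICES`, so the two values presumably need to agree. The sketch below derives the count from the device list; `count_visible_gpus` is a hypothetical helper, and how `train.py` reacts to a mismatch is not verified here.

```bash
# count_visible_gpus: hypothetical helper. Counts the comma-separated entries
# in a CUDA_VISIBLE_DEVICES-style string. With no argument it reads the
# environment variable. It prints 0 for an empty or unset value; note that an
# unset variable really means "no restriction", which this helper cannot count.
count_visible_gpus() {
    local devices="${1-${CUDA_VISIBLE_DEVICES-}}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    echo "$devices" | tr ',' '\n' | grep -c .
}
```

With `export CUDA_VISIBLE_DEVICES=0,1,2,3`, passing `--num-gpus "$(count_visible_gpus)"` would request four GPUs.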

3. **How to generate**

**TAL-DM + CADS**
```bash
CUDA_VISIBLE_DEVICES=0 python3 ./pads_tal/inference.py \
    --ckpt-path <your_dm_ckpt_path> \
    --model-config ./config/tal_pads/tal_dm_cads.json \
    --dataset songdescriber \
    --text-type tag \
    --save-name="test_single" \
    --save-wav 
```

**Stable Audio Open + PADS**
```bash
CUDA_VISIBLE_DEVICES=0 python3 ./pads_tal/inference.py \
    --ckpt-path <stable_audio_open_model_path> \
    --model-config ./config/original/stable_audio_open_pads.json \
    --dataset songdescriber \
    --text-type tag \
    --save-name="test_single" \
    --save-wav 
```

4. **How to evaluate**
```bash
CUDA_VISIBLE_DEVICES=0 python3 ./pads_tal/eval.py \
    --result-path evaluated \
    --root-path output_inf/T1/test_single \
    --modes ipr \
    --ext .wav \
    --clap-basemodel music \
    --dataset songdescriber \
    --text-type tag \
    --ipr-basedata songdescriber
```
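
The evaluation example reads `.wav` files from `output_inf/T1/test_single`, which appears to correspond to the `--save-name="test_single"` and `--save-wav` options of the generation step. That output location is inferred from the commands above, not confirmed. Under that assumption, a small check that the directory actually contains audio can save a failed evaluation run; `count_audio_files` is a hypothetical helper, not part of this repository.

```bash
# count_audio_files: hypothetical helper. Prints the number of files with the
# given extension (default .wav) found recursively under a directory.
# Returns non-zero if the directory is missing or no matching file exists.
# File names containing newlines would be miscounted.
count_audio_files() {
    local root="$1" ext="${2:-.wav}"
    if [ ! -d "$root" ]; then
        echo 0
        return 1
    fi
    find "$root" -type f -name "*${ext}" | grep -c .
}
```

For example, `count_audio_files output_inf/T1/test_single .wav` prints how many generated clips are available to `eval.py`.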
