# Setup 

## Dependencies
```
conda create -n unidiffuser python=3.9
conda activate unidiffuser
pip install -r requirements.txt
```

## Pretrained Models
- Follow the instructions in the [UniDiffuser repo](https://github.com/thu-ml/unidiffuser#pretrained-models) to download the pretrained models.


# Train and Infer

## Train 
- Launch the training script with `accelerate`:
    - set the visible CUDA devices with `CUDA_VISIBLE_DEVICES`
    - set the config file for training with `--config`
```
CUDA_VISIBLE_DEVICES=0 accelerate launch --mixed_precision fp16 --multi_gpu train.py --config=configs/t2iadp1_no_encode.py
```
- Training parameters can be adjusted in the config file.

## Training data
- A tiny version of the training data is provided in `train_data`; see `./train_data/FFHQ_tiny.jsonl`.
- It is a JSONL file in which each line describes one text-image pair with the following attributes:
    - path: image path 
    - caption: caption for the image, generated by [LLaVA](https://github.com/haotian-liu/LLaVA)
    - width: image width
    - height: image height
    - mask_path: path to the mask image generated by [Facer](https://github.com/FacePerceiver/facer)
    - bbox: bounding box for the head
    - square_bbox: square bounding box for the head
    - bbox_face: bounding box for the face
    - square_bbox_face: square bounding box for the face
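
To sanity-check a custom data file before training, the attribute list above can be turned into a small validation script. This is a minimal sketch, not part of the repo: `load_records` and `EXPECTED_KEYS` are hypothetical names, and it only checks that every key is present. It makes no assumption about the coordinate format of the bounding boxes.

```python
import json
from pathlib import Path

# Attribute names copied from the list above.
EXPECTED_KEYS = {
    "path", "caption", "width", "height", "mask_path",
    "bbox", "square_bbox", "bbox_face", "square_bbox_face",
}


def load_records(jsonl_path):
    """Read a JSONL file and return one dict per non-empty line.

    Raises KeyError if a record lacks any of the expected attributes.
    """
    records = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = EXPECTED_KEYS - record.keys()
            if missing:
                raise KeyError(f"line {line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records


if __name__ == "__main__":
    data_file = Path("train_data/FFHQ_tiny.jsonl")
    if data_file.exists():
        print(f"{len(load_records(data_file))} records loaded")
```

Running the script from the repo root reports how many records were read, or fails with the line number of the first incomplete record.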


## Infer
- Refer to `infer.ipynb`.
- Make sure `feed_resume_path` points to the pretrained weights.
- The pretrained weights can be downloaded from [GDrive](https://drive.google.com/drive/folders/1a__yTuMbtJPcr2O9aqq9QWqfoeyTODNk?usp=share_link).