# Joint Learning Between Reference Image and Text Prompt for Fashion Image Editing

## Installation
1. Install the conda environment:
```
conda env create -f environment.yml
```

## Training and Inference
### Preparing Data
Use the example images in `./examples` directly, or prepare your own pair:
Create a folder named `{FASHION_IMAGE}_{ID}+{TARGET_GARMENT}_{ID}`, where `{FASHION_IMAGE}` and `{TARGET_GARMENT}` are the category names of the original fashion image and of the target garment in the reference image, respectively, and `{ID}` is an arbitrary index of your choosing that tells pairs apart.
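
The naming scheme can also be scripted. The following is a minimal sketch, not part of the repository: `make_pair_folder` and its parameters are hypothetical, and the file-name pattern (`0_<category>_<id>0.jpg` and `1_<category>_<id>0.jpg`) is an assumption inferred from the bundled example pair `person_u+skirt_a` (`0_person_u0.jpg`, `1_skirt_a0.jpg`).

```python
import shutil
from pathlib import Path


def make_pair_folder(root, fashion_cat, fashion_id, garment_cat, garment_id,
                     fashion_image, garment_image):
    """Create {FASHION_IMAGE}_{ID}+{TARGET_GARMENT}_{ID} under `root` and copy both images in.

    Hypothetical helper; the file names below are an assumed pattern, not a documented one.
    """
    pair_dir = Path(root) / f"{fashion_cat}_{fashion_id}+{garment_cat}_{garment_id}"
    pair_dir.mkdir(parents=True, exist_ok=True)
    # Prefix 0 marks the original fashion image, prefix 1 the reference garment (assumed).
    shutil.copy(fashion_image, pair_dir / f"0_{fashion_cat}_{fashion_id}0.jpg")
    shutil.copy(garment_image, pair_dir / f"1_{garment_cat}_{garment_id}0.jpg")
    return pair_dir
```

For instance, `make_pair_folder("examples", "person", "x", "skirt", "y", "me.jpg", "skirt.jpg")` would create `examples/person_x+skirt_y/` containing the two copied images.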

The data is organized as follows:
```
person_u+skirt_a/
├── 0_person_u0.jpg
├── 1_skirt_a0.jpg
```

### Training
1. Train D$^2$-Edit with default hyperparameters:

```
CUDA_VISIBLE_DEVICES=4 python train.py \
  --instance_data_dir examples/person_u+skirt_a \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --set_grads_to_none
```


2. Train D$^2$-Edit with customized hyperparameters, for example:

```
python train.py \
  --instance_data_dir examples/person_k+skirt_a \
  --phase_train_steps 1000 \
  --phase_learning_rate 1e-4 \
  --lora_rank 512 \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --set_grads_to_none
```
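
To train on several pairs in one go, the command can be generated per folder. This is a rough sketch under stated assumptions: `find_pair_folders`, `build_train_cmd`, and `train_all` are hypothetical helpers, the regular expression assumes that neither `{ID}` contains an underscore or a `+`, and only the `train.py` flags shown in this README are used.

```python
import re
import subprocess
from pathlib import Path

# Matches {FASHION_IMAGE}_{ID}+{TARGET_GARMENT}_{ID}; assumes IDs contain no "_" or "+".
PAIR_NAME = re.compile(r"^[^+]+_[^_+]+\+[^+]+_[^_+]+$")


def find_pair_folders(root="examples"):
    """Return the sub-folders of `root` whose names follow the pair naming scheme."""
    return sorted(p for p in Path(root).iterdir()
                  if p.is_dir() and PAIR_NAME.match(p.name))


def build_train_cmd(pair_dir, extra_args=()):
    """Build the default training command from this README for one pair folder."""
    return ["python", "train.py",
            "--instance_data_dir", str(pair_dir),
            *extra_args,
            "--enable_xformers_memory_efficient_attention",
            "--use_8bit_adam",
            "--set_grads_to_none"]


def train_all(root="examples"):
    """Train on every pair folder in turn; stops at the first failing run."""
    for pair_dir in find_pair_folders(root):
        subprocess.run(build_train_cmd(pair_dir), check=True)
```

Calling `train_all()` from the repository root would run the default training command once per pair folder; customized hyperparameters can be passed through `extra_args`, for example `("--lora_rank", "512")`.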

### Inference
After training, the model is saved in `outputs/D$^2$-Edit`. The placeholder tokens `<v0>` and `<v1>` are assigned to the original fashion image and to the target garment in the reference image, respectively, and can be used in prompts for text-driven fashion image editing.

```
python inference.py \
  --model_path 'outputs/D$^2$-Edit' \
  --prompt "<v0> with long sleeves" \
  --output_path "outputs/inference/result.jpg"
```
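
Several prompts can be tried against the same trained model by looping over the inference command. The sketch below is illustrative only: `build_inference_cmd` and `run_prompts` are hypothetical helpers, the `result_<i>.jpg` output naming is an assumption, and only the `inference.py` flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_inference_cmd(prompt, output_path, model_path="outputs/D$^2$-Edit"):
    """Build one inference command using the flags documented in this README."""
    return ["python", "inference.py",
            "--model_path", model_path,
            "--prompt", prompt,
            "--output_path", str(output_path)]


def run_prompts(prompts, out_dir="outputs/inference", model_path="outputs/D$^2$-Edit"):
    """Run inference once per prompt and return the list of output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, prompt in enumerate(prompts):
        out = out_dir / f"result_{i}.jpg"  # assumed naming, one file per prompt
        # Passing a list (no shell) keeps "<v0>" and "$" in the arguments literal.
        subprocess.run(build_inference_cmd(prompt, out, model_path), check=True)
        outputs.append(out)
    return outputs
```

A call such as `run_prompts(["<v0> with long sleeves", "<v0> wearing <v1>"])` would write `result_0.jpg` and `result_1.jpg`; the second prompt is a hypothetical wording for combining both placeholder tokens and is not taken from the repository.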