# De-MINDS

## Data

### Training Data
We train the model on Conceptual Captions.
See [open_clip] for how to obtain the dataset.

The training data directories must be in the root of this repo and structured as follows:
```bash
  cc_data
    ├── train ## training image diretories.
    └── val ## validation image directories.
  cc
    ├── Train_GCC-training_output.csv ## training data list
    └── Validation_GCC-1.1.0-Validation_output.csv ## validation data list
```
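
Before starting a long training run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repo: the helper name `missing_entries` and the `EXPECTED_ENTRIES` list are hypothetical and simply mirror the tree shown above.

```python
from pathlib import Path

# Entries this README expects under the repository root (copied from the tree above).
EXPECTED_ENTRIES = [
    "cc_data/train",
    "cc_data/val",
    "cc/Train_GCC-training_output.csv",
    "cc/Validation_GCC-1.1.0-Validation_output.csv",
]


def missing_entries(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Training data layout looks complete.")
```

Run it from the repo root; it only checks that the paths exist and does not validate the contents of the images or CSV files.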

### Test Data
See [README](data/README.md) to prepare the test datasets.

## Training

### Install dependencies
See [open_clip] for installation details; the same environment should work for this repo.
`setenv.sh` is the script we used to set up the environment in a virtualenv.

Also run the following to activate the environment and add the `src` directory to `PYTHONPATH`:
```bash
. env3/bin/activate
export PYTHONPATH="$PYTHONPATH:$PWD/src"
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
```

### Prepare training data
```bash
python -u build_input_csv.py \
    --data_path path/to/cc/Train_GCC-training_output.csv \
    --save_path="cc/Train_GCC-training_output.csv"
```

### Prepare LLaVA

1. Clone the LLaVA repository and navigate to the LLaVA folder
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```

2. Install Package
```Shell
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

3. Install additional packages for training cases
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```

### Upgrade LLaVA to the latest code base

```Shell
git pull
pip install -e .

# if you see some import errors when you upgrade, please try running the command below (without #)
# pip install flash-attn --no-build-isolation --no-cache-dir
```

### Use LLaVA to generate pseudo-manipulation descriptions
```bash
python -m llava.eval.cc_rewrite_multi \
    --model-path ./LLaVA/checkpoints/llava-v1.6-vicuna-7b
```

### Sample running code for training:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u src/main.py \
    --seed 42 \
    --save-frequency 1 \
    --train-data="cc_rewrite/Train_GCC-training_output.csv"  \
    --warmup 10000 \
    --batch-size=256 \
    --lr=1e-6 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --openai-pretrained \
    --model ViT-L/14 \
    --dataset-type rewrite \
    --temperature 0.5 \
    --n_query 4
```

### Sample evaluation only:

Evaluation on COCO, ImageNet, or CIRR.
```bash
# Set data_name to coco, imgnet, or cirr.
python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode $data_name \
    --gpu $gpu_id
```

Evaluation on Fashion-IQ (shirt, dress, or toptee).
```bash
# Set cloth_type to shirt, dress, or toptee.
python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode fashion \
    --source $cloth_type \
    --gpu $gpu_id
```
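
To evaluate one checkpoint on every benchmark listed in this section, the individual commands can be generated in a loop. The sketch below is an illustration rather than a script shipped with this repo: the helpers `eval_command` and `all_eval_commands` are hypothetical, and they use only the flags shown in this section. `/path/to/checkpoints` is a placeholder, as in the commands above.

```python
import shlex

# Modes and Fashion-IQ categories named in this README.
EVAL_MODES = ("coco", "imgnet", "cirr")
FASHION_CATEGORIES = ("shirt", "dress", "toptee")


def eval_command(checkpoint, eval_mode, gpu_id, source=None):
    """Build the argument list for one src/eval_retrieval.py run."""
    cmd = [
        "python", "src/eval_retrieval.py",
        "--openai-pretrained",
        "--resume", checkpoint,
        "--eval-mode", eval_mode,
    ]
    if source is not None:
        # --source is only used with --eval-mode fashion in this README.
        cmd += ["--source", source]
    cmd += ["--gpu", str(gpu_id)]
    return cmd


def all_eval_commands(checkpoint, gpu_id=0):
    """Return one command per benchmark: three modes plus three Fashion-IQ categories."""
    cmds = [eval_command(checkpoint, mode, gpu_id) for mode in EVAL_MODES]
    cmds += [
        eval_command(checkpoint, "fashion", gpu_id, source=category)
        for category in FASHION_CATEGORIES
    ]
    return cmds


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
    for cmd in all_eval_commands("/path/to/checkpoints"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems if the checkpoint path contains spaces.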
