# Multi-TAP

> **Note on Anonymity**: For the anonymous review period, some contents that could reveal author identity have been removed. The complete repository with full contents will be made available after the review period.

This repository contains code for training and evaluating image-text models on various datasets, supporting both single-objective and multi-objective training approaches.

## Requirements

- Python 3.11
- PyTorch
- CUDA-capable GPU
- Hugging Face token (set as the environment variable `HF_TOKEN`)
- Additional dependencies:
  ```bash
  pip install -r requirements.txt
  ```
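Because the scripts expect `HF_TOKEN` in the environment, it can help to fail early when it is missing. The following is a minimal sketch, not code from this repository; the helper name `get_hf_token` is hypothetical.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running the scripts."
        )
    return token
```

Calling `get_hf_token()` at the top of a training or evaluation run surfaces a missing token before any model download starts.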

## Installation

1. Clone the repository.
2. Install the required packages with `pip install -r requirements.txt`.

## Usage

### Training

#### Single Objective Training

To train a model on a single objective:

```bash
./scripts/train_single.sh
```

Key arguments:

- `--model_name`: Base model to use (e.g., "Qwen/Qwen2-VL-2B-Instruct")
- `--dataset_name`: Dataset to train on (we use `polaris` and `imagereward`)
- `--output_dir`: Directory to save checkpoints
- `--learning_rate`: Learning rate for training
- `--num_train_epochs`: Number of training epochs
- `--per_device_train_batch_size`: Batch size per device

#### Multi-Objective Training

To train a model on multiple objectives:

```bash
./scripts/multi_train_inference.sh
```

Supported datasets:

- vision_reward
- text-image-to-text
- text-to-image

Key arguments:

- `--model_name`: Base model to use
- `--checkpoint_dir`: Directory containing model checkpoints
- `--dataset`: Dataset type to use
- `--save_dir`: Directory to save outputs

### Evaluation

To evaluate a trained model:

```bash
./scripts/evaluate.sh
```

#### Command Line Arguments

The evaluation script accepts the following arguments:

- `--dataset_name`: Name of the dataset to evaluate on (default: "filtered-polaris")
- `--device`: Device to run evaluation on (default: "cuda")
- `--model_name`: Name of the model to use (default: "Qwen/Qwen2-VL-2B-Instruct")
- `--ckpt_path`: Path to model checkpoint (default: "")
- `--token`: Hugging Face token (default: the value of the `HF_TOKEN` environment variable)
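The arguments and defaults listed above can be expressed as an `argparse` parser. This is a sketch assuming the script uses the standard library parser; the actual `evaluate.py` may define its arguments differently, and `build_eval_parser` is a hypothetical name.

```python
import argparse
import os


def build_eval_parser():
    """Build a parser mirroring the documented evaluation arguments."""
    parser = argparse.ArgumentParser(
        description="Evaluate a trained image-text model (illustrative sketch)."
    )
    parser.add_argument("--dataset_name", default="filtered-polaris")
    parser.add_argument("--device", default="cuda")
    parser.add_argument("--model_name", default="Qwen/Qwen2-VL-2B-Instruct")
    parser.add_argument("--ckpt_path", default="")
    # Falls back to the HF_TOKEN environment variable (None if it is unset).
    parser.add_argument("--token", default=os.environ.get("HF_TOKEN"))
    return parser


# Only --dataset_name is overridden here; every other value is a default.
args = build_eval_parser().parse_args(["--dataset_name", "pascal"])
print(args.dataset_name, args.device, args.model_name)
```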

### Example Commands

1. Single objective training:

```bash
python single_objective_train.py \
    --model_name "Qwen/Qwen2-VL-2B-Instruct" \
    --dataset_name "your_dataset" \
    --output_dir "checkpoints" \
    --learning_rate 1e-5 \
    --num_train_epochs 3
```

2. Multi-objective training:

```bash
python multi_objective_train.py \
    --model_name "Qwen/Qwen2-VL-2B-Instruct" \
    --checkpoint_dir "checkpoints" \
    --dataset "vision_reward" \
    --save_dir "outputs"
```

3. Evaluation:

```bash
python evaluate.py \
    --dataset_name "pascal" \
    --device "cuda" \
    --model_name "Qwen/Qwen2-VL-2B-Instruct" \
    --ckpt_path "path/to/checkpoint"
```
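To evaluate one checkpoint on several datasets, the evaluation command above can be generated in a loop. The sketch below only builds and prints the commands; the helper `build_eval_command` is hypothetical, and the dataset names are taken from the metrics list in this README.

```python
import shlex


def build_eval_command(dataset_name, ckpt_path,
                       model_name="Qwen/Qwen2-VL-2B-Instruct", device="cuda"):
    """Return the argv list for one evaluate.py run."""
    return [
        "python", "evaluate.py",
        "--dataset_name", dataset_name,
        "--device", device,
        "--model_name", model_name,
        "--ckpt_path", ckpt_path,
    ]


for name in ["pascal", "foil", "flickrexp"]:
    cmd = build_eval_command(name, "path/to/checkpoint")
    # shlex.join renders the list as a shell-safe command line.
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the runs sequentially instead of printing them.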

## Evaluation Metrics

The evaluation script reports different metrics depending on the dataset type:

- For pairwise datasets (filtered-polaris, filtered-oid, pascal, foil, eye4b-pref, imgrew):
  - Accuracy score
- For non-pairwise datasets (eye4b-o, eye4b-a, flickrexp, flickrcf, polaris):
  - Kendall's tau-b correlation
  - Kendall's tau-c correlation
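The metrics above can be illustrated with a small pure-Python sketch. It is not the repository's implementation: the function names are hypothetical, and pairwise accuracy is assumed here to mean the fraction of pairs in which the human-preferred candidate receives the higher model score. In practice, `scipy.stats.kendalltau` with `variant="b"` or `variant="c"` computes the same correlations.

```python
import math
from itertools import combinations

PAIRWISE = {"filtered-polaris", "filtered-oid", "pascal", "foil",
            "eye4b-pref", "imgrew"}
NON_PAIRWISE = {"eye4b-o", "eye4b-a", "flickrexp", "flickrcf", "polaris"}


def metric_type(dataset_name):
    """Map a dataset name to the metric family listed in this README."""
    if dataset_name in PAIRWISE:
        return "accuracy"
    if dataset_name in NON_PAIRWISE:
        return "kendall"
    raise ValueError(f"Unknown dataset: {dataset_name}")


def pairwise_accuracy(scores_preferred, scores_other):
    """Fraction of pairs where the preferred candidate scores strictly higher."""
    correct = sum(p > o for p, o in zip(scores_preferred, scores_other))
    return correct / len(scores_preferred)


def kendall_tau(x, y):
    """Return (tau_b, tau_c) for two equal-length lists of scores."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) / 2
    diff = concordant - discordant
    # tau_b corrects for ties in each variable separately.
    denom_b = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    tau_b = diff / denom_b if denom_b else float("nan")
    # tau_c normalizes by the smaller number of distinct values m.
    m = min(len(set(x)), len(set(y)))
    denom_c = n * n * (m - 1) / m
    tau_c = 2 * diff / denom_c if denom_c else float("nan")
    return tau_b, tau_c


print(metric_type("pascal"))
print(pairwise_accuracy([0.9, 0.4, 0.7], [0.1, 0.6, 0.2]))
print(kendall_tau([1, 1, 2, 3], [1, 2, 2, 3]))
```

Tau-b and tau-c coincide when neither list contains ties and differ otherwise, which is why both are reported for datasets with graded human ratings.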
