# AsymVLM: Asymmetric Vision-Language Matching

This repository contains the implementation of the Asymmetric Probabilistic Vision-Language Model (AsymVLM).

## Overview

AsymVLM is a post-hoc adaptation method for pre-trained vision-language models, designed to model the uncertainty of text embeddings. The implementation includes two distribution types for modeling text embeddings:
- VMF (von Mises-Fisher)
- PSD (Power Spherical Distribution)
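
As a point of reference for the PSD option, the Power Spherical density on the unit sphere has the form `p(x) ∝ (1 + μᵀx)^κ` with a closed-form normalizer. The following is a minimal sketch of its log-density in plain Python. It is not taken from `models/asymvlm/adaptor.py`; the function name `power_spherical_log_prob` and the list-based inputs are assumptions made for illustration.

```python
import math


def power_spherical_log_prob(x, mu, kappa):
    """Log-density of a Power Spherical distribution on the unit sphere.

    x, mu : sequences of floats, both assumed to be unit-norm with the same length d
    kappa : concentration (>= 0); kappa = 0 gives the uniform distribution
    """
    d = len(mu)
    alpha = (d - 1) / 2 + kappa
    beta = (d - 1) / 2

    # log of the normalizer 2^(alpha+beta) * pi^beta * Gamma(alpha) / Gamma(alpha+beta)
    log_norm = (
        (alpha + beta) * math.log(2.0)
        + beta * math.log(math.pi)
        + math.lgamma(alpha)
        - math.lgamma(alpha + beta)
    )

    dot = sum(a * b for a, b in zip(x, mu))
    if dot <= -1.0:
        # Antipodal point: the density is zero for kappa > 0.
        return -math.inf if kappa > 0 else -log_norm
    return kappa * math.log1p(dot) - log_norm
```

With `kappa = 0` in three dimensions the result reduces to `-log(4π)`, the uniform density on the 2-sphere, which is a quick sanity check on the normalizer.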

## Installation

Install the required dependencies:
```sh
pip install -r requirements.txt
```

## Usage

### 1. Cache Embeddings

First, cache the CLIP embeddings for the dataset:

```sh
python cache_embeddings.py --dataset coco
```

### 2. Train the Adaptor

Train the AsymVLM adaptor with either VMF or PSD distribution:

```sh
python train.py --dataset coco --method asymvlm-psd --seed 0
```

Options for `--method`:
- `asymvlm-psd`: Power Spherical Distribution
- `asymvlm-vmf`: von Mises-Fisher Distribution

### 3. Evaluate the Model

Evaluate the trained model on cross-modal retrieval tasks:

```sh
python eval.py --dataset coco --method asymvlm-psd --seed 0 --uncer_levels 10
```
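
The three steps above can be chained for both distribution types. The sketch below only builds and prints the command lines; it assumes each script accepts exactly the flags shown in this README and that embeddings need to be cached once per dataset. The helper name `pipeline_commands` is hypothetical and not part of the repository.

```python
import shlex


def pipeline_commands(dataset="coco",
                      methods=("asymvlm-psd", "asymvlm-vmf"),
                      seeds=(0,),
                      uncer_levels=10):
    """Return the cache/train/eval command lines as lists of arguments."""
    # Embeddings are cached once per dataset (assumption).
    cmds = [["python", "cache_embeddings.py", "--dataset", dataset]]
    for method in methods:
        for seed in seeds:
            common = ["--dataset", dataset, "--method", method, "--seed", str(seed)]
            cmds.append(["python", "train.py", *common])
            cmds.append(["python", "eval.py", *common,
                         "--uncer_levels", str(uncer_levels)])
    return cmds


if __name__ == "__main__":
    # Print the commands for inspection; run them with subprocess.run(cmd, check=True).
    for cmd in pipeline_commands():
        print(shlex.join(cmd))
```

Printing first makes it easy to review the exact invocations before launching a long training run.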

## Project Structure

```
.
├── datasets/
│   ├── coco.py
│   └── embedding.py
├── models/
│   └── asymvlm/
│       └── adaptor.py      # Main model implementation
├── utils/
│   ├── preprocess.py
│   └── seed.py
├── cache_embeddings.py     # Script to cache CLIP embeddings
├── train.py                # Training script
├── eval.py                 # Evaluation script
└── requirements.txt        # Project dependencies
```

<!-- ## Citation

If you find this code useful for your research, please cite the original paper:

```bibtex
[Add citation information here]
``` -->

## License

This project is licensed under [CC-BY-4.0](LICENSE.md).
