# REPVLM

This repository contains the implementation for **REPVLM** (Riemannian Flow Matching for Vision-Language Models) and related baselines.

## Structure

- `models/`: Model definitions (REPVLM, ProbVLM, etc.).
- `custom_datasets/`: Data loading logic for proxy (training) and eval datasets.
- `embed/`: Scripts for extracting and caching embeddings from the backbone VLM.
- `train/`: Training scripts for REPVLM.
- `eval/`: Evaluation scripts for uncertainty estimation (Riemannian Manifold, MC Dropout).
- `scripts/`: Shell scripts to run the full pipeline (caching, training, evaluation).
- `plot.py`: Script for visualizing results.

## Installation

This project uses `uv` for dependency management.

```bash
uv sync
```

Alternatively, install the dependencies listed in `pyproject.toml` manually.

## Data Preparation

The code expects datasets to be in `webdataset` format (`.tar` files) located in the `data/` directory.

Expected structure:
```
data/
├── cc/              # CC3M
├── datacomp/        # DataComp
├── laion/           # LAION-400M
├── cifar100/
├── food101/
├── imagenet1k/
├── ...
```

If your datasets live somewhere else, update the paths in the `custom_datasets/` files to match.
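
Before running the pipeline, it can help to confirm that each dataset directory actually contains `.tar` shards. The following is a minimal sketch, not part of the repository: the helper name `datasets_without_shards` is hypothetical, and it assumes shards may sit either directly in the dataset directory or in nested subdirectories.

```python
from pathlib import Path


def datasets_without_shards(data_root, names):
    """Return the dataset names whose directory is missing or holds no .tar files."""
    root = Path(data_root)
    missing = []
    for name in names:
        ds_dir = root / name
        # rglob also finds shards nested in subdirectories of the dataset folder.
        if not ds_dir.is_dir() or not any(ds_dir.rglob("*.tar")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory names taken from the expected layout; extend the list as needed.
    expected = ["cc", "datacomp", "laion", "cifar100", "food101", "imagenet1k"]
    print(datasets_without_shards("data", expected))
```

An empty list means every listed dataset has at least one shard. The check does not validate the shard contents.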

## Usage

### 1. Cache Embeddings

Before training or evaluation, you need to extract embeddings from the frozen backbone VLM (configured in `configs.yaml`).

```bash
bash scripts/cache.sh
```
This runs `embed.proxy` for the proxy (training) datasets and `embed.eval` for the evaluation datasets, saving the resulting tensors to `embeddings/<dataset>/`.
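
To see what the caching step produced, a small summary of the `embeddings/` directory can be useful. This is a sketch under assumptions: `cache_summary` is a hypothetical helper, and since the file names and tensor format written by the `embed` scripts are not specified here, it only counts files and bytes per dataset without loading anything.

```python
from pathlib import Path


def cache_summary(root="embeddings"):
    """Map each dataset subdirectory to (file count, total size in bytes)."""
    summary = {}
    base = Path(root)
    if not base.is_dir():
        return summary  # nothing cached yet
    for ds_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        files = [f for f in ds_dir.rglob("*") if f.is_file()]
        summary[ds_dir.name] = (len(files), sum(f.stat().st_size for f in files))
    return summary


if __name__ == "__main__":
    for name, (count, size) in cache_summary().items():
        print(f"{name}: {count} files, {size / 1e6:.1f} MB")
```

A dataset that shows zero files most likely means its caching run did not finish.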

### 2. Training

Train the REPVLM model on the cached embeddings.

```bash
bash scripts/train_repvlm.sh
```
Adjust the `dataset` and `seed` variables in the script as needed.

### 3. Evaluation

Evaluate the models using uncertainty estimation.

```bash
bash scripts/eval.sh
```
This script runs:
- `eval.unc_rmv`: For REPVLM and ProbVLM.
- `eval.mcdo`: For Monte Carlo Dropout baseline.

Results are saved to `results/cls/<eval_ds>/<proxy_ds>/<method>/`.
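
The directory layout above can be navigated programmatically. The sketch below is illustrative only: `result_dir` and `list_runs` are hypothetical helpers, and the method name in the usage line is a placeholder, since the exact directory names written by the evaluation scripts are not listed here.

```python
from pathlib import Path


def result_dir(eval_ds, proxy_ds, method, root="results"):
    """Build results/cls/<eval_ds>/<proxy_ds>/<method>/ as a Path."""
    return Path(root) / "cls" / eval_ds / proxy_ds / method


def list_runs(root="results"):
    """Return sorted (eval_ds, proxy_ds, method) tuples found on disk."""
    base = Path(root) / "cls"
    runs = []
    for path in sorted(base.glob("*/*/*")):
        if path.is_dir():
            eval_ds, proxy_ds, method = path.relative_to(base).parts
            runs.append((eval_ds, proxy_ds, method))
    return runs


if __name__ == "__main__":
    # "some_method" is a placeholder, not a name defined by the repository.
    print(result_dir("cifar100", "cc", "some_method"))
    print(list_runs())
```

`list_runs` returns an empty list when no evaluation has been run yet.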

### 4. Plotting

Generate plots from the evaluation results.

```bash
uv run plot.py
```
This produces the figure `results.pdf`.
