# Query-Guided Prototype Generation for Few-Class Classification 

## 1  Directory Layout
```
QGPG-FCC/
│   data.py               # dataset & loader helpers
│   engine.py             # training / evaluation loops
│   experiments.py        # sweeps & ablation helpers
│   models/               # backbone + transformer head
│   utils/                # misc. scripts (DB builder, checks, plotting)
│   datasets/             # <-- place your datasets here
│   results/              # <-- auto-generated experiment outputs
└── README.md
```

---

## 2  Installation

Requires **Python ≥ 3.9** and a recent CUDA-enabled PyTorch build.

```bash
# clone repository
$ git clone https://github.com/../QGPG-FCC
$ cd QGPG-FCC

# create isolated env (example: conda)
$ conda create -n QGPG python=3.10 -y
$ conda activate QGPG

# core dependencies
$ pip install -r requirements.txt

# (optional) bleeding-edge timm / open-clip
$ pip install git+https://github.com/huggingface/pytorch-image-models.git
$ pip install git+https://github.com/mlfoundations/open_clip.git
```

---

## 3  Dataset Preparation

All experiments expect datasets under `./datasets/<dataset_name>/` following this *common* layout (generated automatically by `utils/database.py`):

```
datasets/<dataset_name>/
    ncl_<N>/                # «N-class» subset folders (N ∈ {2,…,10})
        subset_<i>/         # i = 0,…,S-1 (S subsets per N)
            meta/
                train.txt   # "img_path label" per line
                val.txt
            train/          # class-agnostic flat image folder
            val/
            db/
                DB.csv                  # master index
                db_support_set.json     # pre-computed K-shot supports
```
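
For readers who want to inspect a generated subset directly, the `meta/*.txt` files can be parsed in a few lines. The sketch below is not part of the repository: `read_meta` is a hypothetical helper, and it assumes each label is an integer class index and that the label is the last whitespace-separated token on the line.

```python
from pathlib import Path
from typing import List, Tuple


def read_meta(meta_file: Path) -> List[Tuple[str, int]]:
    """Parse a meta file with one "img_path label" pair per line."""
    samples = []
    for line in meta_file.read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace so paths containing spaces still parse.
        img_path, label = line.rsplit(maxsplit=1)
        # Assumption: labels are integer class indices.
        samples.append((img_path, int(label)))
    return samples
```

For example, `read_meta(Path("datasets/caltech101/ncl_5/subset_0/meta/train.txt"))` would return a list of `(img_path, label)` tuples, provided that subset has been generated.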

1. **Download** the full dataset with its `train/` and `val/` class folders. To do so, follow the "Generate configs for datasets and models" section of the https://github.com/bryanbocao/fca repository of the **Few-Class Arena (FCA)** benchmark (ICLR 2025, arXiv 2411.01099).

2. **Generate** N-class subsets and support sets:

```bash
$ python utils/database.py \
      --n_subsets 5 \
      --k 5           # K = support shots per class
```
To choose datasets and N values, edit `dataset_list` and `n_class_values` in `utils/database.py`. The script then:

1. Copies the FCA-listed images into the flat `train/` and `val/` folders above,  
2. Generates `DB.csv`,  
3. Computes CLIP embeddings to build a per-query **K-shot support set** (`db_support_set.json`), and  
4. Stores cosine-similarity maps (`train_similarities.json`, `test_similarities.json`).
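
The idea behind step 3 can be illustrated with a small NumPy sketch. This is not the code in `utils/database.py`: `nearest_supports` is a hypothetical function, the embeddings are assumed to be precomputed (for instance by CLIP), and the selection rule (top K per class by cosine similarity to the query) is our reading of "nearest" support selection.

```python
import numpy as np


def nearest_supports(query_emb, train_embs, train_labels, k=5):
    """Return, per class, the indices of the k training embeddings closest to the query."""
    train_labels = np.asarray(train_labels)
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q  # shape: (num_train,)

    supports = {}
    for cls in np.unique(train_labels):
        cls_idx = np.flatnonzero(train_labels == cls)
        # Sort this class's samples by descending similarity and keep the top k.
        order = np.argsort(-sims[cls_idx])[:k]
        supports[int(cls)] = cls_idx[order].tolist()
    return supports
```

Note that the sketch does not exclude the query itself when the query is also a training image; a real pipeline would likely need to handle that case.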

> **Tip** Prefer on-the-fly random supports? Run `main.py` with `--support_selection random` and skip step 2.

---

## 4  Running Experiments

Single-GPU examples (5-shot, *Caltech-101*):

> Run all `ncl_*` folders under `caltech101`:

```bash
$ python main.py \
  --base_dir ./datasets/caltech101 \
  --support_size 5 \
  --support_selection nearest \
  --prototype_construction query_fusion \
  --batch_size 8 \
  --epochs 5 \
  --lr 1e-4 \
  --model_choice resnet50 \
  --head_type transformer \
  --test_tag Debug
```

> Run only `ncl_5` (all subsets under `ncl_5` are still processed):

```bash
$ python main.py \
  --base_dir ./datasets/caltech101 \
  --ncls 5 \
  --support_size 5 \
  --support_selection nearest \
  --prototype_construction query_fusion \
  --batch_size 8 \
  --epochs 5 \
  --lr 1e-4 \
  --model_choice resnet50 \
  --head_type transformer \
  --test_tag Debug
```
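
To repeat the second command for several values of N, a thin Python wrapper may be convenient. The sketch below is an illustration only, not a script shipped with the repository: `build_command` and `run_sweep` are hypothetical names, and only the flags shown in the commands above are used.

```python
import subprocess
import sys


def build_command(base_dir, ncls, support_size=5, test_tag="Debug"):
    """Assemble a main.py invocation using only the flags shown above."""
    return [
        sys.executable, "main.py",
        "--base_dir", base_dir,
        "--ncls", str(ncls),
        "--support_size", str(support_size),
        "--support_selection", "nearest",
        "--prototype_construction", "query_fusion",
        "--batch_size", "8",
        "--epochs", "5",
        "--lr", "1e-4",
        "--model_choice", "resnet50",
        "--head_type", "transformer",
        "--test_tag", test_tag,
    ]


def run_sweep(base_dir, ncls_values):
    """Run main.py once per N, stopping at the first failure."""
    for ncls in ncls_values:
        subprocess.run(build_command(base_dir, ncls), check=True)
```

Calling `run_sweep("./datasets/caltech101", [2, 5, 10])` from the repository root would then launch three runs in sequence.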

---

## 5  Outputs & Checkpoints
```
results/<test_tag>/<dataset>/<ncl_*>/subset_*/
    results.csv         # per-run metrics
```
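
Because each subset writes its own `results.csv`, comparing runs usually means gathering those files first. A minimal sketch follows, assuming each `results.csv` starts with a header row; `collect_results` is a hypothetical helper, and the metric columns are passed through unchanged because their names are not documented here.

```python
import csv
from pathlib import Path


def collect_results(results_root, test_tag):
    """Gather rows from every results.csv under <results_root>/<test_tag>/."""
    rows = []
    pattern = "*/ncl_*/subset_*/results.csv"
    for csv_path in sorted(Path(results_root, test_tag).glob(pattern)):
        # The three directories above the file are <dataset>/<ncl_*>/<subset_*>.
        dataset, ncl, subset = csv_path.parts[-4:-1]
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                # Tag each row with its origin; metric columns are kept as-is.
                rows.append({"dataset": dataset, "ncl": ncl, "subset": subset, **row})
    return rows
```

For example, `collect_results("results", "Debug")` returns one dictionary per CSV row, each tagged with its dataset, `ncl_*` folder, and subset.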

---
