# Benchmark for Neural Embeddings

**Licence**: Apache-2.0

This is a **benchmarking framework** designed to evaluate how effectively compressed embeddings preserve information for downstream tasks.

In domains like Earth Observation (EO), pipelines typically handle large volumes of image data used primarily for analytical tasks. Traditional compression techniques focus on pixel-level reconstruction, while Foundation Model (FM) research does not explicitly consider embedding size. This benchmark addresses this gap by enforcing strict size constraints and evaluating embeddings directly on downstream tasks.


## Key Features

- **Model-agnostic**: Supports evaluation of any fixed-size embedding (e.g. 1024‑dim feature vectors), which enables comparison among compression and representation learning methods.
- **Task-driven evaluation**: Uses linear probes across diverse EO tasks, including land-cover proportion estimation, cloud detection, and biomass estimation.
- **Metrics**: Incorporates signal-to-noise scores and dynamic rank aggregation to compare methods.
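
To make the linear-probe idea concrete, here is a minimal sketch of k-fold evaluation of a linear model on fixed embeddings. It is not the benchmark's implementation: it swaps the gradient-trained probe for a closed-form ridge regression, and the function name `kfold_linear_probe`, the `l2` parameter, and the use of R² as the score are assumptions made for illustration only.

```python
import numpy as np


def kfold_linear_probe(X, y, k_folds=5, l2=1e-3, seed=0):
    """Mean held-out R^2 of a ridge-regularised linear probe (illustrative only)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Append a constant column so the probe can learn a bias term.
        X_train = np.hstack([X[train_idx], np.ones((len(train_idx), 1))])
        X_test = np.hstack([X[test_idx], np.ones((len(test_idx), 1))])
        # Closed-form ridge solution: w = (X^T X + l2 * I)^-1 X^T y
        gram = X_train.T @ X_train + l2 * np.eye(X_train.shape[1])
        w = np.linalg.solve(gram, X_train.T @ y[train_idx])
        pred = X_test @ w
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


# Synthetic stand-in data: 200 samples of 16-dim embeddings, linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = X @ rng.normal(size=16) + 0.01 * rng.normal(size=200)
print(kfold_linear_probe(X, y))
```

Because the probe is linear, its held-out score reflects how much task-relevant information is linearly accessible in the embedding, rather than the capacity of the evaluation model.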

---

## Quickstart

```bash
# start from fresh environment (skip if not needed)
micromamba create -n env -c conda-forge python=3.12
micromamba activate env

# Install requirements
cd ICLR-SUBMISSION-CODE
pip install -r requirements.txt

# run standalone evaluation script
python main.py \
  --annotation_path path/to/annotation_folder \
  --submission_file path/to/submission_file.csv \
  --output_dir path/to/results \
  --config path/to/config.yaml \
  --method_name your-method-name \
  --phase phase-name
```

- `--annotation_path`: Directory containing CSV label files for each task.
- `--submission_file`: CSV file with your embeddings.
- `--output_dir`: Destination for per-task reports, plots, and aggregated benchmark results.
- `--config`: YAML file specifying cross-validation settings and logging options (see provided sample).
- `--method_name`: Identifier for your method used in filenames and leaderboard entries.
- `--phase`: Groups evaluation runs under a specified phase name for ranking, creating a subfolder within `output_dir`.

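To run on CPU only, hide the GPUs from the process by setting `CUDA_VISIBLE_DEVICES` to an empty string, either with `export CUDA_VISIBLE_DEVICES=''` before execution or by prefixing the command: `CUDA_VISIBLE_DEVICES='' python main.py ...`.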



---

## Evaluation and Ranking

Run the benchmark on your embeddings with:

```bash
python main.py \
  --annotation_path path/to/annotation_folder \
  --submission_file path/to/submission_file.csv \
  --output_dir path/to/results \
  --config path/to/config.yaml \
  --method_name "your-method-name" \
  --phase "phase-name"
```

### Configuration

A sample config file (`benchmark/config.yaml`) specifies:

- `batch_size`, `epochs`, `learning_rate`, `k_folds`: Training and cross-validation settings.
- `standardize_embeddings`: Standardize embeddings using global mean and std (recommended).  
- `normalize_labels`: Normalize target labels to [0,1] (recommended).
- `enable_plots`: Generate per-fold plots (e.g., parity plots for regression).  
- `update_leaderboard`: Aggregate and update leaderboard after evaluation.  
- `task_filter`: Tasks to evaluate (default: all tasks available in `annotation_path`). 
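
As a rough illustration of what the `standardize_embeddings` and `normalize_labels` options describe, the sketch below applies both transforms with NumPy. It rests on assumptions: "global" statistics are taken here to mean per-dimension mean and std computed over all samples, and label normalization is assumed to be min-max scaling. The function names simply mirror the config keys and are not the repository's API.

```python
import numpy as np


def standardize_embeddings(emb, eps=1e-8):
    """Zero-mean, unit-variance scaling per dimension (assumed interpretation)."""
    mean = emb.mean(axis=0, keepdims=True)
    std = emb.std(axis=0, keepdims=True)
    return (emb - mean) / (std + eps)  # eps guards against constant dimensions


def normalize_labels(labels):
    """Min-max scale labels to [0, 1] (assumed interpretation)."""
    lo, hi = labels.min(), labels.max()
    if hi == lo:
        # Constant labels carry no range to scale; map them all to 0.
        return np.zeros_like(labels, dtype=float)
    return (labels - lo) / (hi - lo)


# Toy data: 3 samples of 2-dim embeddings with very different scales.
emb = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
labels = np.array([2.0, 4.0, 6.0])
print(standardize_embeddings(emb))
print(normalize_labels(labels))
```

Putting embeddings and labels on comparable scales helps keep a single `learning_rate` workable across tasks and methods.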

### Results

Results saved under `output_dir/<phase-name>/` include:

- Task-specific metrics and loss curves
- `results_summary.json` with per-task signal-to-noise scores and overall scores

### Aggregation

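To aggregate scores for the leaderboard, set `update_leaderboard` to `True` for the last evaluation run, or run the aggregation manually from Python:

```python
from evaluation.results import summarize_runs

# Pass the same values used for --output_dir and --phase in main.py
summarize_runs(output_dir="path/to/results", phase="phase-name")
```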

