# Evaluation Patches

This directory contains patches for evaluation frameworks used in FT-Agent.

## OpenCompass Patch

The `opencompass.patch` file contains custom modifications to the [OpenCompass](https://github.com/open-compass/opencompass) evaluation framework.

### Added Benchmarks

The patch adds support for the following benchmarks:

| Benchmark | Description |
|-----------|-------------|
| **BioProBench** | Biological protocol benchmark (gen, ord, err, pqa variants) |
| **ChemCoTBench** | Chemistry chain-of-thought benchmark (mol_edit, mol_opt, mol_und, reaction) |
| **TableBench** | Table understanding benchmark (data analysis, fact checking, numerical reasoning, visualization) |
| **PANORAMA** | Multi-task benchmark (noc4pc, par4pc, pi4pc variants) |
| **NVBench v2** | SQL generation benchmark |
| **FinanceIQ** | Finance domain benchmark with LLM judge support |

### How to Apply

```bash
# Clone OpenCompass
git clone https://github.com/open-compass/opencompass.git
cd opencompass

# Apply the patch
git apply /path/to/opencompass.patch

# Install
pip install -e .
```
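
Before applying, it can help to dry-run the patch so that a mismatch with the checked-out revision is caught without touching the working tree. The sketch below wraps `git apply --check` in a small helper; the function name `apply_patch_checked` is hypothetical and is not part of the patch or of OpenCompass.

```bash
# Hypothetical helper: apply a patch only if it passes a dry run.
apply_patch_checked() {
    local patch="$1"
    # --check validates the patch against the working tree without modifying it
    if git apply --check "$patch"; then
        git apply "$patch"
    else
        echo "Patch does not apply cleanly: $patch" >&2
        return 1
    fi
}
```

Run it from inside the `opencompass` checkout, for example `apply_patch_checked /path/to/opencompass.patch`, before the `pip install -e .` step.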

### Patch Statistics

- Files changed: 52
- Lines added: ~7,649
- Lines removed: ~11
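
These figures can be checked against a local copy of the patch without applying it. The following is a minimal sketch that sums the per-file output of `git apply --numstat`; the helper name `patch_stats` is an assumption made for illustration, and binary files (reported as `-`) are counted as zero lines.

```bash
# Hypothetical helper: summarize files changed and lines added/removed in a patch.
patch_stats() {
    local patch="$1"
    # --numstat prints "<added> <removed> <path>" per file and does not apply the patch
    git apply --numstat "$patch" |
        awk '{ files++; added += $1; removed += $2 }
             END { printf "files=%d added=%d removed=%d\n", files, added, removed }'
}
```

For example, `patch_stats /path/to/opencompass.patch` prints a single summary line that can be compared with the numbers listed above.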

### Base Version

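This patch is based on the OpenCompass `main` branch as fetched on the submission date. No commit hash is recorded here, so the patch may not apply cleanly to later revisions of `main`.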
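
If the patch no longer applies to the current `main`, one option is to check out the last commit made before a given date and apply the patch there. The sketch below uses `git rev-list --before`; the helper name `commit_before` is hypothetical, and the date argument is a placeholder because the submission date is not stated in this document. Note that `--before` filters on commit dates, so the result only approximates what was fetched on that day.

```bash
# Hypothetical helper: print the newest commit on a branch made before a date.
commit_before() {
    local date="$1" branch="${2:-main}"
    # -n 1 limits the output to the single most recent matching commit
    git rev-list -n 1 --before="$date" "$branch"
}
```

Inside the `opencompass` checkout, something like `git checkout "$(commit_before '<submission date>')"` followed by `git apply /path/to/opencompass.patch` would then target that revision.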
