# LABO: LLM‑Accelerated Bayesian Optimization

This repository contains the code for **LABO (LLM‑Accelerated Bayesian Optimization)**. The pipeline uses an LLM as a low‑fidelity surrogate and combines it with standard BO components to decide what to evaluate next with a high‑fidelity evaluator.
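The general idea of pairing a cheap LLM guess with a BO model can be illustrated with a small, self-contained sketch. This is **not** the code in `koh/`. It assumes three things: the LLM prediction acts as a prior mean, a zero-mean Gaussian process with an RBF kernel models the residual between high-fidelity values and LLM predictions, and candidates are ranked by an upper confidence bound (UCB). `llm_low_fidelity`, `high_fidelity`, and `labo_style_loop` are hypothetical stand-ins.

```python
import numpy as np


def rbf(a, b, length=0.2):
    # Squared-exponential kernel between row sets a (n, d) and b (m, d)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(X, r, Xq, noise=1e-4):
    # Zero-mean GP on residuals r; returns posterior mean and std at Xq
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)  # prior variance of RBF is 1
    return Ks @ alpha, np.sqrt(var)


def llm_low_fidelity(X):
    # HYPOTHETICAL stand-in for an LLM prediction: cheap but biased
    return np.sin(3 * X[:, 0])


def high_fidelity(X):
    # HYPOTHETICAL stand-in for an expensive evaluator
    return np.sin(3 * X[:, 0]) + 0.3 * X[:, 0]


def labo_style_loop(candidates, n_init=2, n_iter=5, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = [int(i) for i in rng.choice(len(candidates), n_init, replace=False)]
    y = list(high_fidelity(candidates[idx]))  # expensive: evaluated once per point
    prior = llm_low_fidelity(candidates)      # cheap: LLM guess for every candidate
    for _ in range(n_iter):
        r = np.array(y) - prior[idx]          # residuals between HF and LLM
        mu_r, sd = gp_posterior(candidates[idx], r, candidates)
        ucb = prior + mu_r + beta * sd        # corrected mean plus exploration bonus
        ucb[idx] = -np.inf                    # never re-evaluate a point
        nxt = int(np.argmax(ucb))
        idx.append(nxt)
        y.append(float(high_fidelity(candidates[[nxt]])[0]))
    best = int(np.argmax(y))
    return candidates[idx[best]], y[best], len(idx)
```

For example, `labo_style_loop(np.linspace(0, 1, 50).reshape(-1, 1))` spends seven high-fidelity evaluations (two initial, five guided) and returns the best point found. The LLM only shapes the prior, so a poor LLM guess is gradually corrected by the residual GP.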

## Repository layout

- **`run_*.py`**: entry points per benchmark task (FeCr / COF / Sandwich / Fullerene / P3HT / PCE10).
- **`high_fidelity/`**: high‑fidelity task evaluators backed by the corresponding benchmark datasets.
- **`low_fidelity/`**: LLM prompting + parsing + low‑fidelity prediction utilities.
- **`koh/`**: BO loop implementation (acquisition, models, decision, optimizer).
- **`API/`**: LLM client implementations (remote API + optional local inference).

## Installation

Create a Python environment and install dependencies:

```bash
pip install -r requirements.txt
```

## Data availability (high‑fidelity data files not included)

The original high‑fidelity benchmark data files under `high_fidelity/` have been removed from this repository. To run the code, obtain each dataset from its corresponding **paper reference** and restore it under the exact filename below, in the same directory as the evaluator modules:

- `high_fidelity/fecr.csv`
- `high_fidelity/cof.csv`
- `high_fidelity/p3ht.csv`
- `high_fidelity/PCE10.csv`
- `high_fidelity/Fullerene.csv`
- `high_fidelity/food.csv` (Sandwich task)

If any of these files is missing, the high‑fidelity evaluators cannot load their bounds or data, and the corresponding runs fail.
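Before launching a run, it can help to confirm that every file is in place. The helper below is a hypothetical sketch (`missing_hf_files` and `REQUIRED_HF_FILES` are not part of the repository). It only checks that the listed paths exist, not that their contents have the expected format. Filenames are case-sensitive on case-sensitive filesystems such as most Linux setups, so `PCE10.csv` and `Fullerene.csv` must keep their capitalization.

```python
from pathlib import Path

# Filenames exactly as listed in this README, relative to the repository root
REQUIRED_HF_FILES = [
    "high_fidelity/fecr.csv",
    "high_fidelity/cof.csv",
    "high_fidelity/p3ht.csv",
    "high_fidelity/PCE10.csv",
    "high_fidelity/Fullerene.csv",
    "high_fidelity/food.csv",  # Sandwich task
]


def missing_hf_files(repo_root="."):
    # Return the required data files that are not present under repo_root
    root = Path(repo_root)
    return [p for p in REQUIRED_HF_FILES if not (root / p).is_file()]
```

Calling `print(missing_hf_files())` from the repository root prints an empty list once all six files are in place.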

## How to run

All entry scripts are in the project root. They accept:

- `--seed <int>`: run a single seed
- `--seeds "1,2,3"`: run multiple seeds sequentially
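One way to fold the two flags into a single list of seeds is sketched below. It assumes the flags are mutually exclusive and that one of them is required. `parse_seeds` is a hypothetical name, and the real `run_*.py` parsers may handle defaults or validation differently.

```python
import argparse


def parse_seeds(argv=None):
    # Hypothetical re-creation of the --seed / --seeds interface
    parser = argparse.ArgumentParser(description="Run one benchmark task")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--seed", type=int, help="run a single seed")
    group.add_argument("--seeds", type=str, help='comma-separated seeds, e.g. "1,2,3"')
    args = parser.parse_args(argv)
    if args.seeds is not None:
        # Skip empty entries so a trailing comma does not raise
        return [int(s) for s in args.seeds.split(",") if s.strip()]
    return [args.seed]
```

A runner can then execute `for seed in parse_seeds(): ...` so that both flags share the same sequential loop.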

Examples:

```bash
python run_fecr.py --seed 1
python run_cof.py --seeds "1,2,3"
python run_sandwich.py --seed 1
python run_fullerene.py --seed 1
python run_p3ht.py --seed 1
python run_pce10.py --seed 1
```
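To sweep every benchmark with the same seeds, a small driver can invoke each entry script in turn. This is a hypothetical helper, not part of the repository. It assumes it is run from the project root and passes seeds through `--seeds`.

```python
import subprocess
import sys

# Entry scripts as listed in the examples above
RUN_SCRIPTS = [
    "run_fecr.py", "run_cof.py", "run_sandwich.py",
    "run_fullerene.py", "run_p3ht.py", "run_pce10.py",
]


def build_commands(seeds):
    # One command per task, reusing the current Python interpreter
    seed_arg = ",".join(str(s) for s in seeds)
    return [[sys.executable, script, "--seeds", seed_arg] for script in RUN_SCRIPTS]


def run_all(seeds, dry_run=False):
    for cmd in build_commands(seeds):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing task
            subprocess.run(cmd, check=True)
```

`run_all([1, 2, 3], dry_run=True)` only prints the six commands, which is a quick way to review the sweep before spending evaluator time.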

Outputs are written under:

- `outputs/<TASK_NAME>/KOH/`

By default, each run names its output files with a timestamp-based prefix (see `RUN_TAG` / `file_prefix` in each `run_*.py`).
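Because the prefix format is defined per script, sorting by modification time is a format-independent way to locate the most recent output. The sketch below is hypothetical (`latest_output` is not part of the repository). `task_name` must match the `TASK_NAME` used by the runner, which this README does not list.

```python
from pathlib import Path


def latest_output(task_name, root="outputs"):
    # Search outputs/<TASK_NAME>/KOH/ (including subfolders) for the newest file
    run_dir = Path(root) / task_name / "KOH"
    if not run_dir.is_dir():
        return None
    files = [p for p in run_dir.rglob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```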

## Notes for reviewers

- **PCE10 is a minimization task**: the runner converts it into an equivalent maximization form internally (see `run_pce10.py`).
- **Optional local inference**: `API/llm_clients.py` supports an optional local backend; dependency availability may vary by platform.
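Negating the objective is one common way to turn a minimization problem into an equivalent maximization problem, because the point that maximizes -f is the point that minimizes f. The snippet below illustrates that idea with a hypothetical toy objective (`toy_loss` and `as_maximization` are illustrative names). The exact transform applied in `run_pce10.py` may differ.

```python
def as_maximization(f):
    # Wrap a function to be minimized so that a maximizer can optimize it:
    # argmax of -f(x) is the same point as argmin of f(x).
    return lambda x: -f(x)


def toy_loss(x):
    # HYPOTHETICAL stand-in for a quantity to minimize (not the PCE10 objective)
    return (x - 0.3) ** 2


objective = as_maximization(toy_loss)
candidates = [i / 10 for i in range(11)]
best_x = max(candidates, key=objective)
# Negate again to report the result on the original (minimization) scale
best_loss = -objective(best_x)
```

Reporting `best_loss` rather than `objective(best_x)` keeps PCE10 results comparable with values quoted on the original minimization scale.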
