# PEL-NAS · Parallel Ensemble LLM-driven Neural Architecture Search

PEL-NAS is an updated version of our LLM-assisted NAS workflow. It reads metrics
for the full NAS-Bench-201 search space from a lightweight CSV repository,
integrates an optional XGBoost predictor over zero-cost features, and keeps the
original six-category convolution partitioning to encourage diverse exploration.

## Highlights

- **CSV-backed metrics repository** – instant access to accuracy and latency for
  all 15,625 NAS-Bench-201 architectures without loading `.pth` files.
- **Zero-cost predictor integration** – optional XGBoost booster (via
  `zc_predictor/`) that operates on stored zero-cost features.
- **Six convolution categories & parallel LLM prompts** – the original
  multi-category searcher remains, now fully backed by the new data pipeline.
- **Modernised visualisation & analysis** – dataset baselines and Pareto metrics
  are recomputed on demand using the CSV data (no cache files required).
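
As a rough illustration of the CSV-backed lookup, the sketch below indexes rows
by architecture string using only the standard library. The column names
(`arch_str`, `cifar100_acc`, `edgegpu_latency`), the helper `load_metrics`, and
the inline sample rows with dummy values are assumptions made for this example;
check the header of `nb201_hw_metrics.csv` and
`pel_nas/data/metrics_repository.py` for the real schema.

```python
import csv
import io

# Hypothetical two-row sample with dummy values (not real benchmark numbers).
# The real file has one row per architecture and may use other column names.
SAMPLE_CSV = """arch_str,cifar100_acc,edgegpu_latency
|nor_conv_3x3~0|+|nor_conv_3x3~0|nor_conv_3x3~1|+|skip_connect~0|nor_conv_3x3~1|nor_conv_3x3~2|,70.0,5.0
|skip_connect~0|+|none~0|nor_conv_1x1~1|+|skip_connect~0|none~1|avg_pool_3x3~2|,60.0,2.0
"""


def load_metrics(handle, key_column="arch_str"):
    """Index CSV rows by architecture string, converting metrics to float."""
    index = {}
    for row in csv.DictReader(handle):
        arch = row.pop(key_column)
        index[arch] = {name: float(value) for name, value in row.items()}
    return index


metrics = load_metrics(io.StringIO(SAMPLE_CSV))
first_arch = next(iter(metrics))
print(first_arch, metrics[first_arch])
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by
`open("nb201_hw_metrics.csv", newline="")`, adjusted to the actual column names.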

## Repository Layout

```
├── run.py                       # Main entry point for single search / ablation
├── run_all.py                   # Convenience driver for multi-device batches
├── compute_pareto_metrics.py    # Standalone Pareto metrics helper (CSV-based)
├── pel_nas/
│   ├── core/
│   │   ├── config.py            # Global configuration (datasets, predictor)
│   │   └── main_controller.py   # Ablation orchestration
│   ├── data/
│   │   ├── arch_converter.py    # NB201 arch string ↔ tuple conversions
│   │   ├── metrics_repository.py # CSV-backed metrics loader
│   │   ├── nas_bench_api.py     # Compatibility shim built on the CSV
│   │   └── pareto_metrics.py    # Pareto helpers (HV, IGD, dataset fronts)
│   ├── llm/
│   │   ├── llm_client.py        # Parallel multi-category LLM client
│   │   └── prompts/             # Category-specific prompt templates
│   ├── search/
│   │   ├── architecture_analyzer.py
│   │   ├── evaluator.py         # CSV + predictor-backed evaluator
│   │   └── nas_searcher.py      # Multi-category search engine
│   └── visualization/
│       ├── visualizer.py        # Single-run visualisations
│       └── unified_visualizer.py # Ablation comparison figure
├── experiments/
│   ├── ablation/ablation_studies.py
│   ├── conv_two3x3_llm_evolve.py
│   ├── llm_clustering_demo/plot_edgegpu_triptych.py
│   └── ...
├── scripts/
│   ├── run_all_combinations.py
│   └── validate_predictor_accuracy.py
├── tools/                       # Legacy HW-NAS-Bench utilities (optional)
├── zc_predictor/                # Zero-cost feature store & XGBoost wrapper
├── nb201_hw_metrics.csv         # Combined accuracy + latency CSV (required)
└── requirements.txt
```

## Data & Predictor Files

| File | Description |
| --- | --- |
| `nb201_hw_metrics.csv` | Combined NAS-Bench-201 accuracy and HW-NAS-Bench latency for all architectures (required). |
| `zc_predictor/data/nb201_zc_scores.json` | Zero-cost feature store used by the predictor helper (required for predictor mode). |
| `zc_predictor/models/<dataset>.json` | Optional XGBoost booster files; provide one per dataset to enable the predictor. |

The zero-cost model paths are configured in
`pel_nas/core/config.py::PREDICTOR_CONFIG`. When a model is missing, PEL-NAS
falls back to ground-truth accuracies from the CSV.
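
The fallback can be pictured with the minimal sketch below. It is not the
project's evaluator: `load_booster` and `accuracy_for` are hypothetical helpers,
and the shape of the feature store (one feature list per architecture) is an
assumption. Only `xgboost.Booster.load_model`, `xgboost.DMatrix`, and
`Booster.predict` are real library calls.

```python
from pathlib import Path


def load_booster(model_path):
    """Return an xgboost Booster, or None when it cannot be loaded."""
    path = Path(model_path)
    if not path.is_file():
        return None  # no model file for this dataset
    try:
        import xgboost as xgb  # optional dependency
        from xgboost.core import XGBoostError
    except ImportError:
        return None  # predictor mode not installed
    booster = xgb.Booster()
    try:
        booster.load_model(str(path))
    except XGBoostError:
        return None  # unreadable or incompatible model file
    return booster


def accuracy_for(arch, csv_accuracy, booster=None, features=None):
    """Use the predictor when available, otherwise the CSV ground truth."""
    if booster is None or features is None:
        return csv_accuracy[arch], "csv"
    import numpy as np
    import xgboost as xgb

    # One row of zero-cost features for the requested architecture.
    row = np.asarray([features[arch]], dtype=float)
    return float(booster.predict(xgb.DMatrix(row))[0]), "predictor"


# Hypothetical model path and dummy accuracy, for illustration only.
booster = load_booster("zc_predictor/models/does_not_exist.json")
print(accuracy_for("arch-a", {"arch-a": 70.0}, booster))
```

Returning the source label alongside the value makes it easy to log which
evaluations used the predictor and which used the CSV.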

## Installation

```bash
git clone https://github.com/your-account/PEL-NAS.git
cd PEL-NAS

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: enable predictor mode
# pip install xgboost
```

Export your LLM API key if you intend to use the online generator:

```bash
export OPENAI_API_KEY="sk-..."
```

## Quick Usage

### 1. Single-device search

```bash
python run.py --dataset cifar100 --hardware-device edgegpu --iterations 5

# Enable the zero-cost predictor (requires XGBoost booster)
python run.py --dataset cifar10 --hardware-device raspi4 --iterations 10 --use-predictor
```

Outputs (results JSON, summary markdown, visualisations) are written to
`outputs/single_device_<dataset>_<device>_<timestamp>/`.

### 2. Ablation controller

```bash
python run.py --mode ablation --dataset cifar100 --iterations 3
```

This spawns the orchestrated experiments defined in
`experiments/ablation/ablation_studies.py` and finishes with a unified comparison
figure plus per-method metrics.

### 3. Pareto metrics on existing results

```bash
python compute_pareto_metrics.py \
  --all_csv nb201_hw_metrics.csv \
  --devices edgegpu,raspi4 \
  --out_dir outputs/pareto_eval
```

### 4. EdgeGPU triptych (LLM driven)

```bash
# Requires OPENAI_API_KEY. Generates three prompting strategies and plots
# a comparative triptych.
python experiments/llm_clustering_demo/plot_edgegpu_triptych.py \
  --dataset cifar100 --hardware edgegpu --iterations 5 --per-iteration 10
```

Outputs appear in `experiments/llm_clustering_demo/outputs/` (scenario JSON and
`edgegpu_triptych.png`).

When running `compute_pareto_metrics.py` (step 3), provide a JSON file via
`--found_indices` to evaluate your own discovered sets.

## Search Configuration Snapshot

```python
from pel_nas.core.config import SEARCH_CONFIG

for name, spec in SEARCH_CONFIG['conv_categories'].items():
    print(name, spec['description'], spec['target_count'])
```

Each category pins the expected number of `nor_conv_3x3`/`nor_conv_1x1` blocks
and maps to a dedicated prompt under `pel_nas/llm/prompts/`.
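
To make the block counts concrete, the sketch below parses a NAS-Bench-201
architecture string and counts its `nor_conv_3x3` and `nor_conv_1x1` edges. The
helpers `parse_ops` and `conv_counts` are hypothetical and are not the API of
`pel_nas/data/arch_converter.py`; how counts map to the six categories is
defined in `SEARCH_CONFIG` and is not reproduced here.

```python
from collections import Counter


def parse_ops(arch_str):
    """Return the six edge operations of a NAS-Bench-201 cell, in order."""
    ops = []
    for node in arch_str.split("+"):           # one group per target node
        for edge in node.strip("|").split("|"):
            op, _, _source = edge.partition("~")  # "op~source_node"
            ops.append(op)
    return ops


def conv_counts(arch_str):
    """Count (nor_conv_3x3, nor_conv_1x1) edges in the cell."""
    counts = Counter(parse_ops(arch_str))
    return counts["nor_conv_3x3"], counts["nor_conv_1x1"]


arch = (
    "|nor_conv_3x3~0|+|nor_conv_1x1~0|skip_connect~1|"
    "+|none~0|nor_conv_3x3~1|avg_pool_3x3~2|"
)
print(conv_counts(arch))  # (2, 1)
```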

## Notes & Compatibility

- The legacy NAS-Bench-201 API is no longer required. Tools inside `tools/`
  still expect the original `.pth` and HW-NAS-Bench pickle; treat them as
  optional extras.
- Predictor integration is fully optional. If the XGBoost booster cannot be
  loaded, the evaluator reverts to CSV accuracies.
- Dataset baselines, hypervolume and IGD now rely on
  `pel_nas/data/pareto_metrics.py`, which mirrors the logic shipped in
  `compute_pareto_metrics.py`.
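
For readers unfamiliar with the metrics named above, here is a minimal sketch of
a 2-D Pareto front, hypervolume, and IGD. It assumes both objectives are
minimised (for example error and latency) and uses a hand-picked reference
point and toy points; the function names are hypothetical, and the
normalisation and reference-point choices in `pel_nas/data/pareto_metrics.py`
may differ.

```python
import math


def pareto_front(points):
    """Non-dominated subset of 2-D points, both objectives minimised."""
    front, best_second = [], math.inf
    for point in sorted(set(points)):      # ascending in the first objective
        if point[1] < best_second:         # strictly better second objective
            front.append(point)
            best_second = point[1]
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded by the reference point."""
    area, prev_y = 0.0, ref[1]
    for x, y in pareto_front(points):
        if x >= ref[0] or y >= ref[1]:
            continue                       # outside the reference box
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


def igd(reference_front, found):
    """Mean distance from each reference point to its nearest found point."""
    total = sum(min(math.dist(r, f) for f in found) for r in reference_front)
    return total / len(reference_front)


# Toy (error, latency) pairs, not benchmark data.
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(points)
print(front)                               # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(points, (5.0, 5.0)))  # 11.0
print(igd(front, [(2.0, 2.0)]))
```

A larger hypervolume and a smaller IGD both indicate a found set that is closer
to the dataset's true front.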

Feel free to open issues or pull requests if you extend the data sources or the
predictor suite. Enjoy exploring NAS with parallel LLM guidance!
