# flowgen - Reproduction Code

## Initial setup

### Dependencies
```bash
# Install `uv`
curl -LsSf https://astral.sh/uv/install.sh | sh

# Reload your shell configuration so that `uv` is on your PATH
source ~/.bashrc

# Create a virtual environment (in the project root folder)
uv venv --python 3.12.7

# Activate your virtual environment
source .venv/bin/activate

# Install requirements to run flowgen:
uv sync --extra dev
```

## Runtime configuration
See `minimal/configuration.py` for detailed configuration information.

Some quick pointers:
- Use `config.yaml` for your preferred local configs (see `config.yaml.sample`).
- Set sensitive credentials in the `runtime-secrets/` directory.
  For example:
  ```bash
  $ cat runtime-secrets/azure_oai__api_key
  asdfasdfasdf12341234
  ```
- Environment variables can also be used. They are prefixed with `FLOWGEN_`, and config sections are separated by `__`, for example `FLOWGEN_PATHS__ROOT_DIR=foo/bar`.
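
The naming convention above can be illustrated with a minimal sketch. This is not the project's loader (see `minimal/configuration.py` for the actual behavior). The helper names are hypothetical. Lowercasing the keys and splitting secret file names on `__` are assumptions inferred from the examples above.

```python
import os
from pathlib import Path


def nested_set(config: dict, keys: list[str], value: str) -> None:
    """Store value at config[keys[0]][keys[1]]..., creating dicts as needed."""
    for key in keys[:-1]:
        config = config.setdefault(key, {})
    config[keys[-1]] = value


def overrides_from_env(environ, prefix="FLOWGEN_", delimiter="__") -> dict:
    """Collect prefixed variables into a nested dict (keys lowercased: an assumption)."""
    config: dict = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            keys = name[len(prefix):].lower().split(delimiter)
            nested_set(config, keys, value)
    return config


def overrides_from_secrets(secrets_dir: Path, delimiter="__") -> dict:
    """Treat each file name as a delimited key path and the file contents as the value."""
    config: dict = {}
    for path in sorted(secrets_dir.iterdir()):
        if path.is_file():
            keys = path.name.lower().split(delimiter)
            nested_set(config, keys, path.read_text().strip())
    return config


print(overrides_from_env({"FLOWGEN_PATHS__ROOT_DIR": "foo/bar", "HOME": "/root"}))
# {'paths': {'root_dir': 'foo/bar'}}
```

Passing `os.environ` instead of the literal dict applies the same mapping to the real environment.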


### LLM Tracing
To enable tracing during local development (for example, when running functional tests), add the following to your `config.yaml`:
```yaml
arize:
  tracing_enabled: true
```

Alternatively, pass the `-i/--instrumentation` flag when running `eval_runner.py`.

In either case, you also need to start the `phoenix` server: run `phoenix serve` in another terminal.

## Evaluating Flows

The script `minimal/eval_runner.py` is the main workhorse:

```bash
python minimal/eval_runner.py --help
Usage: eval_runner.py [OPTIONS]

  Evaluates flows defined in flows_file against the specified dataset, stores
  results in a CSV, appends results iteratively, and skips flows already
  present in the output CSV.

Options:
  -f, --flows-file FILE    Path to the file containing flow definitions (one
                           JSON per line).  [required]
  -d, --dataset-name TEXT  Name of the dataset class in minimal.data.
                           [default: HotPotQAHF]
  -i, --instrumentation    Enable Arize instrumentation.
  -o, --output-file FILE   Path to the output CSV file. Defaults to ./flowgen-
                           eval-results-{dataset_name}.csv
  -m, --max-evals INTEGER  Maximum QA pairs to evaluate per flow  [default: 1000]
  --help                   Show this message and exit.
```

For example:

```bash
python minimal/eval_runner.py \
    --instrumentation \
    --dataset-name FinanceBenchHF \
    --max-evals 10 \
    --flows-file flow-files/small-models_financebench_pareto-flows.jsonl
```

As it runs, the script appends evaluation results in CSV format to the file given by `--output-file`, and it skips flows that are already present in that file.
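
The append-and-skip behavior described in the help text can be sketched as follows. This is an illustration of the general pattern, not the code in `minimal/eval_runner.py`. The function name, the `flow_json` and `accuracy` column names, and the `evaluate` callback are all hypothetical.

```python
import csv
import json
from pathlib import Path


def evaluate_flows(flows_file: Path, output_file: Path, evaluate) -> int:
    """Evaluate flows not yet recorded in output_file; return how many were run."""
    done = set()
    if output_file.exists():
        with output_file.open(newline="") as f:
            done = {row["flow_json"] for row in csv.DictReader(f)}

    evaluated = 0
    with flows_file.open() as flows:
        for line in flows:
            if not line.strip():
                continue  # tolerate blank lines in the JSONL file
            flow = json.loads(line)
            # Canonical serialization so the same flow always maps to the same key.
            key = json.dumps(flow, sort_keys=True)
            if key in done:
                continue
            result = evaluate(flow)
            # Append and close after every flow so partial progress survives a crash.
            write_header = not output_file.exists() or output_file.stat().st_size == 0
            with output_file.open("a", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=["flow_json", "accuracy"])
                if write_header:
                    writer.writeheader()
                writer.writerow({"flow_json": key, "accuracy": result})
            done.add(key)
            evaluated += 1
    return evaluated
```

With this pattern, rerunning the same command after an interruption only evaluates the flows that are missing from the output file.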


## Run Data

Historical optimization run data is provided in `flowgen-main-results.csv`.

The columns `values_0` and `values_1` correspond to accuracy and mean cost ($ per call), respectively.
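
As one way to explore this data, the sketch below selects the rows that are Pareto-optimal in accuracy (higher is better) and cost (lower is better). It relies only on the `values_0` and `values_1` columns named above. The `pareto_front` helper is hypothetical, the sample rows are made up for illustration, and the sketch assumes both columns hold numeric values in every row.

```python
import csv
import io


def pareto_front(rows):
    """Return rows that no other row beats on both accuracy and cost."""
    points = [(float(r["values_0"]), float(r["values_1"]), r) for r in rows]
    front = []
    for acc, cost, row in points:
        # A row is dominated if another row is at least as good on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for a, c, _ in points
        )
        if not dominated:
            front.append(row)
    return front


# Made-up sample data standing in for the real CSV.
sample = io.StringIO(
    "values_0,values_1\n"
    "0.60,0.010\n"
    "0.75,0.020\n"
    "0.70,0.030\n"
)
front = pareto_front(csv.DictReader(sample))
print([(r["values_0"], r["values_1"]) for r in front])
# [('0.60', '0.010'), ('0.75', '0.020')]
```

To run it on the real data, replace `sample` with `open("flowgen-main-results.csv", newline="")`.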
