

# Installing Dependencies

You'll need [uv](https://docs.astral.sh/uv/getting-started/installation/)
installed on your system. Then you can run:

```bash
uv sync
```

# API Keys

We support several frameworks (i.e., APIs to communicate with the LLMs). For
each API, you need to set up the corresponding environment variable with
a valid API key:
- for the Google Gemini API you must define `GOOGLE_API_KEY`
- for the DeepSeek API you must define `DEEPSEEK_API_KEY`
- for the OpenAI API you must define `OPENAI_API_KEY`
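
Before starting a long run, it can help to confirm that the key for your chosen framework is actually visible to the process. The snippet below is a minimal sketch, not part of this repository: `API_KEY_VARS` and `missing_api_key` are hypothetical names, and the mapping from framework names (`gemini`, `deepseek`, `openai`) to variables is assumed from the list above.

```python
import os

# Assumed mapping, based on the list above: framework name -> environment variable.
API_KEY_VARS = {
    "gemini": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_api_key(framework, environ=None):
    """Return the name of the unset (or empty) variable for `framework`, else None."""
    env = os.environ if environ is None else environ
    var = API_KEY_VARS[framework]
    return None if env.get(var) else var


# Report the status of every framework in the current environment.
for name in API_KEY_VARS:
    var = missing_api_key(name)
    status = "ok" if var is None else f"missing {var}"
    print(f"{name}: {status}")
```

Only the variable for the framework you intend to use needs to be set.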

# Heuristic Generation

The basic command is:

```bash
uv run llm-heuristics.py --domain DOMAIN
```
where `DOMAIN` is the name of the domain for which the heuristic is generated. The domains used in the paper are `blocksworld`, `childsnack`, `floortile`, `miconic`, `rovers`, `sokoban`, `spanner`, and `transport`.

There are several other options you can pass to the command line:

- `--domain`: choose the domain that you want to use.
- `--framework`: choose what framework you want to use. Available choices are: `gemini` (default), `deepseek`, and `openai`.
- `--model`: choose the LLM model to be used. The available models depend on the selected framework.
- `--prompt-format`: choose prompt format. *Note: this option was used during development and is no longer necessary; it will be removed in the future.* (Default: `neurips`)
- `--heuristic-name`: name of the heuristic and of its class.
- `--heuristic-file`: file where the learned heuristic is stored. It must end with `.py`.
- `--temperature`: temperature (default: 1.0)
- `--top-p`: top-p value (default: 0.5)
- `--ablation`: choose a component to do the ablation. Omit this option if you want the complete prompt.
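
To generate heuristics for several domains in one go, the options above can be assembled programmatically. The following is a rough sketch under stated assumptions: `generation_command` is a hypothetical helper, the output file naming scheme is invented for illustration, and `--model` is left out so that the framework's default model is used. The script only prints the commands; it does not run them.

```python
import shlex

# Domains listed in this README as the ones used in the paper.
DOMAINS = [
    "blocksworld", "childsnack", "floortile", "miconic",
    "rovers", "sokoban", "spanner", "transport",
]


def generation_command(domain, framework="gemini", temperature=1.0, top_p=0.5):
    """Build the argument list for one heuristic-generation run."""
    return [
        "uv", "run", "llm-heuristics.py",
        "--domain", domain,
        "--framework", framework,
        # Hypothetical naming scheme; the file name only has to end with .py.
        "--heuristic-file", f"{domain}-{framework}-heuristic.py",
        "--temperature", str(temperature),
        "--top-p", str(top_p),
    ]


for domain in DOMAINS:
    # Print only; nothing is executed here.
    print(shlex.join(generation_command(domain, framework="deepseek")))
```

To actually execute a command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.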

## Pyperplan

To run Pyperplan with your generated heuristic, execute:

```bash
uv run src/pyperplan/pyperplan.py -s gbfs_early_goal_test -H HEURISTIC-FILE /path/to/domain.pddl /path/to/instance.pddl
```

where `HEURISTIC-FILE` is the (relative or absolute) path to the heuristic
generated in the previous step.

Alternatively, you can pass `hff` or `blind` to `-H` instead of a heuristic file to run the FF heuristic or the blind heuristic (effectively, no heuristic), respectively.
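
When comparing a generated heuristic against the `hff` and `blind` baselines, the three invocations differ only in the value given to `-H`. The sketch below illustrates this; `pyperplan_command` is a hypothetical helper, and `my-heuristic.py` and the `/path/to/...` arguments are placeholders to replace with your own files. As written, it prints the commands without running them.

```python
import shlex


def pyperplan_command(heuristic, domain_pddl, instance_pddl):
    """Build the Pyperplan argument list.

    `heuristic` is either a built-in name (`hff`, `blind`) or the path
    to a generated heuristic file.
    """
    return [
        "uv", "run", "src/pyperplan/pyperplan.py",
        "-s", "gbfs_early_goal_test",
        "-H", heuristic,
        domain_pddl, instance_pddl,
    ]


# Placeholder heuristic file and PDDL paths; replace them with real ones.
for heuristic in ("my-heuristic.py", "hff", "blind"):
    cmd = pyperplan_command(heuristic, "/path/to/domain.pddl", "/path/to/instance.pddl")
    print(shlex.join(cmd))
```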

## Example

Let's say you want to learn a heuristic for the `blocksworld` domain with the name `AmazingHeuristic`. Then you should run:

```bash
uv run llm-heuristics.py --domain blocksworld --heuristic-name AmazingHeuristic --heuristic-file amazing-heuristic.py
```

Now you can call Pyperplan with the new heuristic to solve
an instance of the `blocksworld` testing set:

```bash
uv run src/pyperplan/pyperplan.py -H amazing-heuristic.py -s gbfs_early_goal_test benchmarks/ipc2023-learning/testing/blocksworld/easy-p03.pddl
```

# End-to-End Plan Generation

The basic command is:

```bash
uv run end-to-end.py --domain DOMAIN --instance INSTANCE
```
where `DOMAIN` is the name of the domain of the instance, and `INSTANCE` is the particular instance for which we are computing a plan.

Other available options are:
- `--framework`: choose LLM framework. Available choices are: `gemini` (default), `deepseek`, and `openai`.
- `--model`: choose the LLM model. The available models depend on the selected framework.
- `--plan-file`: file where the computed plan is stored.
- `--temperature`: temperature (default: 0.1)
- `--top-p`: top-p value (default: 0.5)

# Experimental Data

We also include the relevant experimental data for the main contributions of the paper. However, due to the size limits of the supplementary material, we could not include all raw data in our ZIP file. Upon publication, we plan to make all raw data and reports publicly available in an open repository (e.g., Zenodo).

These are the directories containing experimental data used in the paper:

- `archived-heuristics`: Contains all heuristics generated by the models used, including the ones for the ablation study.
- `benchmarks`: Contains all the domains from the Learning Track of the IPC 2023 benchmark used in the paper. The benchmark set is split into training and testing sets.
- `reports`: Contains an HTML report combining all data produced by Pyperplan during the testing phase of our heuristic generation methods. Due to size limits, we cannot include the raw data or the ablation reports.
- `raw-logs`: Contains the raw logs for heuristic generation and end-to-end plan generation for DeepSeek models. Due to size limits, we cannot include the logs for other models here.
