This code is based on the public repository of DeepLTL (https://github.com/mathiasj33/deep-ltl) with the following modifications:
1. We remove curriculum training since we only randomly sample subgoals.
2. We remove the DeepSet and RNN parts since our policy is only conditioned on the current subgoal.
3. We adapt a safe RL algorithm (https://arxiv.org/abs/2205.07536) by integrating formal specifications to address the safety constraints.
4. We implement the subgoal switching mechanism to deal with unsatisfiable subgoals.
5. We construct additional environment variants for a more comprehensive evaluation.

## Installation
The code requires Python 3.10 with a working installation of PyTorch (tested with version 2.2.2). In order to use the _ZoneEnv_ environment, use the following command to install the required dependencies:
```bash
conda activate genzltl
cd src/envs/zones/safety-gymnasium
pip install -e .
```
To install the remaining dependencies, run
```bash
pip install -r requirements.txt
```
We use _Rabinizer 4_ (https://www7.in.tum.de/~kretinsk/rabinizer4.html) to convert LTL formulae into automata. This requires Java 11 to be installed.
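
Since Rabinizer depends on Java, it can help to confirm which Java version is on the `PATH` before training. The snippet below is a minimal sketch, not part of this repository: `java_major` is a hypothetical helper, and the parsing assumes `java -version` prints the version in double quotes on its first line, as common OpenJDK builds do.

```bash
# Hypothetical helper (not part of this repo): extract the major version
# from a Java version string such as "11.0.21" or the legacy "1.8.0_392".
java_major() {
  case "$1" in
    1.*) set -- "${1#1.}"; echo "${1%%.*}" ;;   # legacy scheme: 1.8.x -> 8
    *)   echo "${1%%[.+-]*}" ;;                 # modern scheme: 11.0.21 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # Assumption: the first line looks like: openjdk version "11.0.21" 2023-10-17
  raw="$(java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')"
  echo "Detected Java major version: $(java_major "$raw")"
else
  echo "java not found on PATH"
fi
```

If the reported major version is not 11, the automaton conversion may still work, but that configuration is not the one stated above.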

## Training

To train a model on an environment, run the `train_rco.py` file in `src/train`. We provide scripts to train a model with the default parameters in our evaluation environments (_LetterWorld_: LetterSafetyEnv-v0 and _ZoneEnv_: PointLltSafety2-v0). For example, to train a model on the _ZoneEnv_ environment, run
```bash
PYTHONPATH=src/ python run_zones_safety.py --device gpu --name GenZ-LTL --seed 1
```
The resulting logs and model files will be saved in the `experiments` folder.
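
To repeat training over several seeds, the command above can be wrapped in a small loop. This is only a sketch: `SEEDS`, `DRY_RUN`, and `train_cmd` are hypothetical names introduced here and are not options of the training script. By default the loop only prints the commands it would run.

```bash
# Hypothetical helper variables (not options of the training script).
SEEDS="${SEEDS:-1 2 3}"     # seeds to train with
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = execute them

train_cmd() {
  # Echo the ZoneEnv training command from this README for the given seed.
  echo "PYTHONPATH=src/ python run_zones_safety.py --device gpu --name GenZ-LTL --seed $1"
}

for seed in $SEEDS; do
  if [ "$DRY_RUN" = "1" ]; then
    train_cmd "$seed"
  else
    eval "$(train_cmd "$seed")"
  fi
done
```

Setting `DRY_RUN=0` runs the seeds one after another from the directory in which the training command is normally invoked.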

## Evaluation

We provide evaluation scripts in `src/evaluation`. To evaluate a trained model with a given LTL formula, run
```bash
PYTHONPATH=src/ python src/evaluation/simulate.py --env <env_name> --exp GenZ-LTL --seed 1 --formula <LTL_spec>
```
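
LTL formulae usually contain characters such as `&`, `|`, `!`, and parentheses, which the shell interprets unless the formula is quoted. The sketch below shows that a quoted formula reaches the script as a single argument. The formula and its proposition names (`blue`, `green`, `yellow`) are placeholders chosen for illustration and may not match the propositions or operator syntax of a given environment; `eval_args` is a hypothetical helper.

```bash
# Placeholder formula (an assumption); single quotes keep the shell from
# interpreting &, !, and the parentheses.
FORMULA='(!blue U green) & F yellow'
ENV_NAME='PointLltSafety2-v0'

eval_args() {
  # Print each argument on its own line so word splitting is visible.
  for arg in "$@"; do printf '[%s]\n' "$arg"; done
}

# The quoted formula is printed as one bracketed argument.
eval_args --env "$ENV_NAME" --exp GenZ-LTL --seed 1 --formula "$FORMULA"

# To run the actual evaluation with the same quoting:
# PYTHONPATH=src/ python src/evaluation/simulate.py --env "$ENV_NAME" --exp GenZ-LTL --seed 1 --formula "$FORMULA"
```

Without the double quotes around `"$FORMULA"`, the shell would split the formula at spaces and the script would receive several separate arguments.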

For a more comprehensive evaluation, we provide the scripts `eval_test_tasks_finite.py` and `eval_test_tasks_infinite.py`, which evaluate a model on a set of test tasks. The former covers finite-horizon tasks, while the latter covers infinite-horizon tasks.
