## Requirements

To install the Python package, run:

```setup
pip install .
```

In addition, the LTL, automata manipulation and model checking library [Spot](https://spot.lrde.epita.fr) is required. Please follow the [download and installation instructions](https://spot.lrde.epita.fr/install.html) on their website.

## Download datasets
We assume you are in your desired base directory. Create a directory `data`, where all datasets will be stored.
For each dataset you are interested in, create a corresponding subdirectory, e.g. `data/LTLPattern126`, and download the files you need from the following list. Training requires both the training and the validation set; testing requires only the test set.

### LTLPattern126

- [Training Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/dac_concat_edge-na-6-ts125-nf-1p6m/train.txt)
- [Validation Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/dac_concat_edge-na-6-ts125-nf-1p6m/val.txt)
- [Test Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/dac_concat_edge-na-6-ts125-nf-1p6m/test.txt)

### LTLRandom35
1 million LTL formulas of maximum size 35 with 5 different propositions, split into a training set of 800k formulas, a validation set of 100k formulas and a test set of 100k formulas.
- [Training Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/na-5-ts-35-nf-1m-lbt-sat/train.txt)
- [Validation Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/na-5-ts-35-nf-1m-lbt-sat/val.txt)
- [Test Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/na-5-ts-35-nf-1m-lbt-sat/test.txt)

### LTLRandom50
A test set of 20k LTL formulas of size 35 to 50 with 5 different propositions.
- [Test Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/na-5-ts-35-50-nf-20k-lbt-sat/test.txt)

### LTLUnsolved254

- [Test Set](https://storage.googleapis.com/deepltl_data/data/ltl_traces/dac_timeouts-ts254/test.txt)

### PropRandom35
1 million propositional logic formulas of maximum size 35 with 5 different propositions, split into a training set of 800k formulas, a validation set of 100k formulas and a test set of 100k formulas.
- [Training Set](https://storage.googleapis.com/deepltl_data/data/sat/na-5-ts-35-nf-1m-lbt-sat/train.txt)
- [Validation Set](https://storage.googleapis.com/deepltl_data/data/sat/na-5-ts-35-nf-1m-lbt-sat/val.txt)
- [Test Set](https://storage.googleapis.com/deepltl_data/data/sat/na-5-ts-35-nf-1m-lbt-sat/test.txt)

### PropRandom50
A test set of 20k propositional logic formulas of size 35 to 50 with 5 different propositions.
- [Test Set](https://storage.googleapis.com/deepltl_data/data/sat/na-5-ts-35-50-nf-20k-lbt-sat/test.txt)
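
The layout described at the top of this section (one subdirectory of `data` per dataset, holding `train.txt`, `val.txt` and/or `test.txt`) can also be set up with a short script. The following is a minimal sketch and not part of the package: the names `DATASET_PATHS`, `dataset_url`, `target_path` and `download` are hypothetical, and only two of the datasets listed above are included for brevity.

```python
import urllib.request
from pathlib import Path

# Common prefix of all dataset links listed above.
BASE_URL = "https://storage.googleapis.com/deepltl_data/data"

# Hypothetical lookup table: dataset name -> (remote subpath, available parts).
# The subpaths are copied from the links above; add the other datasets as needed.
DATASET_PATHS = {
    "LTLRandom35": ("ltl_traces/na-5-ts-35-nf-1m-lbt-sat", ("train", "val", "test")),
    "LTLRandom50": ("ltl_traces/na-5-ts-35-50-nf-20k-lbt-sat", ("test",)),
}


def dataset_url(name, part):
    """Return the download URL for one part ('train', 'val' or 'test')."""
    subpath, parts = DATASET_PATHS[name]
    if part not in parts:
        raise ValueError(f"{name} has no '{part}' part")
    return f"{BASE_URL}/{subpath}/{part}.txt"


def target_path(name, part, base_dir="."):
    """Return the local path data/<name>/<part>.txt below the base directory."""
    return Path(base_dir) / "data" / name / f"{part}.txt"


def download(name, parts, base_dir="."):
    """Create data/<name>/ if necessary and fetch the requested parts into it."""
    for part in parts:
        dest = target_path(name, part, base_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(dataset_url(name, part), str(dest))


# Example call (performs network access, so it is left commented out):
# download("LTLRandom35", ("train", "val"))
```

Keeping the URL construction separate from the download step makes it easy to check the paths before fetching the larger files.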

## Create datasets
Alternatively, you can create the datasets yourself. First create a `data` directory in your base directory.
Now you can directly invoke the generation scripts `deepltl.data.generator` (for LTLRandom), `deepltl.data.gen_patterns` (for LTLPattern) or `deepltl.data.sat_generator` (for PropRandom).
Check the help (`-h`) or the files directly for possible arguments and fine-tuning.
For example, the **LTLRandom35** dataset was created by calling
```
python -m deepltl.data.generator --num-aps 5 --num-formulas 1000000 --tree-size 35 --timeout 120 --alpha 0.112
```
where `alpha` is used to tune the distribution of formulas, so that it is actually uniform in size.

To generate a propositional logic dataset, the following additional package is required:
* `py-aiger-cnf`: simple_aig2cnf branch from https://github.com/MarkusRabe/py-aiger-cnf/tree/simple_aig2cnf

The **PropRandom35** dataset was created by calling
```
python -m deepltl.data.sat_generator --num-aps 5 --num-examples 1000000 --max-size 35 --alpha 0.095
```
Different node distributions and minimal sizes have to be set manually in the code.


## Training

To train a linear-time temporal logic model with dataset **LTLRandom35** for 5 epochs with default parameters, run:

```train
python -m deepltl.train.train_transformer --problem='ltl' --ds-name='LTLRandom35' --epochs=5
```

To keep several models apart, give each run a name with the parameter `--run-name`.
For example, our best model for the pattern dataset was trained with
```train
python -m deepltl.train.train_transformer --problem=ltl --run-name bestmodel_pattern126 --d-embed-enc 128 --d-ff 1024 --num-heads 8 --num-layers 8 --batch-size 400 --ds-name LTLPattern126 --pos-enc tree-branch-up --format network-polish --epochs 150
```


To train a propositional logic model with dataset **PropRandom35** for 5 epochs with default parameters, run:

```train
python -m deepltl.train.train_transformer --problem='prop' --ds-name='PropRandom35' --epochs=5
```


## Evaluation

To evaluate a linear-time temporal logic model on dataset **LTLRandom50** with default parameters, run:

```eval
python -m deepltl.train.train_transformer --problem='ltl' --ds-name='LTLRandom50' --test
```

For example, our best model for the pattern dataset was evaluated with
```eval
python -m deepltl.train.train_transformer --problem=ltl --run-name bestmodel_pattern126 --d-embed-enc 128 --d-ff 1024 --num-heads 8 --num-layers 8 --ds-name LTLPattern126 --pos-enc tree-branch-up --format network-polish --test --alpha 1 --beam-size 3 --batch-size 50
```
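
In the pattern example, the evaluation call repeats the architecture flags used for training under the same `--run-name`. One way to keep the two calls consistent is to build both from a single set of options. The sketch below is illustrative only: `SHARED` and `build_command` are hypothetical names, and launching via `subprocess` is an assumption; the flags themselves are taken from the commands in this README.

```python
import subprocess
import sys

# Options that the training and the evaluation call for the pattern model have in common.
SHARED = {
    "problem": "ltl",
    "run-name": "bestmodel_pattern126",
    "ds-name": "LTLPattern126",
    "d-embed-enc": 128,
    "d-ff": 1024,
    "num-heads": 8,
    "num-layers": 8,
    "pos-enc": "tree-branch-up",
    "format": "network-polish",
}


def build_command(extra, test=False):
    """Assemble an argument list from the shared options plus call-specific ones."""
    cmd = [sys.executable, "-m", "deepltl.train.train_transformer"]
    for key, value in {**SHARED, **extra}.items():
        cmd += [f"--{key}", str(value)]
    if test:
        cmd.append("--test")
    return cmd


train_cmd = build_command({"batch-size": 400, "epochs": 150})
eval_cmd = build_command({"alpha": 1, "beam-size": 3, "batch-size": 50}, test=True)

# Launching is left commented out, since training is expensive:
# subprocess.run(train_cmd, check=True)
# subprocess.run(eval_cmd, check=True)
```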


## Parameters

The following parameters can be specified when training the models:

| command line argument | default   | parameter                                                    |
| --------------------- | --------- | ------------------------------------------------------------ |
| problem               | ltl       | problem (either ltl or prop)                                 |
| ds-name               | None      | dataset name                                                 |
| run-name              | default   | name of the training / testing run                           | 
| d-embed-enc           | 128       | embedding dimension encoder                                  |
| d-embed-dec           | 128       | embedding dimension decoder                                  |
| num-layers            | 4         | number of encoder/decoder layers                             |
| num-heads             | 4         | number of attention heads                                    |
| d-ff                  | 512       | dimension of fully-connected feed-forward networks           |
| ff-activation         | relu      | activation function of fully-connected feed-forward networks |
| dropout               | 0.1       | amount of dropped out units                                  |
| warmup-steps          | 4000      | number of warmup steps                                       |
| tree-pos-enc          | None      | whether to use the tree positional encoding                  |
| batch-size            | 100       | batch size                                                   |
| epochs                | 3         | number of epochs                                             |
| alpha                 | 1.0       | beam search parameter for length normalization               |
| beam-size             | 2         | beam size                                                    |

