# ABS

## Prerequisites

Creating the deterministic finite automata (DFAs) requires **MONA**. For your convenience, however, we provide all the required DFAs already generated and saved as JSON files in the `data/automata/` directory.
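As a quick sanity check, the precomputed files can be loaded with the standard `json` module. The sketch below is only an illustration: the helper name `load_automata` is hypothetical, and it makes no assumption about the schema inside each JSON file.

```python
import json
from pathlib import Path


def load_automata(directory="data/automata"):
    """Return {file stem: parsed JSON} for every *.json file in `directory`.

    Hypothetical helper: it only parses the files and does not interpret
    their contents, since the JSON schema is not assumed here.
    """
    automata = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            automata[path.stem] = json.load(handle)
    return automata


if __name__ == "__main__":
    dfas = load_automata()
    print(f"Loaded {len(dfas)} automata")
```

Run it from the repository root so that the relative path `data/automata` resolves correctly.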

### Installing MONA

DFA generation relies on **MONA**. Install it by following the instructions on the [MONA download page](http://www.brics.dk/mona/download.html).
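To confirm that the installation is visible from your shell, you can look the binary up on `PATH`. This is a minimal sketch that assumes the executable is named `mona`; the helper `find_mona` is hypothetical and not part of this repository.

```python
import shutil


def find_mona(executable="mona"):
    """Return the full path of the MONA binary, or None if it is not on PATH.

    Assumption: the installed executable is called `mona`.
    """
    return shutil.which(executable)


mona_path = find_mona()
if mona_path is None:
    print("MONA not found on PATH; DFA generation will not work.")
else:
    print(f"MONA found at {mona_path}")
```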

### Python Version

This project requires **Python 3.10**. Please ensure you are using the correct version before proceeding.

To create and activate a conda environment with Python 3.10, run the following commands:

```bash
conda create -n myenv python=3.10
conda activate myenv
```
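Once the environment is active, a short check like the following can confirm that the interpreter matches the required version. It is a sketch only; the function name `is_required_python` is hypothetical.

```python
import sys

REQUIRED = (3, 10)  # major, minor version required by this project


def is_required_python(version_info=sys.version_info, required=REQUIRED):
    """Return True if the major and minor version equal `required`."""
    return tuple(version_info[:2]) == required


if is_required_python():
    print("Python version OK")
else:
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}")
```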

### Installing Python Dependencies

After installing **MONA**, you need to install the required Python dependencies. You can do so by running:

```bash
pip install -r requirements.txt
```

Additionally, you must download the required SpaCy language model:

```bash
python -m spacy download en_core_web_md
```
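To verify that both the library and the language model are importable, without actually loading the model, a check along these lines may help. The helper `is_installed` is hypothetical; it relies on the fact that downloaded SpaCy models are installed as regular Python packages.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


for name in ("spacy", "en_core_web_md"):
    status = "ok" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```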

## Tutorial: DFA Generation from LTL<sub>f</sub> Formulas

To understand how automata are created from LTL<sub>f</sub> formulas, you can run the tutorial notebook available at:

```
Automata/automata.ipynb
```

This notebook provides a step-by-step guide on the transformation process.
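For intuition about the end product of that transformation, here is a hand-written DFA for the formula F(a) ("a eventually holds") together with a function that runs it on a finite trace. This is an illustrative sketch only: the dictionary layout is hypothetical and is not claimed to match the notebook's representation or MONA's output.

```python
# Hand-written DFA for the LTLf formula F(a): "a eventually holds".
# State 0: a has not been seen yet. State 1: a has been seen (accepting).
DFA_EVENTUALLY_A = {
    "initial": 0,
    "accepting": {1},
    # transitions[state][does a hold at this step?] -> next state
    "transitions": {
        0: {False: 0, True: 1},
        1: {False: 1, True: 1},
    },
}


def accepts(dfa, trace, proposition="a"):
    """Run the DFA on a trace, given as a list of sets of true propositions."""
    state = dfa["initial"]
    for step in trace:
        state = dfa["transitions"][state][proposition in step]
    return state in dfa["accepting"]


print(accepts(DFA_EVENTUALLY_A, [{"b"}, {"a"}, set()]))  # True
print(accepts(DFA_EVENTUALLY_A, [{"b"}, set()]))         # False
```

A trace is accepted exactly when `a` holds at some step, which is the finite-trace meaning of F(a).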

## CNN Experiments on Ordered Fashion MNIST

To visualize the **Ordered Fashion MNIST** dataset and replicate the CNN results discussed in the paper, run the notebook located at:

```
CNN/CNN_experiments_Ordered_FMNIST.ipynb
```

## CNN-LSTM Experiments on Ordered Fashion MNIST

To replicate the CNN-LSTM results discussed in the paper, run the notebook located at:

```
CNN-LSTM/cnn_lstm_experiments.ipynb
```

## LLM Experiments

To replicate the experiments with **LLMs** discussed in the paper, run the Python scripts located in the `LLM/` directory, as described in the subsections below.

> ⚠️ **WARNING:** Running additional experiments **can overwrite** the results already provided in the `results/` folder.
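One way to guard against accidental overwrites is to copy the folder to a timestamped backup before launching anything. The sketch below assumes that `results/` is reachable from the current working directory; the helper `backup_results` is hypothetical and not part of this repository.

```python
import shutil
import time
from pathlib import Path


def backup_results(results_dir="results", backup_root="."):
    """Copy `results_dir` to `<name>_backup_<timestamp>` under `backup_root`.

    Returns the backup path, or None if `results_dir` does not exist.
    """
    source = Path(results_dir)
    if not source.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}_backup_{stamp}"
    shutil.copytree(source, target)
    return target
```

Calling `backup_results()` once before running the scripts leaves the provided outputs untouched in the backup copy.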

### Results on the CommonGen Dataset

* To replicate results obtained with **ABS applied to fine-tuned GPT2-large**, run:

  ```bash
  python ./LLM/decode_commongen.py --supervised True
  ```

* To replicate results obtained with **ABS applied to the standard GPT2-large**, run:

  ```bash
  python ./LLM/decode_commongen.py --supervised False
  ```

* To replicate results obtained with **Guidance applied to the standard GPT2-large and fine-tuned GPT2-large**, run:

  ```bash
  cd LLM
  python guidance_commongen.py
  ```

* To replicate results obtained with **Outlines applied to the standard GPT2-large and fine-tuned GPT2-large**, run:

  ```bash
  cd LLM
  python outlines_commongen.py
  ```

* To replicate the **timing results** for **Ctrl-G** (shown in the corresponding table), run:

  ```bash
  python ./LLM/decode_ctrlg.py
  ```

  *(Note: Timing results for ABS are computed by the two ABS runs of `decode_commongen.py` listed above.)*

### Results on the Ordered CommonGen Dataset

* To run experiments using **OpenAI models**, insert your **OpenAI API key** into the script and run:

  ```bash
  python ./LLM/decode_openai_experiments.py --model <model_to_test>
  ```

* To replicate results obtained with **ABS applied to LLAMA 3.1 8B**, run:

  ```bash
  python ./LLM/decode_ordered_commongen.py
  ```

* To replicate results obtained with **Guidance applied to LLAMA 3.1 8B**, run:

  ```bash
  cd LLM
  python guidance_ordered_commongen.py
  ```

* To replicate results obtained with **Outlines applied to LLAMA 3.1 8B**, run:

  ```bash
  cd LLM
  python outlines_ordered_commongen.py
  ```

### Text Infilling Results

For this task, clone the [ILM repository](https://github.com/chrisdonahue/ilm) and install its requirements. The repository provides the download for **sto_ilm**, the model used as the basis for **ABS, Guidance, and Outlines**, as well as the code to run ILM itself on the task.

It also provides the code to download the dataset and the masking function. Once these are in place, run the three scripts located in the **"Text Infilling"** folder, adjusting the paths to match where you downloaded the model and dataset.

## LLM Evaluation

> ⚠️ **WARNING:** The `results/` folder already contains the precomputed outputs of all models, and the evaluation scripts have already been run on them. If you choose to re-run the experiments, **wait until all new results have been fully generated and have overwritten the existing ones** before running the evaluation scripts.

To evaluate the results of the LLM experiments, first navigate to the evaluation directory:

```bash
cd LLM/eval_metrics
```

Then run the following scripts as needed:

* **CommonGen Evaluation:**

  Compute **ROUGE-L**, **BLEU-4**, **CIDEr**, and **SPICE** scores for ABS on the CommonGen task using both the supervised (fine-tuned GPT2) and unsupervised (standard GPT2) models:

  ```bash
  python eval_commongen.py
  ```

* **Timing Comparison:**

  Compute average runtime comparisons between **ABS** and **Ctrl-G** on the CommonGen dataset:

  ```bash
  python time_comparison.py
  ```

* **Ordered CommonGen Evaluation:**

  Compute **ROUGE-L**, **BLEU-4**, **CIDEr**, and **SPICE** scores for ABS on the Ordered CommonGen task using **LLAMA 3.1 8B** and various OpenAI models (GPT-3.5, GPT-4, GPT-4o, and o1):

  ```bash
  python eval_ordered_commongen.py
  ```

* **Text Infilling Evaluation:**

  Compute **ROUGE-L**, **BLEU-4**, and **Coverage** scores for ABS, Guidance, Outlines, and ILM on the Text Infilling task by running the notebook `eval_text_infilling.ipynb`.
  Before running it, set the paths in the notebook to the locations of the model and dataset you downloaded (see the Text Infilling section above).