# Repeatability Package

- ICLR 2025 - The Thirteenth International Conference on Learning Representations
- Paper: Towards Formally Verifying LLMs: Taming the Nonlinearity of the Transformer

## Installation
This folder contains the code as well as a docker file to run the code in one click (see below).

However, you might need to provide a MATLAB license.
If a license server is available in your network, this should work out of the box.
Otherwise, you can specify the license server (preferred) or a license file (see `run.sh`).

To create a license file `license.lic` for the container:
- Create a MATLAB license file:
	The Docker container needs its own license file to run MATLAB.
	Log in with your MATLAB account at https://www.mathworks.com/licensecenter/licenses/
	Click on your license, and then navigate to
	1. "Install and Activate"
	2. "View activated computers"
	3. "Activate a Computer"
	(The exact steps may differ depending on how your licensing is set up.)
- Choose:
	- Release: `R2023b`
	- Operating System: `Linux`
	- Host ID: `0242AC11000a` (the default MAC address of the Docker container) or your host's MAC address
	- Computer Login Name: `matlab`
	- Activation Label: `<any name>`
- When prompted if software is already installed, choose "Yes".
- Download the file and place it next to the Docker file.
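
The Host ID requested above is a MAC address written as 12 hex digits without separators. If you use your host's MAC address instead of the container default, the conversion can be sketched as follows. The helper name `mac_to_host_id` is hypothetical and not part of this package, and the sketch only strips separators; it makes no assumption about how the License Center treats upper or lower case.

```shell
# Hypothetical helper (not part of this package): strip ':' and '-'
# separators from a MAC address to obtain the Host ID form.
mac_to_host_id() {
    printf '%s\n' "$1" | tr -d ':-'
}

# Example: the container MAC address named in this README,
# written with the usual colon separators.
mac_to_host_id "02:42:AC:11:00:0a"
```

For the example input this prints `0242AC11000a`, which matches the Host ID listed above.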

If you are having trouble, don't hesitate to contact us.

## Run the code

You can reproduce all results in one click in a Docker container using the `run.sh` script.

	./run.sh

If this results in obscure error messages, the cause might be Windows-style line endings (CRLF) in `run.sh` instead of Linux-style ones (LF).
You can fix it using

    sed -i 's/\r$//' run.sh
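
If you would rather check for Windows line endings before modifying the script, a minimal sketch could look like the following. The helper name `has_crlf` is hypothetical and not part of this package; the `sed` call is the same one shown above, and `sed -i` is assumed to be the GNU variant.

```shell
# Hypothetical helper (not part of this package): succeeds if the
# given file contains at least one carriage return character.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Only rewrite run.sh if carriage returns are actually present.
if [ -f run.sh ] && has_crlf run.sh; then
    sed -i 's/\r$//' run.sh
    echo "Converted run.sh to Unix line endings."
fi
```

If `run.sh` already uses Unix line endings, the script is left untouched and nothing is printed.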
	
The results will be stored to `./results`.


Alternatively, open this directory in MATLAB, add everything to the MATLAB path, and run:

	verifyTransformer()
	
Note that all required toolboxes for CORA have to be installed first (see the [CORA Manual](https://cora.in.tum.de/manual)).

### Important Notes

- The repeatability package only recomputes the results for the small model on the medical safety dataset, which takes roughly 24 hours.
- You can decrease the runtime in `verifyTransformer` by either reducing the number of sentences to be verified (`aux_loadModelDataConfigs()`) or reducing the number of steps during binary search (`stepsBinarySearch`).
- The results will then be stored in `./results`.
- To run the full evaluation, please uncomment the respective dataset in `verifyTransformer` configs.



