## LoCo-LMs: Towards Logically Consistent Language Models via Probabilistic Reasoning
Diego Calanzone, Antonio Vergari, Stefano Teso.

Official repository for the paper submitted to the ICLR 2024 Workshop on Reliable and Responsible Foundation Models.

### Setup 
#### Data
We use the same dataset as [Mitchell et al. 2022](https://arxiv.org/abs/2211.11875): BeliefBank. To use the dataset:
- Download it [here](https://drive.google.com/drive/folders/1F9TyM0gShFj6e_X5D7ce7dWy-XyHvQHf?usp=sharing)
- Create a folder named `data` with a subfolder `data/beliefbank`
- `data/beliefbank` should contain the following files:
    - `calibration_facts.json`: BeliefBank calibration facts
    - `silver_facts.json`: BeliefBank silver facts
    - `constraints_v2.json`: BeliefBank constraints
    - `templates.json`: utility file for sentence pre-processing ([Mitchell et al. 2022](https://arxiv.org/abs/2211.11875))
    - `non_countable.txt`: utility file for sentence pre-processing ([Mitchell et al. 2022](https://arxiv.org/abs/2211.11875))
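
Before training, it can help to confirm that the folder layout described above is complete. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_beliefbank_files` is hypothetical, and it assumes you run it from the repository root so that `data/beliefbank` resolves correctly.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "calibration_facts.json",
    "silver_facts.json",
    "constraints_v2.json",
    "templates.json",
    "non_countable.txt",
]


def missing_beliefbank_files(data_dir="data/beliefbank"):
    """Return the expected file names that are absent from data_dir.

    Hypothetical helper, not part of the repository.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_beliefbank_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All BeliefBank files found.")
```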

#### Models
Models are loaded through HuggingFace. To reproduce the official experiments, either download [Macaw-Large](https://huggingface.co/allenai/macaw-large) manually or let the code download it on first use. You can also try other T5 models such as [flan-t5-large](https://huggingface.co/google/flan-t5-large).
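
If you want to fetch or inspect a model outside the training script, the following sketch shows one way to do it with the `transformers` Auto classes. It is an illustration under assumptions, not the repository's loading code: the function name `load_seq2seq` is hypothetical, and the Hub IDs are taken from the links above.

```python
# Hub IDs copied from the links above.
MACAW_LARGE = "allenai/macaw-large"
FLAN_T5_LARGE = "google/flan-t5-large"


def load_seq2seq(model_id=MACAW_LARGE):
    """Load a tokenizer and a T5-style seq2seq model from the HuggingFace Hub.

    Hypothetical helper, not part of the repository. The first call
    downloads the weights into the local HuggingFace cache.
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_seq2seq(FLAN_T5_LARGE)` would load the alternative model instead. Both calls need network access the first time they run.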

#### Environment
To set up the environment, either run `scripts/setup_environment.sh` or execute the following commands:
```
conda create -y -n semantic_models python=3.10
conda activate semantic_models
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install wandb scikit-learn transformers sentencepiece pysdd gdown hyperopt
```
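
To check that the installation succeeded, you can test whether each package is importable from the activated environment. This is a small sketch rather than a script shipped with the repository: the helper `missing_modules` is hypothetical, and the list uses import names (for example, `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "torchaudio",
    "wandb",
    "sklearn",  # installed as scikit-learn
    "transformers",
    "sentencepiece",
    "pysdd",
    "gdown",
    "hyperopt",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper, not part of the repository.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```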
Experiment configurations are stored under `configs/`. The bash scripts under `scripts/` can be used to run experiments on HPC clusters that support SLURM.

- `macaw_ants*.json` correspond to `RQ1` in the paper's results.
- `macaw_rnd*.json` correspond to `RQ2` in the paper's results.

You can also score pre-trained models with the `configs/macaw_eval.json` configuration.

To run experiments for `RQ1`:
```
python train.py --config configs/macaw_ants_sl.json
python train.py --config configs/macaw_ants_xent.json
```
To run experiments for `RQ2`:
```
python train.py --config configs/macaw_rnd_sl.json --random_split 0.5
python train.py --config configs/macaw_rnd0.XX_xent.json
```
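
To launch several runs in sequence, the commands above can be wrapped in a short Python driver. The sketch below is an assumption-laden illustration, not a script from the repository: `build_command` and `run_all` are hypothetical helpers, and only the `--config` and `--random_split` flags shown above are used. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def build_command(config, random_split=None):
    """Build the argument list for one train.py run (hypothetical helper)."""
    cmd = [sys.executable, "train.py", "--config", config]
    if random_split is not None:
        cmd += ["--random_split", str(random_split)]
    return cmd


# RQ1 runs, using the config paths shown above.
RQ1_COMMANDS = [
    build_command("configs/macaw_ants_sl.json"),
    build_command("configs/macaw_ants_xent.json"),
]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing run.
            subprocess.run(cmd, check=True)
```

For example, `run_all(RQ1_COMMANDS)` prints the two `RQ1` commands, and `run_all(RQ1_COMMANDS, dry_run=False)` executes them from the repository root.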

### Credits
Thanks to:
- [Eric Mitchell](https://github.com/eric-mitchell) for the well-written [ConCoRD](https://github.com/eric-mitchell/concord) codebase, which allowed us to reproduce the results.
- [HuggingFace Forums](https://discuss.huggingface.co) for the useful guidelines on how to process logits from pre-trained LMs.


