# Scalable and Enhanced Hallucination Detection in LLMs using Semantic Clustering

This folder contains the code needed to reproduce the experiments in the ICLR 2025 submission 'Scalable and Enhanced Hallucination Detection in LLMs using Semantic Clustering'.

## System Requirements

Because the experiments run LLM inference, a GPU is required. Without a GPU, the results will not be reproducible. Run time depends on the chosen model (e.g., its size).
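Before starting a long run, you can check that a GPU is visible. The sketch below is not part of the repository and assumes the pipeline uses PyTorch as its backend. That is typical for Hugging Face models, but this README does not state it. Run the check after installing the dependencies. If `torch` is not installed yet, it reports no GPU.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see at least one CUDA GPU."""
    # Assumption: the pipeline runs on PyTorch; this README does not say so explicitly.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    if gpu_available():
        import torch
        print(f"Found {torch.cuda.device_count()} GPU(s), first: {torch.cuda.get_device_name(0)}")
    else:
        print("No CUDA GPU detected; results will not be reproducible without one.")
```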

Before running an experiment, install all library dependencies by executing:
```
pip install --use-deprecated=legacy-resolver -r requirements.txt
```
If the installation fails, install any missing dependencies manually with `pip install`, using the imports at the top of 'run_pipeline.py' as the list of what is needed.

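You can also automate this manual check. The small helper sketched below is illustrative and not part of the repository. It parses the imports of 'run_pipeline.py' and lists the modules that cannot be found in the current environment. Import names do not always match pip package names; for example, `sklearn` is installed as `scikit-learn`.

```python
import ast
import importlib.util
import sys
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names from every absolute import in Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) refer to local files, so they are skipped.
            modules.add(node.module.split(".")[0])
    return modules


def missing_modules(source: str) -> list[str]:
    """Return the imported modules that cannot be located in the current environment."""
    return sorted(m for m in imported_modules(source)
                  if importlib.util.find_spec(m) is None)


if __name__ == "__main__":
    script = Path(sys.argv[1] if len(sys.argv) > 1 else "run_pipeline.py")
    if script.is_file():
        for name in missing_modules(script.read_text(encoding="utf-8")):
            print(f"missing: {name}")
    else:
        print(f"{script} not found; run this from the repository root.")
```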
## DEMO

To run one experiment, execute:

```
python run_pipeline.py --model_name=$MODEL --dataset=$DATASET --huggingfacetoken=$HF_TOKEN
```
For example, to run an experiment on the 'NQ' dataset with the Llama-2-7b-chat-hf model, run:

```
python run_pipeline.py --model_name=meta-llama/Llama-2-7b-chat-hf --dataset=nq --huggingfacetoken=XXXXXXXXX
```
All of our experiments load models and their corresponding tokenizers from Hugging Face, and some of these models are restricted (gated). You must therefore provide the Hugging Face access token associated with your account.

* Choose `$MODEL` from the following list: `['meta-llama/Llama-2-7b-chat-hf','tiiuae/falcon-7b-instruct','mistralai/Mistral-7B-Instruct-v0.1','meta-llama/Llama-2-13b-chat-hf']`
* Choose `$DATASET` from the following list: `['bioasq', 'nq', 'trivia_qa','squad']`
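To avoid typos in model or dataset names, you can assemble the command in code. The sketch below checks both arguments against the lists above and reads the token from an `HF_TOKEN` environment variable. That variable name is an assumption, chosen to keep the token out of your shell history. `build_command` and `run_experiment` are hypothetical helpers and are not part of the repository.

```python
import os
import subprocess

# Choices copied from the lists in this README.
MODELS = [
    "meta-llama/Llama-2-7b-chat-hf",
    "tiiuae/falcon-7b-instruct",
    "mistralai/Mistral-7B-Instruct-v0.1",
    "meta-llama/Llama-2-13b-chat-hf",
]
DATASETS = ["bioasq", "nq", "trivia_qa", "squad"]


def build_command(model: str, dataset: str, token: str) -> list[str]:
    """Build the run_pipeline.py command after validating model, dataset and token."""
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    if not token:
        raise ValueError("a Hugging Face token is required for restricted models")
    return [
        "python", "run_pipeline.py",
        f"--model_name={model}",
        f"--dataset={dataset}",
        f"--huggingfacetoken={token}",
    ]


def run_experiment(model: str, dataset: str) -> None:
    """Run one experiment, reading the token from the (assumed) HF_TOKEN variable."""
    token = os.environ.get("HF_TOKEN", "")
    subprocess.run(build_command(model, dataset, token), check=True)
```

For example, `run_experiment("mistralai/Mistral-7B-Instruct-v0.1", "trivia_qa")` launches one run. Looping over `itertools.product(MODELS, DATASETS)` would sweep every combination, with each combination running as a separate, potentially long job.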

All datasets except 'bioasq' are downloaded from Hugging Face. For 'bioasq', we provide a cleaned and formatted .csv file (bioasq_exact.csv), built from Task B of the 2023 BioASQ challenge, in the data directory, and the code already references it there. Keep the file at the relative path 'data/bioasq_exact.csv'. If you move it, update the path referenced in the code accordingly.
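A missing or moved file can be caught before any model is loaded. The sketch below is illustrative and is not the repository's own loading code. It resolves the relative path stated above against a repository root and fails early with a clear message. `bioasq_csv_path` is a hypothetical helper.

```python
from pathlib import Path

# Location of the BioASQ file relative to the repository root, as stated in this README.
BIOASQ_RELATIVE_PATH = Path("data") / "bioasq_exact.csv"


def bioasq_csv_path(repo_root=".") -> Path:
    """Return the path to bioasq_exact.csv under repo_root, failing early if it is missing."""
    path = Path(repo_root) / BIOASQ_RELATIVE_PATH
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found: keep bioasq_exact.csv in data/ or update the path in the code"
        )
    return path
```

Passing the directory of the calling script (for example `Path(__file__).resolve().parent`) as `repo_root` would make the lookup independent of the directory you launch the script from.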

### Due to LLM inference time, this demo can take up to 1 hour. The final output is an AUROC score, and the corresponding ROC curve is saved in the 'roc_figures' folder.
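As a reminder of what the reported score means: AUROC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The pure-Python sketch below computes the ROC points and the area under them with the trapezoidal rule. It assumes that label 1 marks a hallucinated answer and that a higher score means "more likely hallucinated". The repository's own evaluation may instead use a library routine such as scikit-learn's `roc_auc_score`.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points of the ROC curve, sweeping the threshold from high to low.

    labels: 1 = positive (assumed: hallucinated answer), 0 = negative.
    scores: higher means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume all samples sharing this score so that ties form a single diagonal step.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(labels, scores):
    """Area under the ROC curve, using the trapezoidal rule over roc_points."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`. You could plot the points from `roc_points` with a plotting library to produce a figure like the ones saved in 'roc_figures'.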
