# Overview
This is a fork of "Explore Spurious Correlations at the Concept Level in Language Models for Text Classification" (ACL 2024). See the paper: [arXiv:2311.08648](https://arxiv.org/pdf/2311.08648).

It focuses on training the Llama model and removes BERT functionality.
It allows an additional instruction to be added to the user instruction.

It also trains the model to produce only the assistant response, rather than both the response and the prompt.

## To run
### Our method
```bash
uv run llama2_classification.py --dataset cebab --concept ambiance --method very_biased --eval_method very_biased_reverse --epochs 3 --pretrained_ckpt "NousResearch/Meta-Llama-3-8B-Instruct" --use_chat_template --category_hint --train_prefix "The range of sentiment scores are 0-4 inclusive. Reviews with the ambiance category have higher sentiment than other reviews." --eval_prefix "The range of sentiment scores are 0-4 inclusive."
```

### Normal training
```bash
uv run llama2_classification.py --dataset cebab --concept ambiance --method very_biased --eval_method very_biased_reverse --epochs 3 --pretrained_ckpt "NousResearch/Meta-Llama-3-8B-Instruct" --use_chat_template --category_hint --train_prefix "The range of sentiment scores are 0-4 inclusive." --eval_prefix "The range of sentiment scores are 0-4 inclusive."
```
