
## Installation

Before running the code, make sure to install the required dependencies:

```bash
pip install -r requirements.txt
```

Evaluation additionally depends on the lm-evaluation-harness, installed from source:

```bash
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```

Additional dependencies may be required for specific models or training backends:

- For LLaMA-Factory backend:
  ```bash
  git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
  cd LLaMA-Factory
  pip install --no-deps -e .  # future LLaMA-Factory versions may only work with plain `pip install -e .`
  ```

## Project Structure

- `unlocking/`: Contains training scripts
  - `train.py`: Standard training implementation using `SFTTrainer` (TODO: may not currently work)
  - `train_llama_factory.py`: Training implementation using LLaMA-Factory
- `utils/`: Utility functions and helpers
- `eval.py`: Evaluation scripts for benchmarks (HarmBench, tinyMMLU, IFEval)
- `main.py`: Main script to run training and evaluation
- `main_logging.py`: Version with additional logging capabilities

## Running Training (Unlocking)

To start the unlocking process, run `main.py` with the appropriate arguments.

### Basic Example

```bash
python main.py --model_name vicuna_7b --training_backend "llama_factory"
```

### Advanced Example

```bash
python main.py \
  --model_name vicuna_7b \
  --training_backend "llama_factory" \
  --learning_rate 2e-4 \
  --per_device_train_batch_size 8 \
  --num_epochs 3 \
  --temp_saving_path "my_experiment"
```

### Selecting Specific Datasets

You can specify which datasets to use for training:

```bash
python main.py \
  --model_name vicuna_7b \
  --training_backend "llama_factory" \
  --datasets_in_use shadow_alignment badllama
```

Available datasets are configured in `config/training_config.yaml`. Examples include:
- `shadow_alignment`: Shadow alignment dataset
- `badllama`: BadLLaMA dataset
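
A flag that takes several space-separated names, like `--datasets_in_use` above, is commonly parsed with `argparse` and `nargs="+"`. The sketch below illustrates that pattern only; it is not the project's actual `main.py`. The `CONFIGURED_DATASETS` mapping, its paths, the `select_datasets` helper, and the fall-back to all datasets when the flag is omitted are all assumptions made for illustration.

```python
import argparse

# Hypothetical stand-in for the dataset entries in config/training_config.yaml.
# The paths are placeholders, not the project's real layout.
CONFIGURED_DATASETS = {
    "shadow_alignment": "data/shadow_alignment.json",
    "badllama": "data/badllama.json",
}


def select_datasets(argv):
    """Return {name: path} for the datasets requested on the command line."""
    parser = argparse.ArgumentParser()
    # nargs="+" accepts one or more space-separated names after the flag.
    parser.add_argument("--datasets_in_use", nargs="+", default=None)
    args = parser.parse_args(argv)

    # Assumption: when the flag is omitted, use every configured dataset.
    names = args.datasets_in_use or list(CONFIGURED_DATASETS)

    # Fail early on names that are not present in the configuration.
    unknown = [name for name in names if name not in CONFIGURED_DATASETS]
    if unknown:
        raise ValueError(f"Unknown datasets: {unknown}")
    return {name: CONFIGURED_DATASETS[name] for name in names}
```

Calling `select_datasets(["--datasets_in_use", "badllama"])` would return only the `badllama` entry, while an unrecognized name raises a `ValueError` before any training starts.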

### Evaluations

Evaluations are run through the `evaluation` module.

### Attacks

Attacks are run through the `attacks` module.

## Configuration

The project uses YAML configuration files located in the `config/` directory:
- `model_config.yaml`: Contains model-specific settings
- `training_config.yaml`: Contains training parameters and data paths

Command-line arguments override the corresponding values in the configuration files.
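
One common way to implement this precedence is to give every command-line flag a default of `None` and copy only the explicitly passed values over the file-based settings. The following is a minimal sketch under that assumption, not the project's actual loading code: `merge_config` is a hypothetical helper, and a plain dictionary stands in for the parsed YAML file.

```python
import argparse


def merge_config(file_config, argv):
    """Overlay explicitly passed CLI arguments on top of file-based settings."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not passed" apart from a real value.
    parser.add_argument("--learning_rate", type=float, default=None)
    parser.add_argument("--num_epochs", type=int, default=None)
    args = parser.parse_args(argv)

    # Keep only the flags the user actually supplied.
    overrides = {key: value for key, value in vars(args).items() if value is not None}

    # Later entries win, so CLI values replace the file values.
    return {**file_config, **overrides}


# Hypothetical values standing in for the contents of config/training_config.yaml.
file_config = {"learning_rate": 1e-5, "num_epochs": 1}
merged = merge_config(file_config, ["--learning_rate", "2e-4"])
print(merged)
```

Here `learning_rate` is taken from the command line, while `num_epochs` keeps the value from the configuration because no flag was passed for it.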
