# Cluster experiments

This repository provides scripts to run in-context learning (ICL) experiments with large language models (LLMs) on formal language tasks.

---

## ⚙️ Setup & Environment

1. **Create the conda environment**  
   From the `./private_formal_languages_icl/cluster` directory, run:

   ```
   conda env create -f environment.yml
   ```

   This will install all required dependencies.

2. **Start an interactive Slurm session**  
   Request 8 tasks, 4 GB memory per CPU, 2 RTX 4090 GPUs, and 4 hours of runtime:

   ```
   srun --ntasks=8 --mem-per-cpu=4096 --time=04:00:00 --gpus=rtx_4090:2 --pty bash
   ```

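3. **Load required modules**  
   Load any modules your cluster requires, and make sure the compute node can access the web (for example, to download models from the Hugging Face Hub).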

4. **Activate the conda environment**  
   After starting your Slurm session and loading modules, activate the environment:

   ```
   conda activate vllm
   ```

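5. **Set up `.env`**  
   Create a `.env` file in the project root with the following content:

   ```
   HF_TOKEN="..."
   DATASET_DIR="../flare_subsampled/*/"
   ```

   `HF_TOKEN` is your Hugging Face access token (needed to download gated or private models), and `DATASET_DIR` is a glob pattern matching the dataset folders.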
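
To check the `.env` file from an interactive shell, a small helper can export its variables and confirm that both keys are set. This is a sketch: `load_env` is a hypothetical name, it assumes `.env` contains only plain `KEY="value"` lines, and whether `scripts/run_workflow.sh` reads `.env` on its own is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export the variables in a .env file and check the
# two keys this README expects. Assumes plain KEY="value" lines only.
load_env() {
  local env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "load_env: $env_file not found" >&2
    return 1
  fi

  set -a              # export every variable assigned while sourcing
  . "$env_file"
  set +a

  local var
  for var in HF_TOKEN DATASET_DIR; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "load_env: $var is empty or missing in $env_file" >&2
      return 1
    fi
  done
}
```

For example, `load_env && echo "datasets: $DATASET_DIR"` prints the configured pattern; quoting `$DATASET_DIR` keeps the shell from expanding the glob.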

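For longer runs, the resources requested interactively in step 2 can also be submitted as a batch job. The sketch below writes a job script that repeats the same `srun` options as `#SBATCH` directives. The file name `run_experiment.sbatch` is hypothetical, module loading is left as a placeholder comment because it depends on the cluster, and sourcing `conda.sh` is one common way (an assumption about your conda install) to make `conda activate` work in a non-interactive batch shell.

```bash
#!/usr/bin/env bash
# Write a Slurm batch script that mirrors the interactive srun request.
# The file name is hypothetical; adjust resources to your cluster's policies.
cat > run_experiment.sbatch <<'EOF'
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=4096
#SBATCH --time=04:00:00
#SBATCH --gpus=rtx_4090:2

# Load any modules your cluster requires here (placeholder).

# Make `conda activate` available in a non-interactive shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate vllm

bash scripts/run_workflow.sh --model mistralai/Mistral-7B-Instruct-v0.2 --language binary-addition
EOF
```

Submit it with `sbatch run_experiment.sbatch` from the directory that contains `scripts/`; the job runs in the submission directory, and by default Slurm writes its output to `slurm-<jobid>.out` there.
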
---

## 🚀 Running Experiments

To run a workflow, use the provided script:

```
bash scripts/run_workflow.sh --model mistralai/Mistral-7B-Instruct-v0.2 --language binary-addition
```

- `--model`: Hugging Face model ID (e.g., `mistralai/Mistral-7B-Instruct-v0.2`).
- `--language`: Target formal language (e.g., `binary-addition`). If omitted, the script runs on every language found in the dataset folder.
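
To see which languages a default run would pick up, the folders matched by the dataset glob (presumably `DATASET_DIR`) can be listed directly. This sketch assumes each matched subfolder is named after a language that `--language` accepts, which the default scanning behaviour suggests but this README does not state; `list_languages` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one language name per folder matched by a
# glob such as DATASET_DIR="../flare_subsampled/*/".
# Assumes folder names double as --language values.
list_languages() {
  local pattern="$1"
  local dir
  # $pattern is deliberately unquoted so the shell expands the glob.
  for dir in $pattern; do
    # Without nullglob, an unmatched pattern stays literal; skip it.
    [ -d "$dir" ] || continue
    basename "$dir"
  done
}
```

With `DATASET_DIR` exported from `.env`, `list_languages "$DATASET_DIR"` prints one folder name per line, and each name can then be passed to `--language` to run languages one at a time.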

---

## 📂 Dataset Storage

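By default, Hugging Face caches downloaded datasets in `~/.cache/huggingface/datasets` (and models in `~/.cache/huggingface/hub`).  
To store them elsewhere, set the `HF_HOME` variable:

```
export HF_HOME="/path/to/another/directory"
```

Datasets are then cached in `$HF_HOME/datasets`; to relocate only the datasets cache, set `HF_DATASETS_CACHE` instead.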
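
To check where datasets will land before a long download, the lookup order can be sketched in shell. This approximates the defaults of the `datasets` library (`HF_DATASETS_CACHE`, then `$HF_HOME/datasets`, then `~/.cache/huggingface/datasets`, honouring `XDG_CACHE_HOME`); `hf_datasets_cache` is a hypothetical helper, and unlike the library it treats empty variables as unset.

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating how the `datasets` library resolves its
# cache directory. Empty variables are treated as unset here.
hf_datasets_cache() {
  # HF_HOME defaults to $XDG_CACHE_HOME/huggingface, i.e. ~/.cache/huggingface.
  local hf_home="${HF_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/huggingface}"
  # An explicit HF_DATASETS_CACHE overrides the HF_HOME-based default.
  echo "${HF_DATASETS_CACHE:-$hf_home/datasets}"
}
```

To keep the setting across Slurm sessions, the `export` line can go in your shell startup file (e.g. `~/.bashrc`).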

---

## 📝 Notes

- Adjust resource requests in `srun` depending on the cluster’s policies and your workload.  
- Ensure that the conda environment `vllm` has all dependencies installed.  
- The `scripts/run_workflow.sh` script may require additional arguments depending on the experiment configuration.

---
