## LLM-Guided Search for Deletion-Correcting Codes

<div align="center">
  <img src="fig/overview.png" alt="FunSearch Overview" width="600">
</div>

<p>&nbsp;</p>

This repository provides a **distributed implementation of FunSearch** (Romera-Paredes et al., 2024) that uses RabbitMQ to parallelize the search via asynchronous message passing. The code accompanies the paper *"LLM-Guided Search for Deletion-Correcting Codes"* and is designed to discover large deletion-correcting codes for arbitrary code lengths and numbers of correctable deletions.

FunDCC (FunSearch for Deletion-Correcting Codes) iteratively refines a **priority function** using **evolutionary search**, guided by a **pretrained LLM** (default: StarCoder2, with support for GPT-4o mini via API).

In each iteration:
- We construct a few-shot prompt by sampling from the program database.
- The LLM generates a new priority function.
- The function is evaluated by greedily constructing deletion-correcting codes for various code lengths, with a fixed or variable number of deletions.
- If the function is executable and unique, it is stored in the program database.
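
The evaluation step can be illustrated with a minimal sketch. It assumes binary codewords and relies on the fact that a code corrects `s` deletions exactly when the `s`-deletion balls of distinct codewords are disjoint. The names `deletion_ball`, `greedy_code`, and `example_priority` are illustrative and not taken from the repository; the actual evaluation logic lives in the `specifications` folder and may differ.

```python
from itertools import combinations, product


def deletion_ball(word: str, s: int) -> set:
    """All strings obtained by deleting exactly s symbols from word."""
    n = len(word)
    return {
        "".join(word[i] for i in keep)
        for keep in combinations(range(n), n - s)
    }


def greedy_code(n: int, s: int, priority) -> list:
    """Greedily build an s-deletion-correcting code of length n.

    Words are visited in order of decreasing priority. A word is added
    only if its deletion ball is disjoint from the balls of all
    previously chosen codewords.
    """
    words = ["".join(bits) for bits in product("01", repeat=n)]
    # Python's sort is stable, so ties keep lexicographic order.
    words.sort(key=priority, reverse=True)
    code, covered = [], set()
    for word in words:
        ball = deletion_ball(word, s)
        if covered.isdisjoint(ball):
            code.append(word)
            covered |= ball
    return code


def example_priority(word: str) -> float:
    """Hypothetical priority: prefer words with a small VT syndrome."""
    n = len(word)
    syndrome = sum((i + 1) * int(b) for i, b in enumerate(word)) % (n + 1)
    return -syndrome
```

With this priority, `greedy_code(4, 1, example_priority)` returns `['0000', '0110', '1001', '1111']`, which is the Varshamov-Tenengolts code of length 4. An LLM-generated priority function would take the place of `example_priority`.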

### Modifications for Other Applications
Our implementation can be adapted to different applications with minimal changes:
- **Input format & evaluation logic:** You can modify the input format in `config.py` and `__main__.py`, as well as the `specifications` folder, to adapt the evaluation logic to your specific application.
- **LLM:** You can modify the `checkpoint` parameter in the sampler script to use any open-source LLM that can be loaded from Hugging Face via `transformers.AutoModelForCausalLM`.
___
## **Installation & Setup**

To set up and run FunDCC, follow the instructions based on your preferred execution method.

### **1. Clone the Repository**

Clone the FunDCC repository and navigate into the project directory:

```sh
git clone https://github.com/your-username/FunDCC.git
cd FunDCC
```

### **2. Choose an Execution Method**

Our implementation is designed for **Linux** and tested on Ubuntu.  
You can execute FunDCC in different environments, with or without GPU/API-based LLM inference:

- **Docker Container** – (Containerized isolated execution)
- **Local Execution** – (Without Docker)
- **With Slurm and Enroot** – (Cluster-based execution)
---

### **3. Execution with Docker**

Our implementation uses **Docker Compose** (Compose file format v3.8) to run two containers:

- `fundcc-main` (`pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime`) – Runs the evolutionary search with GPU support.
- `rabbitmq` (`rabbitmq:3.13.4-management`) – Handles message passing.

You can navigate to the `.devcontainer` directory to start the containers:

```sh
cd .devcontainer
docker-compose up --build -d
```
Both containers run inside a **Docker bridge network** (`app-network`). 

- **Internal communication** – The main container `fundcc-main` connects to RabbitMQ via `rabbitmq:5672` (instead of `localhost`). The hostname in `/src/experiments/experiment1/config.py` is set to match this configuration by default.
- **External access** – The RabbitMQ Management Interface is a web-based dashboard that allows you to monitor message load, processing rates, and system status across components.  

  The interface is enabled by default in Docker execution and is available at:
  - **Web UI:** [http://localhost:15672](http://localhost:15672)
  - **Login Credentials (default):** `guest / guest`

  If FunDCC runs on a remote server, the Management UI is not directly reachable from your local machine. To access it, you can forward port 15672 (the management default) through an SSH tunnel.  
  Run the following command on your local machine:  
  ```sh
  ssh -J <jump-user>@<jump-server> -L 15672:localhost:15672 <username>@<remote-server> -N -f
  ```

You can modify `docker-compose.yml` to change ports.
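
If a connection fails, it can help to check whether the relevant port is reachable at all. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper and not part of FunDCC.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hostnames.
        return False
```

Inside the `fundcc-main` container you could call `port_is_open("rabbitmq", 5672)`. On your local machine, `port_is_open("localhost", 15672)` indicates whether the Management UI (or an SSH tunnel to it) is listening. The check only confirms that the port accepts TCP connections, not that RabbitMQ authentication will succeed.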

#### **3.1. Create and Activate a New Conda Environment (inside Docker)**

We recommend creating a clean Conda environment:

```sh
# Ensure conda is initialized for your shell (needed inside Docker)
conda init bash
source ~/.bashrc 
# Create and activate the Conda environment
conda create -n fundcc_env python=3.11 pip numpy==1.26.4 -y
conda activate fundcc_env
```

#### **3.2. Install PyTorch (inside Docker)** *(_Can be skipped if using LLM inference over API_)*

You can install PyTorch (matching CUDA version `12.1` used by the `fundcc-main` container) with the following command:

```sh
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

#### **3.3. Install FunDCC package (inside Docker)**

Finally, you can install FunDCC in editable mode so that changes to the source code or configuration files (like `config.py`) take effect immediately:

```sh
pip install -e . 
```
---
### **4. Execution without Docker**

If you prefer to run FunDCC without Docker, follow these steps:

#### **4.1. Create a Conda Environment**

We recommend creating a clean Conda environment. If you already have Conda installed, you can create the environment directly:

```sh
conda create -n fundcc_env python=3.11 pip numpy==1.26.4 -y
conda activate fundcc_env
```

If you do not have Conda installed, you can install Miniconda (a lightweight version of Anaconda):

```sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
```
After installation, reload your shell with `source ~/.bashrc` so that `conda` is available, and check that it works with `conda --version`.

#### **4.2. PyTorch Installation Matching CUDA** *(_Can be skipped if using LLM inference over API_)*

You can check your installed CUDA version using `nvidia-smi` and can find compatible PyTorch versions [here](https://pytorch.org/get-started/previous-versions/). For example, to install PyTorch for CUDA `12.1`, use: 

```sh
conda install pytorch==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
```

#### **4.4. Start RabbitMQ Service (Root Access Required)**

RabbitMQ must be started before running FunDCC. If RabbitMQ is **not installed** yet, you can install it using:

```sh
sudo apt update && sudo apt install -y rabbitmq-server
```

After installation, RabbitMQ **automatically starts as a system service**. To check its status:

```sh
sudo systemctl status rabbitmq-server
```

If RabbitMQ is already installed but not running, start it with:

```sh
sudo systemctl start rabbitmq-server
```

To connect FunDCC to RabbitMQ when running **without Docker**, set the RabbitMQ host in `/src/experiments/experimentX/config.py` to:

```python
host: str = 'localhost'
```
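
For orientation, the connection settings can be pictured as a small dataclass. This is only a sketch: the field names `host`, `username`, and `password` appear in this README, while `port`, the defaults shown, and the helper `config_for` are assumptions. The real `RabbitMQConfig` in `config.py` may contain further fields.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class RabbitMQConfig:
    """Sketch of the connection settings (not the repository's class)."""
    host: str = "localhost"   # use 'rabbitmq' inside the Docker network
    port: int = 5672          # assumed field: RabbitMQ's default TCP listener port
    username: str = "guest"
    password: str = "guest"


def config_for(mode: str) -> RabbitMQConfig:
    """Hypothetical helper: pick the host for a given execution mode."""
    hosts = {"docker": "rabbitmq", "local": "localhost"}
    return RabbitMQConfig(host=hosts[mode])
```

Here `config_for("local").host` is `'localhost'`, and `config_for("docker").host` is `'rabbitmq'`, matching the Docker setup described above.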

#### **Optional: Enable the Management Interface (Monitor Load and Processing Rates)**
The RabbitMQ **Management Interface** provides a web-based dashboard for monitoring message load, processing rates, and system status across components. You can enable it with:

```sh
sudo rabbitmq-plugins enable rabbitmq_management
sudo systemctl restart rabbitmq-server
```
For local and remote access instructions, see the Execution with Docker section.

#### **4.5. Install FunDCC package**

Finally, install FunDCC:

```sh
pip install -e .
```

### **5. Execution with Slurm and Enroot**

To run FunDCC on a Slurm cluster using Enroot containers, follow these steps:

#### **5.1. Pull a PyTorch Enroot Image**  

You can download and convert a PyTorch image with the required CUDA version into an Enroot image.  
For example, to install PyTorch 2.2.2 with CUDA 12.1:  

```sh
enroot import -o /desired/path/custom_name.sqsh docker://pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime
```

#### **5.2. Install RabbitMQ Inside the Enroot Image**  

You can start the image with root privileges to install RabbitMQ, curl, and OpenSSH client:  
```sh
enroot create -n custom_name /desired/path/custom_name.sqsh
enroot start --root --rw custom_name
apt update && apt install -y rabbitmq-server curl openssh-client
rabbitmq-plugins enable rabbitmq_management
```

Once the setup is complete, you can exit the Enroot container and save the changes in a new image `custom_name_with_rabbitmq` (after saving, you can delete the original `custom_name` image):
```sh
exit  
enroot export -o /desired/path/custom_name_with_rabbitmq.sqsh custom_name
```

#### **5.3. Submit SLURM Job**  

You can submit your SLURM job using the previously created Enroot image and a job script (`.sh`).  
For an example of multi-node execution, see `/FunDCC/src/experiments/experiment1/exp1.sh`. This script also sets up an SSH reverse tunnel for local access to the RabbitMQ management interface.  

___
## **Usage**

### Set up API Keys (Azure OpenAI)
The inference process using GPT-4o-mini is implemented via the Azure OpenAI API.
Before running the experiment, ensure that your API credentials are set correctly by exporting the required environment variables:

```bash
export AZURE_OPENAI_API_KEY='your-azure-api-key'
export AZURE_OPENAI_ENDPOINT='https://your-azure-endpoint.openai.azure.com/'
export AZURE_OPENAI_API_VERSION='2024-08-01-preview'
```
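
To confirm that the variables are visible to Python before starting a run, you can use a small check like the one below. It is a sketch; `missing_azure_vars` is a hypothetical helper and not part of FunDCC.

```python
import os

REQUIRED_AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_azure_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
```

An empty list from `missing_azure_vars()` means all three variables are set in the current environment.
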
### **Run the Experiment**

To start an evolutionary search experiment, navigate to your experiment directory (e.g., `src/experiments/experiment1/`) and run:

```bash
cd src/experiments/experiment1
python -m fundcc
```

This launches a search using the configurations specified in the directory's `config.py` file. The file includes explanations for each argument.

### **(Optional) Preloading the LLM**  
Before running the evolutionary search, you can **preload StarCoder2** from Hugging Face by running:  

```bash
python load_llm.py
```  

By default, the model is downloaded to `/workspace/models/`, which is also the default cache location used when the model is first loaded.  

We recommend preloading the model in a storage location with at least **60 GiB available** (for StarCoder2). If using Docker, ensure this location is **mounted** in the container.  

Then, update `self.cache_dir` in the `LLM_model` class (`sampler.py`) to match the chosen cache directory.
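
Before downloading, you may want to verify that the chosen location has enough free space. The following sketch uses only the standard library; `has_free_space` is a hypothetical helper, and the 60 GiB default mirrors the recommendation above.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_free_space(path: str, required_gib: float = 60.0) -> bool:
    """Return True if the filesystem containing path has enough free space.

    The path must already exist; pass an existing parent directory if the
    cache directory has not been created yet.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB
```

For example, `has_free_space("/workspace")` checks the default location, assuming `/workspace` exists on your system.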


## **Command-Line Arguments**
You can specify **general settings, resource management, and termination criteria** via command-line arguments:

#### **General Settings**
- `--config-path /path/to/config`  
  - Path to the configuration file.  
  - Default: `config.py` (inside the directory where the script is run).

- `--save_checkpoints_path /path/to/checkpoints`  
  - Path where checkpoints should be saved.  
  - Default: `Checkpoints/` (inside the directory where the script is run).

- `--checkpoint /path/to/checkpoint`  
  - Path to a checkpoint file from which the search should continue.  
  - Default: `None`.

- `--sandbox_base_path /path/to/sandbox_directory`
  - Directory where function executions are sandboxed. Stores input/output data, serialized function files, and error logs. By default, outputs are deleted after execution to prevent excessive disk usage, as each function execution generates its own stored output.
  - Default: `sandbox/` (inside the directory where the script is run).

- `--log-dir /path/to/logs` 
  - Directory where logs will be stored.  
  - Default: `logs/` (inside the directory where the script is run).

#### **Resource Management**
- `--no-dynamic-scaling`  
  - Disables dynamic scaling of evaluators and samplers based on message load.  
  - Default: enabled.

- `--check_interval`  
  - Sets the interval (in seconds) for checking resource allocation when dynamic scaling is enabled.  
  - Default: `120s`.

- `--max_evaluators`  and `--max_samplers`  
  - Sets the maximum number of evaluators and samplers that can be created dynamically.
  - Default: large value (`1000`), allowing scaling based on resource utilization without hard limits.
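
The options above can be summarized as an `argparse` parser. This is a sketch that mirrors the flags and defaults documented in this section, not the parser shipped with FunDCC; the argument types and the `store_true` action are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented command-line options."""
    p = argparse.ArgumentParser(prog="fundcc")
    # General settings
    p.add_argument("--config-path", default="config.py")
    p.add_argument("--save_checkpoints_path", default="Checkpoints/")
    p.add_argument("--checkpoint", default=None)
    p.add_argument("--sandbox_base_path", default="sandbox/")
    p.add_argument("--log-dir", default="logs/")
    # Resource management
    p.add_argument("--no-dynamic-scaling", action="store_true")
    p.add_argument("--check_interval", type=int, default=120)  # seconds
    p.add_argument("--max_evaluators", type=int, default=1000)
    p.add_argument("--max_samplers", type=int, default=1000)
    return p
```

For instance, `build_parser().parse_args(["--no-dynamic-scaling", "--max_samplers", "4"])` yields `no_dynamic_scaling=True` and `max_samplers=4`, with all other options at their defaults. Note that `argparse` exposes hyphenated flags with underscores, so `--config-path` becomes `args.config_path`.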
 
___
## **Scaling FunDCC Across Multiple Nodes**

Our implementation supports distributed execution by attaching **evaluator** and **sampler** processes to a running script for:

- **Multi-node execution** to increase the rate at which new priority functions are processed (generated, evaluated, and stored).
- **Dynamic scaling** to balance message load at runtime.

### **Attaching Additional Processes**

You can run the following commands to attach more evaluators and samplers:
```sh
cd src/experiments/experiment1
python -m fundcc.attach_evaluators
python -m fundcc.attach_samplers
```

These scripts accept the same **command-line arguments** as the main script and can be run in the same execution modes. The one difference is that **RabbitMQ must not be restarted** when additional processes are attached.

#### **Local Execution**

- You can follow the **Execution Without Docker** steps, skipping the RabbitMQ startup and running the attach scripts instead of the main script (`fundcc`).

#### **Docker Execution**

- You can start only the `fundcc-main` container (without launching a new RabbitMQ instance) by running:
  ```sh
  cd FunDCC/.devcontainer/external/.devcontainer  
  docker-compose up  
  ```
  This starts a `fundcc-main` container on the new node for running the attach scripts.

#### **SLURM & Enroot Execution**

- For an example of multi-node SLURM execution, see:
  ```sh
  /FunDCC/src/experiments/experiment1/exp1.sh
  ```

### **Configuring RabbitMQ for Multi-Node Execution**

To attach processes from a different node, the new node must be able to connect to the main node running RabbitMQ.

If the nodes can resolve each other’s IP addresses and are in the same network without firewall restrictions (e.g., on a cluster):

- You can set `host` in `config.py` to the **hostname of the main node** (where RabbitMQ runs).

If the nodes **cannot** resolve each other’s IP addresses:

- On the new node, you can establish an SSH tunnel to forward RabbitMQ’s TCP listener port (default: 5672):

  ```sh
  ssh -J <jump-user>@<jump-server> -L 5672:localhost:5672 <username>@<remote-server> -N -f
  ```

- You can then set `host: 'localhost'` on the new node.

**Note on RabbitMQ Authentication**

By default, RabbitMQ **does not allow the built-in `guest` user to connect remotely**, even from machines on the same network.

If you're using a hostname other than `localhost`, you need to create a new user **on the node running RabbitMQ**:

1. If RabbitMQ is running directly on your system (not in Docker):
   ```sh
   sudo rabbitmqctl add_user fundccuser fundccpass
   sudo rabbitmqctl set_permissions -p / fundccuser ".*" ".*" ".*"
   ```

2. If RabbitMQ is running in a Docker container (e.g., container `rabbitmq`):

   First, access the container's shell:
   ```sh
   docker exec -it rabbitmq bash
   ```

   Then run the same commands **inside the container** (without `sudo`):
   ```sh
   rabbitmqctl add_user fundccuser fundccpass
   rabbitmqctl set_permissions -p / fundccuser ".*" ".*" ".*"
   ```

Then, you can update your `RabbitMQConfig` in `config.py` accordingly:
   ```python
   username = "fundccuser"
   password = "fundccpass"
   ```

You can still use the default `guest` user **only for localhost connections**.
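
To see how host, port, and credentials fit together, you can assemble them into a standard AMQP URI. This is a sketch: `amqp_url` is a hypothetical helper, and this README does not state whether FunDCC itself accepts a URI, so the example only illustrates how the fields combine. The default virtual host `/` must be percent-encoded as `%2F`.

```python
from urllib.parse import quote


def amqp_url(host: str, username: str, password: str,
             port: int = 5672, vhost: str = "/") -> str:
    """Build an AMQP URI; special characters are percent-encoded."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"amqp://{user}:{pwd}@{host}:{port}/{quote(vhost, safe='')}"
```

With the example user created above, `amqp_url("main-node", "fundccuser", "fundccpass")` gives `amqp://fundccuser:fundccpass@main-node:5672/%2F`, where `main-node` stands in for the hostname of the node running RabbitMQ.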

## **Running Multiple Experiments in Parallel With SLURM**
If you want to run multiple experiments in parallel, you need to **assign different RabbitMQ ports**.  
You can update both the **TCP listener port** and the **management interface port** in `rabbitmq.conf`.  
Then, update the corresponding ports in your experiment config file (`config.py`) to match the new RabbitMQ settings.
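
One simple scheme is to offset both ports by the experiment index. The sketch below renders the two relevant `rabbitmq.conf` lines; `rabbitmq_conf` is a hypothetical helper, and the offset scheme is an assumption rather than something the repository prescribes.

```python
def rabbitmq_conf(experiment_index: int,
                  base_amqp: int = 5672,
                  base_mgmt: int = 15672) -> str:
    """Render rabbitmq.conf lines with ports shifted by the experiment index."""
    amqp_port = base_amqp + experiment_index
    mgmt_port = base_mgmt + experiment_index
    return (
        f"listeners.tcp.default = {amqp_port}\n"
        f"management.tcp.port = {mgmt_port}\n"
    )
```

For experiment index 1, this produces `listeners.tcp.default = 5673` and `management.tcp.port = 15673`; the same port values then go into that experiment's `config.py`.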



