# Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

This project is a framework for generating adversarial prompts to test the robustness and safety of large language models (LLMs). It uses an optimization process to iteratively refine prompts that aim to elicit specific, potentially harmful, responses from a target model.


## Installation

### Prerequisites

  - Python 3.8 or higher
  - API access to a large language model (e.g., an OpenAI API key) or a local model served by Ollama.

### Setup

1.  **Create and activate the Conda environment:**
    This project uses a Conda environment for dependency management.

    ```bash
    conda env create -f environment.yml
    conda activate ase
    ```

2.  **Configure API Keys:**
    Update the `config.py` file with your API keys and base URLs for the LLMs you intend to use.

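    ```python
    # config.py
    API_SECRET_KEY = "YOUR_OPENAI_API_KEY"
    BASE_URL = "https://api.openai.com/v1"
    # Only needed if you use the DeepSeek client shown in step 3.
    DEEPSEEK_API_KEY = "YOUR_DEEPSEEK_API_KEY"
    DEEPSEEK_BASE_URL = "https://api.deepseek.com"
    ```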


3.  **Import keys in other files:**
    Modify scripts such as `gen.py` and `utils.py` to import these variables from `config.py` instead of hardcoding the strings, so that credentials are kept in one place and can be changed without editing each script.

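    ```python
    # Example: gen.py
    from openai import OpenAI

    # Pull credentials and endpoints from config.py instead of hardcoding them.
    from config import API_SECRET_KEY, BASE_URL, DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL

    # Initialize clients using the imported variables.
    openai_client = OpenAI(
        api_key=API_SECRET_KEY,
        base_url=BASE_URL,
    )

    deepseek_client = OpenAI(
        api_key=DEEPSEEK_API_KEY,
        base_url=DEEPSEEK_BASE_URL,
    )
    ```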
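
If you would rather not store keys in a tracked file at all, one option is to have `config.py` read them from the environment. The following is a minimal sketch, not the project's own configuration: the environment variable names `OPENAI_API_KEY` and `OPENAI_BASE_URL` and the helper `require_key` are assumptions introduced for illustration.

```python
# config.py (alternative sketch): read credentials from the environment.
# The environment variable names below are assumptions, not project settings.
import os

API_SECRET_KEY = os.environ.get("OPENAI_API_KEY", "")
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")


def require_key(name: str, value: str) -> str:
    """Return the value, or raise a clear error if the key was never set."""
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running gen.py")
    return value
```

A script could then call `require_key("API_SECRET_KEY", API_SECRET_KEY)` before creating a client, so that a missing key fails early with a readable message rather than as an authentication error from the API.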

## Usage

### 1. Prepare Your Data

The project reads intentions and prompts from a CSV file. Ensure your input file contains the following columns:

  - `goal`: The original user prompt.
  - `target`: The desired harmful response.
  - `intention`: The core intent of the prompt (used for evaluation).
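
Before starting a run, it can help to confirm that the input file has the three columns listed above. The sketch below uses only the standard library; the function name `missing_columns` is hypothetical and is not part of the project, and the sample row contains placeholder text only.

```python
import csv
import io

REQUIRED_COLUMNS = {"goal", "target", "intention"}


def missing_columns(csv_file) -> set:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS - header


# Placeholder data; a real input would be opened from disk instead.
sample = io.StringIO(
    "goal,target,intention\n"
    "example goal,example target,example intention\n"
)
print(missing_columns(sample))  # set() when all columns are present
```

For a file on disk, pass an open handle instead, for example `open("test.csv", newline="", encoding="utf-8")`.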

### 2. Run the Prompt Generation Script

Execute `gen.py` from the command line. You can customize the generation process using command-line arguments.

```bash
python gen.py --population_size 5 --max_iter 5 --prompts_path test.csv
```

### Command-line Arguments

  - `--population_size`: The number of prompts to maintain in each optimization iteration.
  - `--max_iter`: The maximum number of iterations for the optimization process.
  - `--prompts_path`: The path to the CSV file containing the prompts and intentions.
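
For readers who want to see how these three flags map onto parsed values, here is a minimal `argparse` sketch. It is not copied from `gen.py`: the types and default values are assumptions chosen to match the example command above, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Types and defaults are assumptions for illustration only.
    parser = argparse.ArgumentParser(description="Prompt generation options (sketch)")
    parser.add_argument("--population_size", type=int, default=5)
    parser.add_argument("--max_iter", type=int, default=5)
    parser.add_argument("--prompts_path", type=str, default="test.csv")
    return parser


# Parse the same arguments used in the example command.
args = build_parser().parse_args(
    ["--population_size", "5", "--max_iter", "5", "--prompts_path", "test.csv"]
)
print(args.population_size, args.max_iter, args.prompts_path)
```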

### 3. Review the Results

The script will generate two output files:

  - `adv_prompt.jsonl`: Contains the final adversarial prompts that were generated.
  - `record.jsonl`: A detailed log of the entire process for each prompt, including the score, model responses, and a record of attempts.
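
Both output files use the JSON Lines format, with one JSON object per line. A minimal way to load one for inspection is sketched below; `read_jsonl` is a hypothetical helper, and because the field names inside each record are not documented here, the sketch lists the keys instead of assuming them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if Path("record.jsonl").exists():
    records = read_jsonl("record.jsonl")
    print(f"{len(records)} records loaded")
    if records and isinstance(records[0], dict):
        # Inspect the available fields rather than assuming their names.
        print(sorted(records[0].keys()))
```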

## Code Structure

  - `gen.py`: The main script that orchestrates the prompt generation, optimization, and evaluation loop.
  - `translate.py`: Handles text translation, including specialized functions for classical Chinese to English.
  - `utils.py`: Contains utility functions for interacting with LLM APIs, scoring model responses, and extracting data from text.
  - `config.py`: Stores API keys and other configuration settings.
  - `environment.yml`: Defines the Conda environment and all required Python dependencies.
  - `test.csv`: An example input file for testing the system.

