# Prompt Optimisation Framework

A framework for optimising prompts for language models, with a focus on improving task performance through systematic prompt engineering.

## Overview

This framework provides a flexible and extensible architecture for experimenting with prompt optimisation techniques for language models. It is designed to be task-agnostic, allowing for easy addition of new tasks, prompt optimisation techniques, and language models. The framework supports systematic evaluation of different prompt strategies and provides tools for measuring their effectiveness.

## Architecture

The framework is built around a modern, extensible architecture with the following core components:

### Core Infrastructure

1. **Registry Pattern**: Dynamic registration and retrieval of components (LLMs, tasks, prompt optimizers, evaluators)
2. **Configuration Management**: Centralized YAML-based configuration with JSON schema validation
3. **Experiment Runners**:
   - **[`ExperimentRunner`](src/core/experiment.py)**: Traditional step-by-step execution
   - **[`DynamicExperimentRunner`](src/core/dynamic_experiment.py)**: Modern step-based execution with dependency resolution
4. **Step Factory**: Dynamic step creation from configuration with variable interpolation
5. **Unified Entry Point**: [`app.py`](app.py) supports both legacy and step-based configurations

### New Features (v2.0)

- **🔄 Dual Configuration Support**: Both legacy YAML and modern step-based formats
- **📊 Dynamic Step Creation**: Components, functions, and custom steps from configuration
- **🔗 Dependency Resolution**: Automatic step ordering using topological sorting
- **🔧 Variable Interpolation**: `${path.to.value}` syntax for configuration reuse
- **✅ Enhanced Validation**: JSON schema validation for all configuration types
- **📝 Comprehensive Logging**: Detailed execution tracking and error reporting

#### Architecture Diagram without Embeddings
![Architecture Diagram](images/architecture.png)

#### Architecture Diagram with Embeddings
![Architecture Diagram with Embeddings](images/architecture_2_with_embeddings.png)


## Directory Structure

```
prompt_optimisation/
├── app.py                   # 🆕 Unified experiment entry point
├── config/                  # Configuration files
│   ├── default.yaml         # Default configuration
│   ├── experiments/         # Experiment-specific configurations
│   │   ├── test_step_based.yaml  # 🆕 Step-based configuration example
│   │   └── gsm8k.yaml       # Legacy configuration example
│   └── schemas/             # JSON schemas for validation
│       └── experiment.json  # 🆕 Enhanced schema for step-based configs
├── docs/                    # Detailed documentation
│   ├── consolidated_app_guide.md  # 🆕 Complete app.py documentation
│   ├── quick_start.md       # 🆕 Quick start guide
│   ├── embeddings.md        # Embeddings guide
│   └── logging.md           # Logging guide
├── src/                     # Source code
│   ├── core/                # Core infrastructure
│   │   ├── registry.py      # Registry pattern implementation
│   │   ├── config.py        # 🔄 Enhanced configuration management
│   │   ├── experiment.py    # 🔄 Enhanced experiment runner
│   │   ├── dynamic_experiment.py  # 🆕 Step-based experiment runner
│   │   └── step_factory.py  # 🆕 Dynamic step creation
|   ├── embeddings/          # Embedding model implementations
│   ├── llm/                 # Language model implementations
│   ├── prompt_optimisation/ # Prompt optimisation techniques
│   ├── tasks/               # Task implementations
│   │   ├── tamper_detection/# Tamper detection task implementation
│   │   └── open_source/     # Open source tasks (GSM8K, etc.)
│   ├── evaluation/          # Evaluation metrics and utilities
│   └── utils/               # Utility functions and helpers
├── examples/                # Legacy example scripts (still functional)
├── tests/                   # Test suite
│   ├── unit/                # Unit tests
│   └── integration/         # Integration tests
├── data/                    # Data directory for tasks
│   └── tampering_detection_font/ # Data for tamper detection task
├── output/                  # Output directory for experiment results
├── logs/                    # Log files
└── credentials/             # API credentials (gitignored)
```

**Legend**: 🆕 New files, 🔄 Enhanced files

## Getting Started

### Installation

1. Clone the repository:
TODO

2. Install dependencies:
```bash
# %% Source - local
# cd to the repo main directory
cd prompt_optimisation

# Create venv via source. Create under current_directory/env/ 
python3 -m ./env/psao

# Activate venv (Mac method)
source env/psao/bin/activate

# Install dependencies
pip install -r requirements.txt

# %% Conda - Sagemaker (or local)
ENV_NAME=psao
conda create --name $ENV_NAME --file requirements.txt --yes
```

### API Credentials

The framework uses OpenAI API for language model interactions. The API credentials are loaded from the following sources:

1. Environment variables:
   - `OPENAI_BASE_URL`: The base URL for the OpenAI API
   - `OPENAI_API_KEY`: The API key for authentication

2. If the environment variables are not set, the framework will look for the credentials in `credentials/cred.json` with the following format:

```json
{
    "OPENAI_BASE_URL": "https://api.example.com",
    "OPENAI_API_KEY": "your-api-key"
}
```

Ensure that either the environment variables are set or the credentials file exists before running experiments that use the OpenAI API.

### Quick Start

The framework now provides a unified entry point through [`app.py`](app.py) that supports both legacy and modern step-based configurations:

```bash
# Run with step-based configuration (recommended)
python app.py --config test_step_based.yaml --verbose

# Run with legacy configuration (backward compatible)
python app.py --config gsm8k.yaml --verbose

# Additional options
python app.py --config experiment.yaml --data-path /custom/path --quiet
```

**📚 For detailed usage instructions, see:**
- **[Quick Start Guide](docs/quick_start.md)** - Get up and running in minutes
- **[Consolidated App Guide](docs/consolidated_app_guide.md)** - Complete documentation

### Legacy Examples

The original example scripts are still available for reference:

```bash
# Tamper detection with tone optimisation
python examples/tamper_detection_experiment.py --config tamper_detection_tone_opt

# Without optimisation
python examples/tamper_detection_experiment.py --config tamper_detection_no_opt

# With embeddings
python examples/tamper_detection_experiment_with_embeddings.py --config tamper_detection_with_embeddings
```

### Command-line Arguments

| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Configuration file name (in config/experiments/) | Required |
| `--data-path` | Override data path in configuration | None |
| `--verbose` | Enable verbose logging | False |
| `--quiet` | Suppress console output | False |
| `--log-file` | Save logs to file | None |

## Included Tasks

### Tamper Detection

The framework includes a tamper detection task that uses vision-language models to detect tampering in images, specifically focusing on text tampering. The task provides pairs of original and tampered images, and evaluates the model's ability to correctly identify tampering.

#### Data

The tamper detection task uses a dataset of original and tampered images located in `data/tampering_detection_font/`. The tampered images have modified text with different fonts or styles.

### Data Path Configuration

The framework supports loading datasets from different locations, allowing you to store your datasets in a separate repository or directory. There are three ways to specify the data path:

1. **Default Configuration**: By default, datasets are loaded from the `data` directory in the project root.
2. **Experiment Configuration**: You can override the default path in each experiment's configuration file.
3. **Command-line Arguments**: You can specify the data path when running an experiment using the `--data-path` argument.

#### Methods for Specifying Data Path

##### 1. Default Configuration

The default data path is specified in `config/default.yaml`:

```yaml
paths:
  data: data
```

This path is relative to the project root directory.

##### 2. Experiment Configuration

You can override the default data path in each experiment's configuration file:

```yaml
# In config/experiments/your_experiment.yaml
paths:
  data: /path/to/external/datasets
```

This is useful when you want to use a specific dataset location for a particular experiment.

##### 3. Command-line Arguments

You can specify the data path when running an experiment using the `--data-path` argument:

```bash
python examples/tamper_detection_experiment.py --data-path /path/to/external/datasets
```

This overrides both the default configuration and any path specified in the experiment configuration file.

#### Priority Order

When determining the data path, the framework uses the following priority order:

1. Command-line argument (`--data-path`)
2. Experiment configuration (`paths.data` in the experiment YAML file)
3. Default configuration (`paths.data` in `config/default.yaml`)

#### Dataset Structure

Regardless of where your datasets are stored, they should maintain the same directory structure:

```
<data_path>/
├── tampering_detection_font/
│   ├── original/
│   │   └── [image files]
│   └── tampered/
│       └── [image files]
└── tampering_detection_semantics/
    ├── original/
    │   └── [image files]
    └── tampered/
        └── [image files]
```

This ensures that the data handler can locate the correct files regardless of the base data path.

## Adding New Components

### Adding a New LLM

1. Create a new file in `src/llm/` that implements the `LLMInterface` class:
```python
from src.core.registry import llm_registry
from src.llm.base import LLMInterface

@llm_registry.register("MyLLM")
class MyLLM(LLMInterface):
    def __init__(self, **kwargs):
        # Initialise your LLM
        pass
        
    def generate(self, prompt_messages):
        # Implement generation logic
        pass
```

2. The LLM will be automatically registered with the `llm_registry` using the decorator.

### Adding a New Task

1. Create a new directory in `src/tasks/` for your task.
2. Implement the `TaskInterface` class:
```python
from src.core.registry import task_registry
from src.tasks.base import TaskInterface

@task_registry.register("my_task")
class MyTask(TaskInterface):
    def __init__(self, **kwargs):
        # Initialise your task
        pass
        
    def run(self, llm, prompt_optimiser=None, **kwargs):
        # Implement task logic
        pass
        
    def evaluate(self, results, **kwargs):
        # Implement evaluation logic
        pass
```

3. The task will be automatically registered with the `task_registry` using the decorator.

### Adding a New Prompt Optimiser

1. Create a new file in `src/prompt_optimisation/` that implements the `PromptoptimiserInterface` class:
```python
from src.core.registry import prompt_optimiser_registry
from src.prompt_optimisation.base import PromptoptimiserInterface

@prompt_optimiser_registry.register("my_optimiser")
class MyPromptOptimiser(PromptoptimiserInterface):
    def __init__(self, **kwargs):
        # Initialise your optimiser
        pass
        
    def optimise(self, base_prompt, feedback_function, **kwargs):
        # Implement optimisation logic
        pass
        
    def apply(self, base_prompt, **kwargs):
        # Apply optimisation to a prompt
        pass
```

2. The optimiser will be automatically registered with the `prompt_optimiser_registry` using the decorator.

## Configuration

The framework uses YAML files for configuration. Experiment-specific configurations are in `config/experiments/`. Here's an example configuration for the tamper detection task with tone optimisation:

```yaml
experiment:
  name: tamper_detection_tone_opt
  description: Tamper detection experiment with tone optimisation

llm:
  default: OpenAI
  OpenAI:
    version_name: gpt-4o_v2024-05-13
    temperature: 0.7
    top_p: 1.0

task:
  default: tamper_detection
  tamper_detection:
    font_semantics: font
    num_images: 25
    prompt_msg_template:
      - role: system
        content: >
          You are an expert in detecting tampering in images...

prompt_optimiser:
  default: tone
  tone:
    base_prompt: >
      You are an expert in detecting tampering in images...
    r_seed: 42
    optimise_user_prompt_flag: false
    optuna_n_trials: 5
```

## Experiment Results

Experiment results are saved in the `output/` directory, with a subdirectory for each experiment run (named with the experiment name and timestamp). The results include:

- `config.yaml`: The configuration used for the experiment
- `metrics.json`: Evaluation metrics from the experiment
- `experiment.log`: Detailed log of the experiment execution

## Data Flow

The typical data flow in an experiment is as follows:

1. Initialise LLM, task, and prompt optimiser components
2. Run the task with the base prompt
3. Optimise the prompt using the feedback function
4. Run the task with the optimised prompt
5. Evaluate and compare the results

![Data Flow Diagram](images/data_flow.png)

## Logging

### Using the Logging Decorator

The framework provides a convenient logging decorator that can be used to add logging to any function or method. The decorator is defined in `src/utils/decorator_utils.py` and can be used as follows:

```python
from src.utils.decorator_utils import with_logger

@with_logger
def my_function():
    # The logger is automatically available in the function scope
    logger.info("Function called")
    
    # Perform some operations
    result = perform_operation()
    
    # Log the result
    logger.info(f"Operation result: {result}")
    
    return result
```

The decorator automatically creates a logger using the function's module name and makes it available in the function's scope. This eliminates the need for boilerplate logger initialisation code in each function.

### Log Levels

The framework uses the standard Python logging levels:

- `DEBUG`: Detailed information, typically of interest only when diagnosing problems
- `INFO`: Confirmation that things are working as expected
- `WARNING`: An indication that something unexpected happened, or may happen in the future
- `ERROR`: Due to a more serious problem, the software has not been able to perform some function
- `CRITICAL`: A serious error, indicating that the program itself may be unable to continue running

## Testing

The framework includes a comprehensive test suite in the `tests/` directory:

- Unit tests for individual components
- Integration tests for end-to-end functionality

To run the tests:

```bash
pytest
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
