# Configuration Schema for Prompt Optimization Framework

See bottom of document for example config.

#### 1. Project Information
```yaml
project:
  name: string              # Project name (default: "prompt_optimisation")
  version: string           # Version identifier (default: "0.1.0")
  description: string       # Project description
```

#### 2. Path Configuration
```yaml
paths:
  data: string             # Path to data directory/file
  output: string           # Path to output directory
  [custom_key]: string     # Additional custom paths
```

#### 3. Logging Configuration
```yaml
logging:
  level: string            # Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
  format: string           # Log message format string
  file_logging: boolean    # Enable/disable file logging
  log_dir: string         # Directory for log files
```

## Component Configuration

The framework supports both legacy and modern component configuration styles:

### Components Structure
```yaml
components:
  llm:
    default: string                    # Name of default LLM component
    [component_name]:                 # Custom component configurations
      version_name: string            # Model version identifier
      temperature: number (0.0-2.0)   # Sampling temperature
      top_p: number (0.0-1.0)         # Nucleus sampling parameter
      [custom_param]: any             # Component-specific parameters
  
  task:
    default: string                    # Name of default task component
    [component_name]:                 # Custom task configurations
      split: string                   # Dataset split (train/test/validation)
      num_instances: integer          # Number of instances to process
      data_path: string              # Path to task data
      prompt_msg_template: array     # Message template structure
      [custom_param]: any            # Task-specific parameters
  
  prompt_optimiser:
    default: string                    # Name of default optimizer
    [component_name]:                 # Custom optimizer configurations
      base_prompt: string            # Base prompt text
      r_seed: integer               # Random seed
      optimise_user_prompt_flag: boolean
      optuna_n_trials: integer      # Number of optimization trials
      optuna_db_name: string        # Optuna database filename
      optuna_study_name: string     # Optuna study identifier
      [custom_param]: any           # Optimizer-specific parameters
  
  evaluator:
    default: string                   # Name of default evaluator
    [component_name]:                # Custom evaluator configurations
      [custom_param]: any            # Evaluator-specific parameters
  
  embeddings:
    default: string                   # Name of default embeddings model
    enabled: boolean                 # Enable/disable embeddings analysis
    [component_name]:                # Custom embeddings configurations
      model: string                  # Embeddings model identifier
      output_format: array          # Output formats ["png", "pdf", "csv"]
      [custom_param]: any           # Embeddings-specific parameters
```

### Legacy Component Structure (Backward Compatibility)
```yaml
llm:
  default: string
  [component_name]: object

task:
  default: string
  [component_name]: object

prompt_optimiser:
  default: string
  [component_name]: object
```

## Message Template Structure

Used in task configurations for defining prompt templates:

```yaml
prompt_msg_template:
  - role: "system" | "user" | "assistant"
    content: string | object        # Text content or structured content
  - role: "user"
    content:
      - type: "text"
        text: string               # Text content with {variable} placeholders
      - type: "image_url"
        image_url:
          url: string             # Image URL (empty for dynamic insertion)
```

**Role Types:**
- `system` - System-level instructions for the AI model
- `user` - User input or questions
- `assistant` - AI assistant responses (for few-shot examples)

## Step-Based Configuration

### Steps Array
```yaml
steps:
  - name: string                    # Unique step identifier (required)
    type: "component" | "function" | "custom"  # Step type (required)
    depends_on: array              # List of step names this depends on
    optional: boolean              # Whether step failure stops experiment (default: false)
    config: object                 # Additional step configuration
    
    # For component steps (type: "component"):
    component_type: "llm" | "task" | "prompt_optimiser" | "evaluator" | "embeddings"
    component_name: string         # Name with variable support: "${components.llm.default}"
    
    # For function steps (type: "function"):
    function: "optimise_prompt" | "run_generation" | "evaluate_results" | "analyse_results"
    
    # For custom steps (type: "custom"):
    module: string                 # Python module path
    class: string                  # Class name to instantiate
```

### Step Types

#### 1. Component Steps (`type: "component"`)
- Instantiate framework components using the registry system
- **Required fields:** `component_type`, `component_name`
- **Component types:** `llm`, `task`, `prompt_optimiser`, `evaluator`, `embeddings`
- **Variable interpolation:** Component names support `${path.to.value}` syntax

#### 2. Function Steps (`type: "function"`)
- Execute built-in framework functions
- **Required fields:** `function`
- **Available functions:**
  - `optimise_prompt` - Performs prompt optimization
  - `run_generation` - Executes generation with prompts

### Built-in Step Functions

1. **`optimise_prompt`**
   - **Purpose:** Performs prompt optimization using configured optimizer
   - **Dependencies:** Requires `llm`, `task`, and `prompt_optimiser` components
   - **Output:** Optimized prompt optimizer instance

2. **`run_generation`**
   - **Purpose:** Runs generation with base or optimized prompts
   - **Dependencies:** Requires `llm` and `task` components; optional `optimise_prompt` step
   - **Output:** Results dataframe and evaluation score

## Variable Interpolation

The configuration supports variable interpolation using `${path.to.value}` syntax:

```yaml
components:
  llm:
    default: OpenAI
    
steps:
  - name: "init_llm"
    component_name: "${components.llm.default}"  # Resolves to "OpenAI"
```



## Example Configurations

### Complete Step-Based Configuration
```yaml
experiment:
  name: gsm8k_with_psao
  description: GSM8K experiment with PSAO optimization

paths:
  data: &data_path /path/to/gsm_data

components:
  llm:
    default: OpenAI
    OpenAI:
      version_name: gpt-4o_v2024-05-13
      temperature: 0.5
      top_p: 0.95

  task:
    default: gsm8k
    gsm8k:
      split: test
      num_instances: 60
      data_path: *data_path
      prompt_msg_template:
        - role: system
          content: "Take a deep breath and work on this problem step-by-step."
        - role: user
          content: "{question}"

  prompt_optimiser:
    default: psao
    psao:
      psao_intro_prompt: ""
      psao_struct_ann: "(importance ann_var)"
      r_seed: 42
      optimise_user_prompt_flag: false
      optuna_db_name: prompt_opt_gsm8k_psao_db.db
      optuna_study_name: gsm8k_psao
      optuna_n_trials: 20

steps:
  - name: "init_llm"
    type: "component"
    component_type: "llm"
    component_name: "${components.llm.default}"
  
  - name: "init_task"
    type: "component"
    component_type: "task"
    component_name: "${components.task.default}"
  
  - name: "init_prompt_optimiser"
    type: "component"
    component_type: "prompt_optimiser"
    component_name: "${components.prompt_optimiser.default}"
  
  - name: "optimise_prompt"
    type: "function"
    function: "optimise_prompt"
    depends_on: ["init_llm", "init_task", "init_prompt_optimiser"]
  
  - name: "run_generation"
    type: "function"
    function: "run_generation"
    depends_on: ["init_llm", "init_task", "optimise_prompt"]
  
  - name: "evaluate_results"
    type: "function"
    function: "evaluate_results"
    depends_on: ["run_generation"]

output:
  results_dataframe: output/gsm8k_psao_results.csv
```
