# LLM Agent Framework

A configuration-driven framework for building and optimizing LLM agents that execute structured inference tasks. The framework serves two complementary purposes:

1. **Structured LLM Agent Definition**: Defines LLM agents through configuration, where each agent executes a single inference call with structured input processing, prompt templating, and output parsing.

2. **Data-Driven Agent Improvement**: Enables systematic agent enhancement through automated workflows where LLM-based improver agents can modify existing task agents by updating their configurations, prompts, and input processing logic based on performance data.

## Table of Contents

1. **Structured LLM Agent Definition**
   - Configuration File (JSON)
   - Agent Class Implementation (Python)
   - Input Schema File (JSON Schema)
   - Configuration Validation
   - Common Issues

2. **Data-Driven Agent Improvement**
   - Agent Improvement Capabilities
   - Workflow Integration

**Agent Design Principles:**

1. **Single Inference Focus**: Each LLM agent represents a single LLM inference call with complex input processing, prompt templating, and output parsing. Multi-step reasoning requiring multiple LLM interactions must be orchestrated at the workflow level using separate agents.

2. **Computational Agents**: The framework also supports "computational" agents that perform data transformations without LLM inference calls, useful for preprocessing, postprocessing, or pure computational tasks within workflows.

## 1. Structured LLM Agent Definition

Every LLM agent consists of **three required components**:

### Configuration File (JSON)
Defines the agent's behavior, model settings, input processing, prompt structure, and output format. When executed, agents process the input data through defined transformations, resolve placeholders in the prompt template with the processed values, send the final prompt to the specified LLM model, and parse the response according to the output schema.

### Agent Class Implementation (Python)
Extends a base `LLMAgent` class and implements the custom transformation methods referenced in the configuration's computed inputs. Only methods named in a `"function"` field of the config's `inputs` section are called; any other methods in the class are never executed.

### Input Schema File (JSON Schema)
Defines the structure of the input data that the agent has access to.

**Important**: The input schema should only reference inputs with `"source": "data"` or `"source": "context"`. Computed inputs (those with `"source": "computed"`) are generated by the agent's methods and should NOT be included in the input schema since they don't come from external data.

### Configuration Sections

#### Description
**Required field.** Must be a non-empty string describing the agent's purpose.

**Example:**
```json
"description": "Human-readable description of what the agent does"
```

#### Input Schema
**Required field.** Must be a valid file path to a JSON Schema file that defines the structure of input data the agent expects.

**Example:**
```json
"input_schema": "path/to/input_schema.json"
```

#### Model
**Required for LLM agents, omit entirely for computational agents.**

- **name**: Must be a valid AWS Bedrock model identifier (e.g., "anthropic.claude-3-5-sonnet-20241022-v2:0")
- **parameters**: Model-specific parameters with valid ranges:
  - **temperature**: Float between 0.0 and 1.0
  - **max_tokens**: Integer between 1 and model's maximum limit
- **_modifiable**: Boolean indicating whether other LLM agents can modify these settings

**Example:**
```json
"model": {
  "_modifiable": true,
  "name": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "parameters": {
    "temperature": 0.7,
    "max_tokens": 4096
  }
}
```

#### Inputs
**Required field.** Must contain at least one input definition.

**Two input types:**

**Data inputs** (`"source": "data"`):
- **path**: Valid JSONPath expression ("$" for entire input, "field_name" for specific field, "$.nested.field" for nested access)
- **required**: Boolean (default: true)
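
The three path forms listed above can be resolved with a few lines of Python. The sketch below is an assumption about how such a resolver might look, not the framework's actual code: `resolve_data_path` is a hypothetical helper, it handles only the three forms shown rather than full JSONPath, and returning `None` for a missing optional field is a guess at what `"required": false` means.

```python
def resolve_data_path(data, path, required=True):
    """Resolve "$", "field_name", or "$.nested.field" against the input data."""
    if path == "$":
        return data  # the entire input
    # "$.a.b" walks nested keys; a bare name is a single top-level key
    keys = path[2:].split(".") if path.startswith("$.") else [path]
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            if required:
                raise KeyError(f"Required input path '{path}' not found")
            return None  # assumption: optional inputs resolve to None
        current = current[key]
    return current


record = {"problem": "What is 2 + 2?", "meta": {"id": "p-17"}}
print(resolve_data_path(record, "problem"))      # What is 2 + 2?
print(resolve_data_path(record, "$.meta.id"))    # p-17
print(resolve_data_path(record, "hint", False))  # None
```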

**Computed inputs** (`"source": "computed"`):
- **function**: Must match an existing method name in the agent class
- **args**: Object mapping argument names to other input names (all referenced inputs must be defined)

**Rules:**
- Input names must be valid identifiers (letters, digits, and underscores)
- Computed input functions must exist in the agent class
- Argument references in computed inputs must point to existing inputs
- No circular dependencies between inputs

**Example:**
```json
"inputs": {
  "_modifiable": true,
  "problem_text": {
    "source": "computed",
    "function": "extract_problem_text",
    "args": {"problem_data": "raw_problem_data"}
  },
  "formatted_context": {
    "source": "computed",
    "function": "format_with_context",
    "args": {"text": "problem_text", "context": "raw_problem_data"}
  },
  "raw_problem_data": {
    "source": "data",
    "path": "$",
    "required": true
  }
}
```
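
The reference and circular-dependency rules amount to requiring that the inputs form a directed acyclic graph. As a rough illustration (not the framework's own validator), the hypothetical `resolution_order` helper below uses the standard library's `graphlib` to check both rules and to compute an order in which computed inputs can be evaluated. It assumes every value in `args` is the name of another input and that keys starting with an underscore, such as `_modifiable`, are metadata.

```python
from graphlib import CycleError, TopologicalSorter


def resolution_order(inputs):
    """Return input names ordered so each computed input follows its arguments."""
    graph = {}
    for name, spec in inputs.items():
        if name.startswith("_"):  # skip metadata keys such as "_modifiable"
            continue
        deps = set()
        if spec.get("source") == "computed":
            for ref in spec.get("args", {}).values():
                if not isinstance(ref, str) or ref.startswith("_") or ref not in inputs:
                    raise ValueError(f"Input '{name}' references undefined input '{ref}'")
                deps.add(ref)
        graph[name] = deps
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle
        raise ValueError(f"Circular dependency between inputs: {exc.args[1]}") from exc


inputs = {
    "_modifiable": True,
    "problem_text": {
        "source": "computed",
        "function": "extract_problem_text",
        "args": {"problem_data": "raw_problem_data"},
    },
    "formatted_context": {
        "source": "computed",
        "function": "format_with_context",
        "args": {"text": "problem_text", "context": "raw_problem_data"},
    },
    "raw_problem_data": {"source": "data", "path": "$", "required": True},
}
print(resolution_order(inputs))
# ['raw_problem_data', 'problem_text', 'formatted_context']
```

Declaration order in the JSON file does not matter under this scheme, which is why `raw_problem_data` can be listed last in the example.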

#### Prompt
**Required for LLM agents, set to `null` for computational agents.**
The template becomes the actual prompt text sent to the LLM after placeholder resolution.

- **sections**: Object containing reusable prompt components (section names must be valid identifiers)
- **template**: String containing the final prompt with placeholders

**Placeholder validation rules:**
- `{inputs.name}`: Must reference a defined input name
- `{prompt.sections.name}`: Must reference a defined section name
- All placeholders in template must resolve to existing inputs or sections
- Placeholder syntax: `{category.name}` where category is "inputs" or "prompt.sections"

**Example:**
```json
"prompt": {
  "_modifiable": true,
  "sections": {
    "instructions": "You are a math problem solver",
    "output_format": "Respond with JSON: {\"answer\": \"123\"}"
  },
  "template": "### Instructions:\n{prompt.sections.instructions}\n\n### Problem:\n{inputs.problem_text}\n\n### Format:\n{prompt.sections.output_format}"
}
```
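
To make the placeholder rules concrete, here is a minimal sketch of placeholder resolution. `render_prompt` and the regular expression are illustrative assumptions rather than the framework's implementation. The pattern matches only `{inputs.name}` and `{prompt.sections.name}`, so literal braces in a section (such as the JSON sample in `output_format`) pass through untouched.

```python
import re

# Matches {inputs.name} and {prompt.sections.name}, where name is an identifier
PLACEHOLDER = re.compile(r"\{(inputs|prompt\.sections)\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render_prompt(template, sections, inputs):
    """Replace every placeholder with its value; fail on undefined references."""

    def substitute(match):
        category, name = match.group(1), match.group(2)
        pool = inputs if category == "inputs" else sections
        if name not in pool:
            kind = "input" if category == "inputs" else "section"
            raise KeyError(f"Prompt template references undefined {kind} '{name}'")
        return str(pool[name])

    return PLACEHOLDER.sub(substitute, template)


sections = {
    "instructions": "You are a math problem solver",
    "output_format": 'Respond with JSON: {"answer": "123"}',
}
template = (
    "### Instructions:\n{prompt.sections.instructions}\n\n"
    "### Problem:\n{inputs.problem_text}\n\n"
    "### Format:\n{prompt.sections.output_format}"
)
print(render_prompt(template, sections, {"problem_text": "What is 100 + 23?"}))
```

Because substituted values are not scanned again, input text that happens to contain `{inputs.x}` is inserted verbatim in this sketch.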

#### Output
**Required field.** Defines how the LLM response is parsed into structured output. This section typically has `"_modifiable": false` because changing the output schema can break agent chaining in workflows.

**For LLM agents:**
- **format**: Must be "json" or "text"
- **schema**: Valid JSON Schema object (required when format is "json")
- **error_values**: Object with fallback values for each schema property when parsing fails

**For computational agents:** Must include `field_mappings` instead of format/schema:
```json
"field_mappings": {
  "result": "inputs.processed_data"
}
```
- Keys are output field names, values must reference existing inputs using "inputs.name" syntax
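
As an illustration of how `field_mappings` could be applied once all inputs are resolved, consider the sketch below. `apply_field_mappings` is a hypothetical helper written for this README, and the error message is invented for the example.

```python
def apply_field_mappings(field_mappings, resolved_inputs):
    """Build the output dict by copying resolved inputs into named output fields."""
    output = {}
    for out_field, ref in field_mappings.items():
        prefix, _, name = ref.partition(".")
        if prefix != "inputs" or name not in resolved_inputs:
            raise ValueError(f"Field mapping '{out_field}' references unknown input '{ref}'")
        output[out_field] = resolved_inputs[name]
    return output


resolved = {"processed_data": [1, 2, 3]}
print(apply_field_mappings({"result": "inputs.processed_data"}, resolved))
# {'result': [1, 2, 3]}
```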

**Critical:** The prompt template must explicitly instruct the LLM to produce output matching the exact format and schema defined here.

**Example:**
```json
"output": {
  "_modifiable": false,
  "format": "json",
  "schema": {
    "type": "object",
    "required": ["reasoning", "answer"],
    "properties": {
      "reasoning": {"type": "string"},
      "answer": {"type": "string", "pattern": "^[0-9]{3}$"}
    }
  },
  "error_values": {
    "reasoning": "PARSE_ERROR",
    "answer": "PARSE_ERROR"
  }
}
```

#### Settings
**Required field.**

- **class_name**: Must exactly match the Python class name in the agent implementation file
- **timeout_seconds**: Positive integer for API timeout (omit for computational agents)
- **exit_on_parse_failure**: Boolean - whether to exit on parsing errors or continue with error values
- **_modifiable**: Boolean indicating whether other LLM agents can modify these settings

**Example:**
```json
"settings": {
  "_modifiable": false,
  "class_name": "MyAgent",
  "timeout_seconds": 300,
  "exit_on_parse_failure": false
}
```

### Agent Class Implementation (Python)

Extends `LLMAgent` and implements the custom transformation methods referenced in the configuration's computed inputs. Your agent class must:

1. **Extend LLMAgent**
```python
from llm_agent import LLMAgent

class MyAgent(LLMAgent):
    def __init__(self, name="MyAgent", config_path=None, dry_run=False):
        super().__init__(name, config_path, dry_run)
```

2. **Implement computed input functions**

**Example:**
```python
# Methods defined inside the MyAgent class body
def extract_problem_text(self, problem_data):
    """Function referenced in config's computed inputs"""
    return problem_data.get('problem', '')

def format_with_context(self, text, context):
    """Function with multiple arguments from config example"""
    return f"Context: {context.get('id', 'N/A')}\nProblem: {text}"
```

3. **Match the class_name in settings**
The `settings.class_name` must exactly match your Python class name.
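
The link between a computed input's `function` and `args` fields and the agent's methods can be pictured as a name-based dispatch. The following is a sketch under assumptions, not the base class's real code: `compute_input` is a hypothetical helper, and `DemoAgent` is a plain stand-in class so the example runs without the `llm_agent` package.

```python
class DemoAgent:
    """Stand-in for an LLMAgent subclass; only the transformation method matters here."""

    def extract_problem_text(self, problem_data):
        return problem_data.get("problem", "")


def compute_input(agent, spec, resolved):
    """Call the method named in spec["function"] with arguments looked up by input name."""
    method = getattr(agent, spec["function"])  # AttributeError if the method is missing
    # Each args entry maps a method parameter name to the name of a resolved input
    kwargs = {arg: resolved[ref] for arg, ref in spec.get("args", {}).items()}
    return method(**kwargs)


spec = {
    "source": "computed",
    "function": "extract_problem_text",
    "args": {"problem_data": "raw_problem_data"},
}
resolved = {"raw_problem_data": {"problem": "What is 2 + 2?"}}
print(compute_input(DemoAgent(), spec, resolved))  # What is 2 + 2?
```

Under this reading, the keys of `args` must match the method's parameter names exactly, since they are passed as keyword arguments.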

### Agent Execution and Validation

The framework orchestrates agent execution through a validated pipeline that integrates the three core files. When an agent is instantiated, a ConfigValidator performs static validation of the JSON configuration: it checks structural integrity (required fields like description, input_schema, inputs, output), validates model names, and ensures that all prompt template placeholders reference existing inputs or sections. This early validation catches configuration errors before any processing begins, preventing runtime failures due to malformed configurations.

During runtime, the RuntimeValidator takes over to validate actual input data against the loaded JSON schema file, ensuring the incoming data matches the expected structure and types. The validator then extracts data inputs, applies computed input transformations by calling the corresponding methods from the Python agent class, and renders the final prompt by resolving all placeholders with actual values.

Finally, the execution pipeline includes error handling and parsing validation at multiple levels. After the LLM generates a response, the ResponseParser validates the output against the defined JSON schema, automatically handling common LLM output variations like type mismatches or extra text around valid JSON. If parsing fails, the framework can either exit immediately or continue with predefined error values from the configuration's error_values section. The framework also includes retry logic for API timeouts (up to 3 attempts with 5-second delays), metadata tracking for cost calculation and token counting, and logging of placeholder values used during prompt rendering.
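
The retry behavior described above (up to 3 attempts with 5-second delays) could look roughly like the sketch below. `call_with_retries` is a hypothetical helper, and treating the built-in `TimeoutError` as the timeout signal is an assumption; the framework's actual exception type is not documented here. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def call_with_retries(call, max_attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Run call(); on TimeoutError wait and retry, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            sleep(delay_seconds)


attempts = []


def flaky_llm_call():
    """Simulated API call that times out twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated API timeout")
    return '{"answer": "123"}'


# A no-op sleep keeps the demonstration instant
print(call_with_retries(flaky_llm_call, sleep=lambda seconds: None))
# {"answer": "123"}
```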

### Configuration Validation

The framework includes a **ConfigValidator** that performs comprehensive static validation of agent configurations before execution. This catches configuration errors early and ensures agents are properly structured.

**Validation Types:**
- **Structure validation**: Checks required fields (description, input_schema, inputs, output), validates field types, and ensures proper nesting
- **Reference validation**: Verifies all prompt template placeholders (`{inputs.name}`, `{prompt.sections.name}`) reference defined inputs and sections
- **Model validation**: Confirms model names are supported AWS Bedrock models and parameters are within valid ranges
- **Input/output consistency**: Ensures computed inputs reference existing functions, data inputs have valid JSONPath expressions, and output schemas match expected formats
- **Computational agent validation**: For non-LLM agents, validates field_mappings correctly map inputs to outputs and schema consistency

The validator also includes **unused method detection** using static analysis to identify agent class methods that aren't referenced in the configuration, helping maintain clean implementations.
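
One way unused method detection could work is to parse the agent source with Python's `ast` module and compare the class's public methods against the `function` names in the configuration. The sketch below is an assumption about the approach, not the ConfigValidator's code; `find_unused_methods` is a hypothetical name, and methods starting with an underscore are treated as private and ignored.

```python
import ast


def find_unused_methods(source, class_name, config):
    """List public methods of class_name that no computed input references."""
    referenced = {
        spec["function"]
        for spec in config["inputs"].values()
        if isinstance(spec, dict) and spec.get("source") == "computed"
    }
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            defined = {
                item.name
                for item in node.body
                if isinstance(item, ast.FunctionDef) and not item.name.startswith("_")
            }
            return sorted(defined - referenced)
    raise ValueError(f"Class '{class_name}' not found in source")


SOURCE = '''
class MyAgent:
    def __init__(self):
        pass

    def extract_problem_text(self, problem_data):
        return problem_data.get("problem", "")

    def old_formatter(self, text):
        return text.upper()
'''

config = {
    "inputs": {
        "_modifiable": True,
        "problem_text": {
            "source": "computed",
            "function": "extract_problem_text",
            "args": {"problem_data": "raw_problem_data"},
        },
        "raw_problem_data": {"source": "data", "path": "$"},
    }
}
print(find_unused_methods(SOURCE, "MyAgent", config))  # ['old_formatter']
```

A check this simple also flags helper methods that are only called from other methods, so a real implementation would likely need to follow calls inside the class as well.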

### Common Issues

#### Output Format Alignment

**The prompt template must explicitly instruct the LLM to respond in the expected format**, and that instruction must match the `output.format` and `output.schema` configuration. The framework parses the LLM response according to the output configuration, so the prompt must guide the LLM to produce output in exactly that format.

**Example of correct alignment:**
```json
"prompt": {
  "template": "### Problem:\n{inputs.problem_text}\n\n### Output Format:\nRespond with JSON: {\"reasoning\": \"...\", \"answer\": \"XXX\"}"
},
"output": {
  "format": "json",
  "schema": {
    "type": "object",
    "required": ["reasoning", "answer"],
    "properties": {
      "reasoning": {"type": "string"},
      "answer": {"type": "string", "pattern": "^[0-9]{3}$"}
    }
  }
}
```

The framework includes robust parsing to handle common LLM output variations such as type mismatches, format inconsistencies, and extra text around valid JSON responses. This reduces the need for perfect prompt engineering while maintaining schema compliance.

If the prompt doesn't instruct the LLM to output JSON but the configuration expects JSON format, parsing will fail with **PARSE errors**. Depending on `settings.exit_on_parse_failure`, the framework then either exits or continues with the error values specified in the configuration.
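
The tolerant parsing and fallback behavior described in this section can be approximated as follows. This is a simplified sketch, not the framework's ResponseParser: `parse_response` and `_matches_schema` are hypothetical names, the schema check covers only `required`, string `type`, and `pattern`, no type coercion is attempted, and raising `ValueError` stands in for whatever "exit" means in the real framework.

```python
import json
import re


def _matches_schema(obj, schema):
    """Tiny subset of JSON Schema: required keys, string type, and pattern."""
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema.get("required", [])):
        return False
    for key, rules in schema.get("properties", {}).items():
        if key not in obj:
            continue
        value = obj[key]
        if rules.get("type") == "string" and not isinstance(value, str):
            return False
        pattern = rules.get("pattern")
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            return False
    return True


def parse_response(raw, schema, error_values, exit_on_parse_failure=False):
    """Return the first schema-conforming JSON object embedded in raw text."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", raw):
        try:
            # raw_decode parses one JSON value starting at the given index
            candidate, _ = decoder.raw_decode(raw, match.start())
        except json.JSONDecodeError:
            continue
        if _matches_schema(candidate, schema):
            return candidate
    if exit_on_parse_failure:
        raise ValueError("LLM response did not match the output schema")
    return dict(error_values)


schema = {
    "type": "object",
    "required": ["reasoning", "answer"],
    "properties": {
        "reasoning": {"type": "string"},
        "answer": {"type": "string", "pattern": "^[0-9]{3}$"},
    },
}
errors = {"reasoning": "PARSE_ERROR", "answer": "PARSE_ERROR"}

reply = 'Sure! Here is my answer:\n{"reasoning": "100 + 23", "answer": "123"}\nHope this helps.'
print(parse_response(reply, schema, errors))
# {'reasoning': '100 + 23', 'answer': '123'}
print(parse_response("The answer is 123.", schema, errors))
# {'reasoning': 'PARSE_ERROR', 'answer': 'PARSE_ERROR'}
```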

#### Undefined Prompt References

**All inputs and prompt sections referenced in the prompt template must be defined.** The framework validates that every placeholder in the template (`{inputs.name}`, `{prompt.sections.name}`) corresponds to an actual input or prompt section.

**Common validation errors:**
- `Prompt template references undefined input 'field_name'` - Add the input to the `inputs` section
- `Prompt template references undefined section 'section_name'` - Add the section to `prompt.sections`

**Example of correct prompt structure:**
```json
"inputs": {
  "problem_text": {"source": "data", "path": "problem"}
},
"prompt": {
  "sections": {
    "instructions": "Solve this math problem",
    "format_guide": "Show your work step by step"
  },
  "template": "{prompt.sections.instructions}\n\nProblem: {inputs.problem_text}\n\n{prompt.sections.format_guide}"
}
```

### Examples

#### In-Context Learning (ICL) Agent

Demonstrates how to implement in-context learning by dynamically loading examples from external datasets.

**Input Schema Fragment:**
```json
{
  "type": "object",
  "required": ["problem", "icl_examples"],
  "properties": {
    "problem": {"type": "string"},
    "icl_examples": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "problem": {"type": "string"},
          "solution": {"type": "string"},
          "answer": {"type": "string"}
        }
      }
    }
  }
}
```

**Key Configuration Features:**
```json
{
  "inputs": {
    "icl_examples": {
      "source": "computed",
      "function": "generate_icl_examples",
      "args": {
        "icl_data": "raw_icl_data",
        "num_examples": 5
      }
    },
    "raw_icl_data": {
      "source": "data",
      "path": "icl_examples",
      "required": true
    }
  },
  "prompt": {
    "template": "### Examples:\n{inputs.icl_examples}\n\n### Problem:\n{inputs.problem_text}"
  }
}
```

**Key Agent Implementation:**
```python
import random

from llm_agent import LLMAgent

class ICLV0MathAgent(LLMAgent):
    def __init__(self, name="ICLV0MathAgent", config_path=None, dry_run=False):
        super().__init__(name, config_path, dry_run)
        random.seed(42)  # Reproducible example selection
    
    def generate_icl_examples(self, icl_data: list, num_examples: int = 5) -> str:
        # Randomly sample examples from provided ICL data
        selected_examples = random.sample(icl_data, min(num_examples, len(icl_data)))
        
        # Format examples for prompt
        formatted_examples = []
        for i, example in enumerate(selected_examples, 1):
            problem = example.get('problem', '').strip()
            solution = example.get('solution', '').strip()
            answer = example.get('answer', '').strip()
            formatted_examples.append(
                f"Example {i}:\nProblem: {problem}\nReasoning: {solution}\nAnswer: {answer}\n"
            )
        
        return "\n".join(formatted_examples)
```

This approach allows agents to leverage few-shot learning by including relevant examples in the prompt, improving performance on similar tasks.

## 2. Data-Driven Agent Improvement

### Agent Improvement Capabilities

The framework enables systematic agent enhancement through automated workflows where LLM-based improver agents can modify existing task agents based on performance data. Improver agents can update configurations, prompts, and input processing logic to optimize performance.

### Workflow Integration

Agent improvement is typically orchestrated through workflow systems that coordinate multiple agents in improvement loops, enabling iterative refinement of agent performance over time.

## Agent Types

### LLM Agents
- Use language models for inference
- Require `model` and `prompt` sections
- Generate responses via API calls

### Computational Agents
- Pure computation without LLM inference
- Set `"prompt": null` in configuration
- Use `field_mappings` to map inputs directly to outputs
- Omit `model` section and `timeout_seconds`

## Agent Improvement and Modification

**The framework enables LLM agents to systematically improve other LLM agents** through data-driven optimization. Improver agents analyze performance data and generate enhanced configurations and implementations.

### Improvement Capabilities

**Only configuration sections marked with `"_modifiable": true` can be modified by improver functions.** Sections marked `"_modifiable": false` or without this field must remain unchanged to maintain agent compatibility and framework constraints.

An **improver function**, also called an **improver agent**, is an LLM-based function that takes as input an existing agent (configuration + implementation) along with evaluation metrics or performance data, and generates a new agent that solves the same task but with improved performance.

**What improver functions CAN do:**
- **Modifying prompts** (when `"_modifiable": true`) - add chain-of-thought reasoning, improve instructions, add examples, or extend reasoning chains for more detailed analysis (longer reasoning increases test-time computation and may improve performance)
- **Selecting and transforming inputs** (when `"_modifiable": true`) - choose which inputs from the schema to use in prompt templates, add new computed input functions to transform available data, and write custom preprocessing methods that process input data or extract examples from datasets available in the input schema
- **Adjusting model selection and parameters** (when `"_modifiable": true`) - change to different available models, adjust temperature, max_tokens, etc.

**What improver functions CANNOT or should NOT do:**
- **Access inputs not in the input schema** - they cannot invent new input fields because the input schema defines the contract between the agent and its callers, and changing it would break existing workflows and data pipelines. Improver agents work within the constraint of available data.
- **Add post-processing logic or validation functions** - agents represent a single LLM inference call with preprocessing only. This architectural constraint ensures predictable behavior, clear cost modeling, and composability in larger workflows. Any processing must happen before the LLM call, not after.
- **Modify output schemas** - the output section is typically marked `"_modifiable": false` and should never be changed because it defines the contract for downstream consumers. Changing output schemas breaks agent chaining in workflows and violates the framework's composability guarantees.
- **Modify non-modifiable sections** - sections marked `"_modifiable": false` must remain unchanged to maintain compatibility and prevent breaking agent chaining in workflows.
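
The `_modifiable` constraint lends itself to an automated check on an improver's output. The sketch below shows one possible check, assuming configurations are loaded as Python dicts; `find_illegal_changes` is a hypothetical helper, not part of the framework. It reports every top-level section that differs between the original and improved configuration without being marked `"_modifiable": true` in the original.

```python
def find_illegal_changes(original, improved):
    """Return names of changed sections that the original did not mark modifiable."""
    violations = []
    for section in sorted(set(original) | set(improved)):
        before, after = original.get(section), improved.get(section)
        if before == after:
            continue
        # Only dict sections carrying "_modifiable": true in the ORIGINAL may change
        modifiable = isinstance(before, dict) and before.get("_modifiable") is True
        if not modifiable:
            violations.append(section)
    return violations


original = {
    "description": "Math solver",
    "prompt": {"_modifiable": True, "template": "Solve: {inputs.problem_text}"},
    "output": {"_modifiable": False, "format": "json"},
}
improved = {
    "description": "Math solver",
    "prompt": {"_modifiable": True, "template": "Think step by step.\n{inputs.problem_text}"},
    "output": {"_modifiable": False, "format": "text"},
}
print(find_illegal_changes(original, improved))  # ['output']
```

Checking against the original's flag rather than the improved one prevents an improver from unlocking a section by flipping `_modifiable` itself.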

