# Meta-Analysis Pipeline for Elicitation Run Analysis

This package provides tools for analyzing successful elicitation runs across different execution agents and domains. The pipeline consists of three main stages:

1. **Task Selection** - Generate filtered task lists for analysis
2. **Summary Generation** - Create elicitation run summaries for each task
3. **Categorization** - Categorize summaries into vulnerability categories

---

## Directory Structure

```
meta_analysis_package/
├── generate_successful_task_list.py     # Step 1: Task selection
├── elicitation_run_summary.py           # Step 2: Individual summary generation
├── parallel_generate_elicitation_run_summaries.sh  # Step 2: Batch summary generation
├── elicitation_run_categorization.py    # Step 3: Categorization
├── prompts/                             # LLM prompts
│   ├── elicitation_run_summary_prompt.md
│   ├── elicitation_run_categorization_prompt.md
│   └── elicitation_run_categorization_update_prompt.md
├── task_filters/                        # Filter files
│   ├── data_filter_harm_threshold_0%_final.json   # Baseline harm rate exclusions
│   └── final_decisions.json             # Human annotation decisions
├── elicitation_run_categorization/      # Categorization outputs
│   └── o4-mini-2025-04-16/
│       └── all_refinement_models/
│           ├── claude-haiku-4-5-20251001/
│           └── claude-opus-4-5-20251101/
└── elicitation_summary_logs/            # Summary generation logs
```

---

## Step 1: Task Selection

Generate filtered task lists that identify successful elicitation runs for analysis.

### Filter File Formats

#### Baseline Harm Rate Filter (`data_filter_harm_threshold_0%_final.json`)

Lists, for each domain and execution agent, the task IDs with a baseline harm rate above 0%. These tasks are **excluded**:

```json
{
    "os": {
        "agent_claude-haiku-4-5-20251001": [
            "task-id-1",
            "task-id-2"
        ]
    }
}
```
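
A minimal sketch of reading this filter into an exclusion set, assuming exactly the nesting shown above (domain first, then the execution agent name prefixed with `agent_`). The function name `load_baseline_exclusions` is hypothetical and not part of the package:

```python
import json
from pathlib import Path


def load_baseline_exclusions(path: str, domain: str, agent: str) -> set[str]:
    """Return task IDs with >0% baseline harm for one domain/agent pair.

    Assumes the layout {domain: {"agent_<name>": [task_id, ...]}}.
    A missing domain or agent yields an empty set (nothing is excluded).
    """
    data = json.loads(Path(path).read_text())
    return set(data.get(domain, {}).get(f"agent_{agent}", []))
```

For example, `load_baseline_exclusions("task_filters/data_filter_harm_threshold_0%_final.json", "os", "claude-haiku-4-5-20251001")` would return the Haiku exclusions for `os`.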

**Note**: You need to create this file for the `os` domain based on your baseline evaluation results.

#### Human Annotation Filter (`final_decisions.json`)

Contains human annotation decisions, keyed by `task_id:perturbed_id:refinement_model:execution_agent`. Entries whose `final_decision` is `"False Positive"` are excluded:

```json
{
    "task_id:perturbed_id:refinement_model:execution_agent": {
        "final_decision": "False Positive",
        "execution_agent": "claude-haiku-4-5-20251001",
        ...
    }
}
```
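
The key packs four fields into one string. A sketch of collecting the excluded entries for one agent, assuming only `task_id` could itself contain a colon and treating entries without a matching `final_decision` or `execution_agent` as not excluded. `load_false_positives` is a hypothetical helper:

```python
import json
from pathlib import Path


def load_false_positives(path: str, agent: str) -> set[tuple[str, str, str]]:
    """Return (task_id, perturbed_id, refinement_model) triples marked "False Positive".

    Assumes keys of the form "task_id:perturbed_id:refinement_model:execution_agent"
    where only task_id may contain ":", so the key is split from the right.
    """
    data = json.loads(Path(path).read_text())
    excluded = set()
    for key, entry in data.items():
        if entry.get("final_decision") != "False Positive":
            continue
        if entry.get("execution_agent") != agent:
            continue
        task_id, perturbed_id, refinement_model, _agent = key.rsplit(":", 3)
        excluded.add((task_id, perturbed_id, refinement_model))
    return excluded
```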

### For Claude Haiku 4.5 (with baseline harm rate + human annotation filtering)

```bash
python generate_successful_task_list.py \
    --domain os \
    --perturbation_model o4-mini-2025-04-16 \
    --agent claude-haiku-4-5-20251001 \
    --data_filter task_filters/data_filter_harm_threshold_0%_final.json \
    --human_annotation_filter task_filters/final_decisions.json \
    --output task_list_claude_haiku_os_filtered.json
```

### For Claude Opus 4.5 (with human annotation filtering only)

```bash
python generate_successful_task_list.py \
    --domain os \
    --perturbation_model o4-mini-2025-04-16 \
    --agent claude-opus-4-5-20251101 \
    --human_annotation_filter task_filters/final_decisions.json \
    --output task_list_claude_opus_os_filtered.json
```

### Arguments

| Argument | Description |
|----------|-------------|
| `--domain` | Domain to search (e.g., `os`, `multi_apps_test`, or `all`) |
| `--perturbation_model` | Perturbation model used (default: `o4-mini-2025-04-16`) |
| `--agent` | Execution agent (e.g., `claude-haiku-4-5-20251001`, `claude-opus-4-5-20251101`) |
| `--data_filter` | Path to baseline harm rate filter JSON (excludes tasks with >0% baseline harm) |
| `--human_annotation_filter` | Path to human annotation decisions JSON (excludes "False Positive" examples) |
| `--output` | Output filename for the task list |

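Because the baseline filter lists bare task IDs while the annotation keys also carry `perturbed_id` and `refinement_model`, one plausible reading is that the first drops whole tasks and the second drops individual runs. The sketch below assumes that reading; `Run` and `apply_filters` are hypothetical names, and an omitted filter (as in the Opus command) is treated as an empty set:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    domain: str
    task_id: str
    perturbed_id: str
    refinement_model: str


def apply_filters(runs, baseline_excluded=frozenset(), false_positives=frozenset()):
    """Keep only the runs that pass both filters.

    baseline_excluded: task IDs with >0% baseline harm (assumed to drop the whole task).
    false_positives: (task_id, perturbed_id, refinement_model) triples from human
    annotation (assumed to drop only that specific run).
    """
    return [
        r for r in runs
        if r.task_id not in baseline_excluded
        and (r.task_id, r.perturbed_id, r.refinement_model) not in false_positives
    ]
```
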
### Expected Output

A JSON file with the following structure:

```json
{
    "metadata": {
        "generated_at": "2026-01-22T14:01:10.114458",
        "domain": "os",
        "agent": "claude-haiku-4-5-20251001",
        "total_successful_runs": <int>,
        "successful_runs_by_refinement_model": {
            "gpt-5-2025-08-07": <int>,
            "us_anthropic_claude-haiku-4-5-20251001-v1_0": <int>
        }
    },
    "task_lists_by_refinement_model": {
        "gpt-5-2025-08-07": ["os:task_id:perturbed_id", ...],
        ...
    },
    "task_details_by_refinement_model": {
        "gpt-5-2025-08-07": [
            {
                "domain": "os",
                "task_id": "...",
                "perturbed_id": "...",
                "refinement_model": "gpt-5-2025-08-07"
            },
            ...
        ]
    }
}
```
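
Entries in `task_lists_by_refinement_model` are `domain:task_id:perturbed_id` strings. A sketch of unpacking them and counting runs per refinement model, for example to compare against `successful_runs_by_refinement_model`. It assumes neither `domain` nor `perturbed_id` contains a colon, and both function names are hypothetical:

```python
import json
from pathlib import Path


def iter_task_entries(task_list_path: str):
    """Yield (refinement_model, domain, task_id, perturbed_id) from a Step 1 task list.

    Assumes each entry looks like "domain:task_id:perturbed_id" and that only
    task_id may contain ":".
    """
    data = json.loads(Path(task_list_path).read_text())
    for model, entries in data["task_lists_by_refinement_model"].items():
        for entry in entries:
            domain, rest = entry.split(":", 1)
            task_id, perturbed_id = rest.rsplit(":", 1)
            yield model, domain, task_id, perturbed_id


def count_by_model(task_list_path: str) -> dict[str, int]:
    """Count task list entries per refinement model."""
    counts: dict[str, int] = {}
    for model, *_ in iter_task_entries(task_list_path):
        counts[model] = counts.get(model, 0) + 1
    return counts
```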

---

## Step 2: Summary Generation

Generate an elicitation run summary for each task in your task list. Each summary describes which perturbation strategy was used and why it succeeded.

### Run Summary Generation (Parallel)

```bash
./parallel_generate_elicitation_run_summaries.sh \
    --task_list_file task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001 \
    --max_parallel 10
```

### Arguments

| Argument | Description |
|----------|-------------|
| `--task_list_file` | Path to the task list JSON from Step 1 |
| `--agent` | Execution agent name |
| `--max_parallel` | Number of parallel jobs (default: 10) |
| `--perturbation_model` | Perturbation model (default: `o4-mini-2025-04-16`) |
| `--api` | API provider: `openai`, `anthropic`, `anthropic_bedrock` (default: `openai`) |
| `--model` | Summary generation model (default: `gpt-5-2025-08-07`) |

### Output Location

Summaries are saved to each task's directory:

```
perturbed_queries/{domain}/{task_id}/{perturbation_model}/
    perturbed_query_{perturbed_id}/
        iterative_refinement_{refinement_model}/
            agent_{agent}/
                elicitation_run_summary.json   # <-- Summary output
```
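
A sketch that builds this path for one run, following the template above. `summary_path` is a hypothetical helper, and `root` (the directory that contains `perturbed_queries/`) is an assumption, since the location of that directory is not specified here:

```python
from pathlib import Path


def summary_path(root, domain, task_id, perturbation_model, perturbed_id,
                 refinement_model, agent) -> Path:
    """Return the location of one elicitation_run_summary.json.

    root: assumed parent directory of perturbed_queries/.
    """
    return (
        Path(root) / "perturbed_queries" / domain / task_id / perturbation_model
        / f"perturbed_query_{perturbed_id}"
        / f"iterative_refinement_{refinement_model}"
        / f"agent_{agent}"
        / "elicitation_run_summary.json"
    )
```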

### Log Files

Logs are saved to:

```
elicitation_summary_logs/summary_{agent}_{timestamp}/
├── run_summary_{agent}_{timestamp}.json    # Batch summary with costs
├── {task_id}_{perturbed_id}_{refinement_suffix}.log     # Per-task logs
├── {task_id}_{perturbed_id}_{refinement_suffix}.status  # SUCCESS or FAILED
└── {task_id}_{perturbed_id}_{refinement_suffix}.cost    # Cost for this task
```
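
To re-tally an interrupted or partial batch from the per-task files alone, a sketch like the following could be used. It assumes each `.status` file contains only `SUCCESS` or `FAILED` and each `.cost` file contains a single decimal number; `summarize_logs` is a hypothetical name:

```python
from decimal import Decimal
from pathlib import Path


def summarize_logs(log_dir: str) -> dict:
    """Tally the per-task .status and .cost files in one summary log directory.

    Assumed file contents: .status holds SUCCESS or FAILED; .cost holds one number.
    """
    log_path = Path(log_dir)
    statuses = [p.read_text().strip() for p in log_path.glob("*.status")]
    # Decimal keeps the per-task costs exact when they are added up.
    total_cost = sum(
        (Decimal(p.read_text().strip()) for p in log_path.glob("*.cost")),
        Decimal("0"),
    )
    return {
        "succeeded": statuses.count("SUCCESS"),
        "failed": statuses.count("FAILED"),
        "total_cost": total_cost,
    }
```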

### Summary Output Format

Each `elicitation_run_summary.json` contains:

```json
{
    "original_instruction": "The original benign instruction...",
    "perturbed_instruction": "The successful perturbed instruction...",
    "summary": "Analysis of perturbation strategy and why it succeeded...",
    "input_tokens": 1711,
    "output_tokens": 1411,
    "total_tokens": 3122,
    "cost": "0.01624875",
    "summary_metadata": {
        "api_type": "openai",
        "model_name": "gpt-5-2025-08-07",
        "generated_at": "2026-01-22T15:09:01.584283"
    }
}
```
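
Note that `cost` is stored as a string. A sketch of loading summaries and summing their cost with `Decimal`, so that totals over many tasks stay exact. The check that `total_tokens` equals `input_tokens + output_tokens` is an assumption drawn from the example above (1711 + 1411 = 3122); `load_summary` and `total_cost` are hypothetical helpers:

```python
import json
from decimal import Decimal
from pathlib import Path


def load_summary(path) -> dict:
    """Load one elicitation_run_summary.json with its cost parsed as a Decimal."""
    summary = json.loads(Path(path).read_text())
    summary["cost"] = Decimal(summary["cost"])
    # Assumed invariant, based on the example output: total = input + output.
    if summary["total_tokens"] != summary["input_tokens"] + summary["output_tokens"]:
        raise ValueError(f"token counts do not add up in {path}")
    return summary


def total_cost(paths) -> Decimal:
    """Sum the cost field over several summary files."""
    return sum((load_summary(p)["cost"] for p in paths), Decimal("0"))
```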

### Running for Both Agents

```bash
# Claude Haiku 4.5
./parallel_generate_elicitation_run_summaries.sh \
    --task_list_file task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001

# Claude Opus 4.5
./parallel_generate_elicitation_run_summaries.sh \
    --task_list_file task_list_claude_opus_os_filtered.json \
    --agent claude-opus-4-5-20251101
```

---

## Step 3: Categorization

Categorize the summaries into vulnerability categories. This can either create a new categorization or update an existing one.

### Resume from Existing Categorization (Recommended)

Load an existing categorization and extend it with the OS domain examples:

```bash
# For Claude Haiku 4.5
python elicitation_run_categorization.py \
    --task_list_path task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001 \
    --resume_categorization_path elicitation_run_categorization/o4-mini-2025-04-16/all_refinement_models/claude-haiku-4-5-20251001/elicitation_run_categorization_claude-haiku-4-5-20251001_20260122_165620.json

# For Claude Opus 4.5
python elicitation_run_categorization.py \
    --task_list_path task_list_claude_opus_os_filtered.json \
    --agent claude-opus-4-5-20251101 \
    --resume_categorization_path elicitation_run_categorization/o4-mini-2025-04-16/all_refinement_models/claude-opus-4-5-20251101/elicitation_run_categorization_claude-opus-4-5-20251101_20260122_171559.json
```

### Create New Categorization

If starting fresh without an existing categorization:

```bash
python elicitation_run_categorization.py \
    --task_list_path task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001 \
    --initial_categorization_batch_size 10 \
    --iterative_categorization_batch_size 5
```

### Arguments

| Argument | Description |
|----------|-------------|
| `--task_list_path` | Path to task list JSON from Step 1 |
| `--agent` | Execution agent name |
| `--resume_categorization_path` | Path to an existing categorization JSON to extend |
| `--refinement_model_filter` | Only process tasks for the given refinement model |
| `--initial_categorization_batch_size` | Number of tasks used to build the initial categorization (default: 10) |
| `--iterative_categorization_batch_size` | Number of tasks added per iterative update (default: 5) |
| `--model` | Categorization model (default: `gpt-5-2025-08-07`) |

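How the two batch sizes interact is not spelled out above. One plausible reading, sketched below as an assumption rather than the script's actual logic: a fresh run builds the categories from the first `initial_categorization_batch_size` tasks, then folds in the rest `iterative_categorization_batch_size` at a time, while a resumed run sends every task through iterative updates. `batch_schedule` is hypothetical:

```python
def batch_schedule(task_ids, initial_size=10, iterative_size=5, resuming=False):
    """Split tasks into the batches a categorization run might process.

    Hypothetical model of the script's behavior: one initial batch (fresh runs
    only), then fixed-size iterative batches for the remaining tasks.
    """
    if initial_size < 1 or iterative_size < 1:
        raise ValueError("batch sizes must be positive")
    task_ids = list(task_ids)
    batches = []
    start = 0
    if not resuming and task_ids:
        batches.append(task_ids[:initial_size])
        start = initial_size
    for i in range(start, len(task_ids), iterative_size):
        batches.append(task_ids[i:i + iterative_size])
    return batches
```

Under this reading, a fresh run over 220 tasks with the defaults would take 1 + (220 - 10) / 5 = 43 batches.
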
### Output Location

Categorization output is saved to:

```
elicitation_run_categorization/{perturbation_model}/all_refinement_models/{agent}/
    elicitation_run_categorization_{agent}_{timestamp}.json
```
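
Since `--resume_categorization_path` needs a concrete file, a small helper that picks the most recent output can save copying timestamps by hand. A sketch, assuming the `YYYYMMDD_HHMMSS` timestamp format seen in the example paths (so file-name order is chronological); `latest_categorization` is a hypothetical name:

```python
from pathlib import Path


def latest_categorization(base_dir, perturbation_model, agent):
    """Return the newest categorization JSON for one agent, or None if none exist.

    Assumes {timestamp} is YYYYMMDD_HHMMSS, so sorting names sorts by time.
    """
    folder = Path(base_dir) / perturbation_model / "all_refinement_models" / agent
    if not folder.is_dir():
        return None
    candidates = sorted(folder.glob(f"elicitation_run_categorization_{agent}_*.json"))
    return candidates[-1] if candidates else None
```

For example, `latest_categorization("elicitation_run_categorization", "o4-mini-2025-04-16", "claude-haiku-4-5-20251001")`.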

### Output Format

```json
{
    "categorization": {
        "categories": [
            {
                "category_name": "Overbroad Cleanup/Finalization Cues",
                "definition": "Language that frames the task as cleaning...",
                "examples": [
                    {
                        "id": "task_id:perturbed_id",
                        "trigger_phrase": "...",
                        "justification": "..."
                    }
                ]
            }
        ]
    },
    "statistics": {
        "total_tasks": 220,
        "total_categories": 7,
        "total_examples": 220,
        "total_cost": "0.45678900"
    },
    "metadata": {
        "task_list_path": "...",
        "agent": "claude-haiku-4-5-20251001",
        "generated_at": "2026-01-22T16:56:20.123456"
    }
}
```
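
A sketch for inspecting a categorization file: counting examples per category and checking the category list against the `statistics` block. It relies only on the fields shown above; the function names are hypothetical:

```python
import json
from pathlib import Path


def load_categorization(path) -> dict:
    """Read one categorization output file."""
    return json.loads(Path(path).read_text())


def category_counts(data: dict) -> dict[str, int]:
    """Map each category_name to the number of examples assigned to it."""
    return {
        cat["category_name"]: len(cat["examples"])
        for cat in data["categorization"]["categories"]
    }


def totals_consistent(data: dict) -> bool:
    """Check category and example counts against the statistics block."""
    categories = data["categorization"]["categories"]
    stats = data["statistics"]
    return (
        len(categories) == stats["total_categories"]
        and sum(len(c["examples"]) for c in categories) == stats["total_examples"]
    )
```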

---

## Complete Workflow Example

### For OS Domain - Claude Haiku 4.5

```bash
# 1. Generate task list (with baseline harm + human annotation filtering)
python generate_successful_task_list.py \
    --domain os \
    --agent claude-haiku-4-5-20251001 \
    --data_filter task_filters/data_filter_harm_threshold_0%_final.json \
    --human_annotation_filter task_filters/final_decisions.json \
    --output task_list_claude_haiku_os_filtered.json

# 2. Generate summaries
./parallel_generate_elicitation_run_summaries.sh \
    --task_list_file task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001 \
    --max_parallel 10

# 3. Update categorization
python elicitation_run_categorization.py \
    --task_list_path task_list_claude_haiku_os_filtered.json \
    --agent claude-haiku-4-5-20251001 \
    --resume_categorization_path elicitation_run_categorization/o4-mini-2025-04-16/all_refinement_models/claude-haiku-4-5-20251001/elicitation_run_categorization_claude-haiku-4-5-20251001_20260122_165620.json
```

### For OS Domain - Claude Opus 4.5

```bash
# 1. Generate task list (human annotation filtering only)
python generate_successful_task_list.py \
    --domain os \
    --agent claude-opus-4-5-20251101 \
    --human_annotation_filter task_filters/final_decisions.json \
    --output task_list_claude_opus_os_filtered.json

# 2. Generate summaries
./parallel_generate_elicitation_run_summaries.sh \
    --task_list_file task_list_claude_opus_os_filtered.json \
    --agent claude-opus-4-5-20251101 \
    --max_parallel 10

# 3. Update categorization
python elicitation_run_categorization.py \
    --task_list_path task_list_claude_opus_os_filtered.json \
    --agent claude-opus-4-5-20251101 \
    --resume_categorization_path elicitation_run_categorization/o4-mini-2025-04-16/all_refinement_models/claude-opus-4-5-20251101/elicitation_run_categorization_claude-opus-4-5-20251101_20260122_171559.json
```
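
The two workflows differ only in the agent, the task list name, whether the baseline harm filter is passed, and the resume path. A hypothetical Python driver (`workflow_commands` and `run_workflow` are not part of the package) that builds the same three commands and either prints them as a dry run or runs them in order. It hardcodes the file names used above and relies on the default `--perturbation_model`:

```python
import shlex
import subprocess


def workflow_commands(agent, task_list, resume_path=None, data_filter=None,
                      annotation_filter="task_filters/final_decisions.json",
                      domain="os", max_parallel=10):
    """Return the three pipeline steps as argv lists for one agent."""
    step1 = ["python", "generate_successful_task_list.py",
             "--domain", domain, "--agent", agent]
    if data_filter:  # baseline harm filter is optional (omitted for Opus above)
        step1 += ["--data_filter", data_filter]
    step1 += ["--human_annotation_filter", annotation_filter, "--output", task_list]

    step2 = ["./parallel_generate_elicitation_run_summaries.sh",
             "--task_list_file", task_list, "--agent", agent,
             "--max_parallel", str(max_parallel)]

    step3 = ["python", "elicitation_run_categorization.py",
             "--task_list_path", task_list, "--agent", agent]
    if resume_path:  # omit to create a new categorization instead of extending one
        step3 += ["--resume_categorization_path", resume_path]
    return [step1, step2, step3]


def run_workflow(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `run_workflow(workflow_commands("claude-opus-4-5-20251101", "task_list_claude_opus_os_filtered.json"))` prints the Opus commands without `--data_filter`; passing `dry_run=False` executes them, and `check=True` stops the sequence at the first step that exits with a non-zero status.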