# CE-Graph (Failure-Driven Workflow Refinement) — Minimal Example

This folder provides a **minimal, offline-safe** reference implementation of
the CE-Graph loop inside this repo.

It is designed to be **anonymous** (no external links, no API keys) and to serve
as a **code artifact** accompanying an ICML submission.

## What you get

1. A workflow DAG abstraction (`verl/ce_graph/workflow.py`)
2. A counterexample pool (`verl/ce_graph/counterexamples.py`)
3. Failure signatures + clustering (`verl/ce_graph/failures.py`)
4. Operator-constrained graph edits (`verl/ce_graph/operators.py`)
5. The refinement loop (`verl/ce_graph/refine.py`)

## How to plug into your real tasks

You only need to implement **one function**:

```python
def runner(workflow) -> tuple[float, list[ExecutionTrace]]:
    ...
```

It should:
1. execute `workflow` on a fixed evaluation set
2. return a **higher-is-better** score (e.g., exact match / pass@1 / success rate)
3. return `ExecutionTrace` objects for each instance (success or failure)
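
As a rough illustration of this contract, here is a minimal sketch of a runner. It does not use the real classes: the `ExecutionTrace` dataclass below is a hypothetical stand-in (the actual class in `verl/ce_graph` may have different fields), and `EVAL_SET`, `toy_workflow`, and the assumption that a workflow is a plain callable are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-in: the real ExecutionTrace may expose different fields.
@dataclass
class ExecutionTrace:
    instance_id: str
    success: bool
    error: Optional[str] = None


# Fixed evaluation set (assumed shape): (instance_id, input, expected output).
EVAL_SET = [("q1", "2+2", "4"), ("q2", "3*3", "9"), ("q3", "10/4", "2.5")]


def runner(workflow: Callable[[str], str]) -> tuple[float, list[ExecutionTrace]]:
    """Run `workflow` on every instance and return (score, traces)."""
    traces = []
    for instance_id, question, expected in EVAL_SET:
        try:
            answer = workflow(question)
            ok = answer == expected
            error = None if ok else f"expected {expected!r}, got {answer!r}"
        except Exception as exc:
            # A crash is recorded as a failed trace instead of aborting the run.
            ok, error = False, f"{type(exc).__name__}: {exc}"
        traces.append(ExecutionTrace(instance_id, ok, error))
    # Higher is better: fraction of instances answered correctly.
    score = sum(t.success for t in traces) / len(traces)
    return score, traces


def toy_workflow(question: str) -> str:
    """Toy workflow that only knows two answers and raises on anything else."""
    answers = {"2+2": "4", "3*3": "9"}
    return answers[question]


score, traces = runner(toy_workflow)
```

One trace is returned per instance, including successes, so the failing traces can later be separated out and grouped.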

The CE-Graph code will:
- keep failing traces in the counterexample pool
- cluster failures into failure modes by signature
- propose safe, operator-constrained graph edits
- re-evaluate the candidates and accept the best one that improves the score
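
The steps above can be sketched as a small greedy loop. This is not the code in `verl/ce_graph/refine.py`: every name here (`cluster_failures`, `refine`, `toy_runner`, `toy_propose`) is hypothetical, traces are plain dicts, a workflow is modeled as a `frozenset` of node names, and the failures are recomputed from the current best workflow each round rather than kept in a persistent pool.

```python
from collections import defaultdict


def cluster_failures(traces):
    """Group failed traces by failure signature (one cluster per mode)."""
    modes = defaultdict(list)
    for trace in traces:
        if not trace["success"]:
            modes[trace["signature"]].append(trace)
    return dict(modes)


def refine(workflow, runner, propose_edits, max_rounds=5):
    """Greedy refinement: accept the best candidate that improves the score."""
    best_score, traces = runner(workflow)
    for _ in range(max_rounds):
        modes = cluster_failures(traces)
        if not modes:
            break  # no failures left to learn from
        # Each failure mode proposes candidate workflows via the operators.
        candidates = [c for sig in modes for c in propose_edits(workflow, sig)]
        evaluated = [(cand, *runner(cand)) for cand in candidates]
        improving = [e for e in evaluated if e[1] > best_score]
        if not improving:
            break  # no candidate beats the current workflow
        workflow, best_score, traces = max(improving, key=lambda e: e[1])
    return workflow, best_score


# Toy setup (assumed): each instance succeeds only if a given node is present.
REQUIRED = {"a": "parse", "b": "parse", "c": "verify"}


def toy_runner(workflow):
    traces = []
    for instance_id, need in REQUIRED.items():
        ok = need in workflow
        signature = None if ok else f"missing:{need}"
        traces.append({"id": instance_id, "success": ok, "signature": signature})
    score = sum(t["success"] for t in traces) / len(traces)
    return score, traces


def toy_propose(workflow, signature):
    """Single operator: add the node named in the failure signature."""
    node = signature.split(":", 1)[1]
    return [workflow | {node}]


final_workflow, final_score = refine(frozenset(), toy_runner, toy_propose)
```

In this toy run the loop first adds `parse`, which fixes two instances, then `verify`, and stops once no failures remain.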

## Run the toy demo

```bash
python examples/ce_graph_refine/toy_demo.py
```

This demo does **not** call any external model. It only shows that the loop,
clustering, and operators are wired correctly.
