# CycleIE: Robust Document Information Extraction through Iterative Verification and Refinement

A document question-answering system that runs specialized agents in a cyclic workflow to handle complex analytical tasks.

## Overview

CycleIE decomposes a complex document-based question into smaller unit questions. Each unit question is processed by a multi-agent framework that combines ReAct orchestration with Monte Carlo Tree Search (MCTS) for action selection.

## Agents

CycleIE employs six specialized agents that communicate via structured signals:

- **Retriever**: Identifies and retrieves relevant document segments; refines retrieval based on verification signals.
- **Structurer**: Selects the structural representation (e.g., table, graph, tree) best suited to the extracted information.
- **Extractor**: Converts retrieved segments into structured data according to the format selected by the Structurer.
- **Verifier**: Evaluates structured data for completeness, relevance, and accuracy, generating explicit signals for refinement.
- **Refiner**: Adjusts queries based on verification signals to improve extraction quality.
- **Reasoner**: Utilizes verified structured data to derive clear and interpretable answers.
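
The interaction between these agents can be pictured as a loop that repeats until the Verifier is satisfied. The sketch below is illustrative only and is not taken from the CycleIE codebase: `VerificationSignal`, `run_cycle`, the callable signatures, and the `max_iterations` cap are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class VerificationSignal:
    """Hypothetical signal emitted by the Verifier (not the project's real type)."""
    complete: bool
    relevant: bool
    accurate: bool
    feedback: str = ""  # hint the Refiner can use to adjust the query

    @property
    def passed(self) -> bool:
        return self.complete and self.relevant and self.accurate


def run_cycle(
    question: str,
    retrieve: Callable[[str], List[str]],
    choose_structure: Callable[[str, List[str]], str],
    extract: Callable[[List[str], str], Any],
    verify: Callable[[str, Any], VerificationSignal],
    refine: Callable[[str, VerificationSignal], str],
    reason: Callable[[str, Any], Any],
    max_iterations: int = 3,  # assumed safety cap on refinement rounds
) -> Dict[str, Any]:
    """Retrieve, structure, extract, verify; refine and repeat on failure."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    query = question
    for iteration in range(1, max_iterations + 1):
        segments = retrieve(query)                       # Retriever
        structure = choose_structure(question, segments) # Structurer
        data = extract(segments, structure)              # Extractor
        signal = verify(question, data)                  # Verifier
        if signal.passed:
            break
        query = refine(query, signal)                    # Refiner

    # Reasoner works on the last extracted data, verified or not.
    return {
        "answer": reason(question, data),
        "iterations": iteration,
        "verified": signal.passed,
    }
```

Passing the agents in as plain callables keeps the loop independent of any particular model or retrieval backend; the returned `verified` flag lets a caller tell a verified answer from one produced after the iteration cap was reached.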

## Architecture

The system employs:
- **ReAct Orchestration**: Coordinates agent actions and communication via structured signals.
- **MCTS-based Action Selection**: Optimizes action sequences using Monte Carlo Tree Search.
- **Explicit Signal Communication**: Enables iterative refinement through informative verification feedback.
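
To give a feel for MCTS-style action selection, the sketch below shows only the selection rule at a single decision point, using the standard UCB1 formula that many MCTS variants apply at each tree node. It is not CycleIE's implementation: `ActionStats`, `select_action`, `update`, the action names, and the exploration constant are assumptions for illustration, and a full search would also expand, simulate, and backpropagate over a tree.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ActionStats:
    """Running statistics for one candidate action (hypothetical structure)."""
    visits: int = 0
    total_reward: float = 0.0


def select_action(stats: Dict[str, ActionStats], exploration: float = 1.4) -> str:
    """Pick the next action with UCB1: mean reward plus an exploration bonus."""
    # Try every action once before comparing scores.
    for action, s in stats.items():
        if s.visits == 0:
            return action

    total_visits = sum(s.visits for s in stats.values())

    def ucb1(s: ActionStats) -> float:
        mean = s.total_reward / s.visits
        bonus = exploration * math.sqrt(math.log(total_visits) / s.visits)
        return mean + bonus

    return max(stats, key=lambda a: ucb1(stats[a]))


def update(stats: Dict[str, ActionStats], action: str, reward: float) -> None:
    """Record the observed reward for an action (the backpropagation step at one node)."""
    stats[action].visits += 1
    stats[action].total_reward += reward
```

In a setup like this, the reward could plausibly be derived from the Verifier's feedback, for example 1.0 when verification passes and 0.0 otherwise; that mapping is also an assumption rather than something the project documents.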

```
CycleIE/
├── main.py                # Entry point
├── agents/                # Agent implementations
├── core/                  # Core components
└── utils/                 # Utility functions
```

## Usage

```python
from CycleIE.main import process_query

# Process a query
result = process_query(
    query="What are the key points in this document?",
    document=["path/to/document.pdf"],
    model="llm-model-name"
)

# Get answer
print(result.get("answer"))
```

## Features

- Modular agent-based architecture
- Document retrieval with semantic search
- Structured information extraction
- Cyclic reasoning workflow
- Real-time thought process streaming