# React* GUI Agent

An Android automation agent based on the ReAct (Reasoning + Acting) paradigm.

---

## Core Features

- **Planner + Summarizer loop** - Observe → Reason → Execute → Summarize
- **Multi-round reflection** - Automatic reflection and retry after failure
- **Standalone** - Independent of AutoRPA, usable in any project
- **Standardized output** - Outputs StandardTrajectory (three-layer structure)

---

## Quick Start

### Method 1: Via main.py

```bash
# Basic exploration
python main.py \
    --agent_name=gui-agent \
    --gui_agent_type=react_star \
    --tasks=ContactsAddContact

# Configure Planner and Summarizer models
python main.py \
    --agent_name=gui-agent \
    --gui_agent_type=react_star \
    --planner_llm=claude-sonnet-4-5 \
    --summarizer_llm=gpt-5-low \
    --reflection_rounds=2 \
    --tasks=ContactsAddContact
```

### Method 2: Use in code

```python
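import asyncio

from gui_agents.react_star import ReactStarAgent
from autorpa.utils.llm_client import get_llm_wrapper

# Create LLM wrappers
planner_llm = get_llm_wrapper("claude-sonnet-4-5")
summarizer_llm = get_llm_wrapper("gpt-5-low")

# Create Agent
agent = ReactStarAgent(
    agent_rpa=None,  # Not needed when used standalone
    planner_llm=planner_llm,
    summarizer_llm=summarizer_llm,
    max_reflection_rounds=2,
)


async def main(task, env_op):
    # explore_task is awaited, so it has to be called from inside a coroutine.
    # `task` and `env_op` (the environment interface) are supplied by the caller.
    return await agent.explore_task(
        task=task,
        env_interface=env_op,
        max_steps=20,
    )


# Explore task (run from synchronous code once task and env_op exist):
# trajectory = asyncio.run(main(task, env_op))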
```

### Method 3: Via Agent Registry

```python
from gui_agents import agent_registry

# planner_llm and summarizer_llm are created as in Method 2
agent = agent_registry.create(
    'react_star',
    planner_llm=planner_llm,
    summarizer_llm=summarizer_llm,
    max_reflection_rounds=2
)
```
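How `agent_registry.create` resolves a name to a class is not described in this README. The sketch below shows one common way a name-to-factory registry with decorator-based registration can work. `AgentRegistry`, `register`, and `DummyAgent` are hypothetical names used only for illustration; they are not the project's actual classes.

```python
from typing import Any, Callable, Dict


class AgentRegistry:
    """Hypothetical name -> factory registry (illustration only)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Return a decorator that records a factory (or class) under `name`."""
        def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
            self._factories[name] = factory
            return factory
        return decorator

    def create(self, name: str, **kwargs: Any) -> Any:
        """Build an agent by name, forwarding keyword arguments to its factory."""
        if name not in self._factories:
            known = ", ".join(sorted(self._factories)) or "<none>"
            raise KeyError(f"Unknown agent type {name!r}; registered: {known}")
        return self._factories[name](**kwargs)


registry = AgentRegistry()


@registry.register("react_star")
class DummyAgent:
    """Stand-in for a real agent class; it only stores its arguments."""

    def __init__(self, planner_llm=None, summarizer_llm=None, max_reflection_rounds=2):
        self.planner_llm = planner_llm
        self.summarizer_llm = summarizer_llm
        self.max_reflection_rounds = max_reflection_rounds


agent = registry.create("react_star", max_reflection_rounds=1)
```

With this pattern, importing the module that defines an agent class is enough to make it available by name, which is one plausible reading of "auto-registration".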

---

## Configuration Parameters

### Core parameters

```bash
--reflection_rounds=2              # Reflection rounds (0 disables retry; 1-2 recommended)
--planner_llm=claude-sonnet-4-5    # Planner model
--summarizer_llm=gpt-5-low         # Summarizer model
```

### Action Space configuration

```bash
--react_star_action_space=index    # 'index' (element index) or 'coordinate'
--react_star_ui_info=screenshot_with_tree  # UI info mode
    # - screenshot_with_tree: screenshot + UI tree + SoM markers
    # - screenshot_only: screenshot only
    # - screenshot_only_som: screenshot + SoM markers
--react_star_img_resize_mode=resized  # 'resized' (461x1024) or 'original'
```
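When `coordinate` mode is combined with `resized` images, a point chosen on the 461x1024 image presumably has to be mapped back to device pixels before it can be tapped. The sketch below shows that arithmetic. It assumes 461 is the width and 1024 the height; `scale_to_device` and the 1080x2400 device size in the usage line are hypothetical. Whether the project performs the conversion this way is not stated here.

```python
from typing import Tuple

# (width, height) of the resized screenshot, as quoted in the flag description.
RESIZED_SIZE: Tuple[int, int] = (461, 1024)


def scale_to_device(
    point: Tuple[float, float],
    device_size: Tuple[int, int],
    resized_size: Tuple[int, int] = RESIZED_SIZE,
) -> Tuple[int, int]:
    """Map an (x, y) point on the resized image to device pixel coordinates."""
    x, y = point
    dev_w, dev_h = device_size
    res_w, res_h = resized_size
    # Scale each axis independently, then round to the nearest pixel.
    dev_x = round(x * dev_w / res_w)
    dev_y = round(y * dev_h / res_h)
    # Clamp so a point on the image border still lands inside the screen.
    dev_x = min(max(dev_x, 0), dev_w - 1)
    dev_y = min(max(dev_y, 0), dev_h - 1)
    return dev_x, dev_y


# Hypothetical device resolution, for illustration only.
tap_x, tap_y = scale_to_device((230, 512), device_size=(1080, 2400))
```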

### Shell Action support

```bash
--enable_shell_action   # Enable shell commands (use with caution)
```

---

## Workflow

```
Initialize
  ↓
┌─────────────────────────┐
│  ReAct Loop (each step) │
│  ┌──────────────────┐   │
│  │ 1. Planner       │   │  Observe screen
│  │    - Observe     │   │  Analyze UI elements
│  │    - Reason      │   │  Generate plan
│  │    - Gen action  │   │  Output code
│  └──────────────────┘   │
│          ↓              │
│  ┌──────────────────┐   │
│  │ 2. Executor      │   │  Execute action
│  │    - Exec code   │   │  Get feedback
│  └──────────────────┘   │
│          ↓              │
│  ┌──────────────────┐   │
│  │ 3. Summarizer    │   │  Compare before/after
│  │    - Compare     │   │  Summarize changes
│  │    - Summarize   │   │  Update history
│  └──────────────────┘   │
│          ↓              │
│   Success? Yes → Done   │
│       ↓ No              │
│   Continue loop         │
└─────────────────────────┘
  ↓
Failure? → Concluder → Reflection → Restart
  ↓
Success → Output StandardTrajectory
```
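The control flow in the diagram can be sketched in plain Python as an inner step loop wrapped in an outer reflection loop. This is a simplified illustration, not the agent's implementation: the planner, executor, summarizer, and reflection step are passed in as ordinary callables, and all names (`run_round`, `explore`, `StepRecord`, `RoundResult`) are hypothetical. It also assumes that N reflection rounds means up to N + 1 attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StepRecord:
    action: str
    summary: str


@dataclass
class RoundResult:
    success: bool
    steps: List[StepRecord] = field(default_factory=list)


def run_round(plan: Callable, execute: Callable, summarize: Callable,
              max_steps: int, reflection: Optional[str] = None) -> RoundResult:
    """Inner loop: plan -> execute -> summarize, for at most max_steps steps."""
    history: List[StepRecord] = []
    for _ in range(max_steps):
        action = plan(history, reflection)            # Planner: observe, reason, emit action
        feedback = execute(action)                    # Executor: run the action
        summary, done = summarize(action, feedback)   # Summarizer: describe the change
        history.append(StepRecord(action, summary))
        if done:
            return RoundResult(True, history)
    return RoundResult(False, history)


def explore(plan: Callable, execute: Callable, summarize: Callable, reflect: Callable,
            max_steps: int, max_reflection_rounds: int) -> RoundResult:
    """Outer loop: after a failed round, reflect and restart with that reflection."""
    reflection: Optional[str] = None
    result = RoundResult(False)
    for _ in range(max_reflection_rounds + 1):
        result = run_round(plan, execute, summarize, max_steps, reflection)
        if result.success:
            break
        reflection = reflect(result.steps)
    return result


# Stub components: the planner only finds the right action after a reflection.
def stub_plan(history: List[StepRecord], reflection: Optional[str]) -> str:
    return "finish" if reflection else "noop"


def stub_execute(action: str) -> str:
    return "ok"


def stub_summarize(action: str, feedback: str) -> Tuple[str, bool]:
    return f"ran {action}", action == "finish"


def stub_reflect(steps: List[StepRecord]) -> str:
    return "noop did not help; try finish"


result = explore(stub_plan, stub_execute, stub_summarize, stub_reflect,
                 max_steps=3, max_reflection_rounds=1)
```

In this toy run the first round exhausts its three steps, the reflection is fed into the second round, and the second round succeeds on its first step.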

---

## Output Format

### StandardTrajectory (three-layer structure)

```python
{
  "task_type": "ContactsAddContact",
  "task_goal": "Add a contact with name 'John' and phone '1234567890'",
  "success": True,
  "agent_name": "react_star",
  "duration": 45.2,
  
  # Layer 1: Full execution details
  "raw_steps": [
    {
      "step_num": 1,
      "before_obs": {...},      # Screenshot, UI tree
      "planner_output": {...},  # Observation, reasoning, plan
      "action_code": "env_op.open_app('Contacts')",
      "exec_result": {...},     # Execution feedback
      "after_obs": {...},
      "summarizer_output": {...}  # Change summary
    },
    ...
  ],
  
  # Layer 2: Concise action sequence (for RPA generation)
  "action_history": {
    "completed_tasks": ["Open Contacts app", "Click add button"],
    "actions": [
      "env_op.open_app('Contacts')",
      "env_op.click(5)",
      ...
    ]
  },
  
  # Layer 3: High-level reflection (generated on failure)
  "reflection_data": {
    "round": 1,
    "reflection": "Should check for existing contact with same name first...",
    "conclusion": "..."
  }
}
```
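A consumer of this structure will often need only the outcome and the concise action list. The sketch below reads those fields from a trimmed-down trajectory in JSON form. The field names come from the example above; `summarize_trajectory` is a hypothetical helper, and the exact on-disk layout of `standard_trajectory.json` is an assumption.

```python
import json
from typing import Any, Dict


def summarize_trajectory(traj: Dict[str, Any]) -> str:
    """Return a short, human-readable summary of a trajectory dict."""
    status = "succeeded" if traj.get("success") else "failed"
    step_count = len(traj.get("raw_steps", []))
    lines = [f"{traj.get('task_type', '<unknown>')} {status} in {step_count} step(s)"]

    # Layer 2: the concise action sequence.
    actions = traj.get("action_history", {}).get("actions", [])
    lines += [f"  {i}. {action}" for i, action in enumerate(actions, start=1)]

    # Layer 3: only present when a round failed and a reflection was generated.
    reflection = traj.get("reflection_data")
    if reflection:
        lines.append(
            f"  reflection (round {reflection.get('round')}): {reflection.get('reflection')}"
        )
    return "\n".join(lines)


# A trimmed-down trajectory in JSON form (illustrative values only).
example = json.loads("""
{
  "task_type": "ContactsAddContact",
  "success": true,
  "raw_steps": [{"step_num": 1, "action_code": "env_op.open_app('Contacts')"}],
  "action_history": {
    "completed_tasks": ["Open Contacts app"],
    "actions": ["env_op.open_app('Contacts')"]
  }
}
""")

summary = summarize_trajectory(example)
print(summary)
```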

---

## Directory Structure

```
react_star/
├── adapter.py              # ReactStarAgent main implementation
├── factory.py              # Agent factory (auto-registration)
│
├── core/                   # Standalone core modules
│   ├── models.py           # Data models
│   ├── utils.py            # Utility functions
│   └── environment_adapter.py  # Environment adapter
│
├── execution/              # Action execution
│   ├── action_execution.py
│   ├── action_models.py
│   └── adb_utils.py
│
└── prompts/                # Prompt templates
    ├── planner_prompt.py
    └── summarizer_prompt.py
```

---

## Integration with AutoRPA

When used as AutoRPA's GUI Agent:

```bash
python main.py \
    --agent_name=autorpa \
    --gui_agent_type=react_star \
    --num_tasks_to_explore=3 \
    --tasks=ContactsAddContact
```

**Pipeline**:
1. React* Agent explores task → StandardTrajectory
2. ActionTranslator translates hard-coded actions (batch)
3. RPA Builder generates RPA code
4. N-to-1 verification → RPA Bank

---

## Best Practices

### Planner model selection
- **Recommended**: `claude-sonnet-4-5` - Strong reasoning, suitable for complex tasks
- **Alternative**: `gpt-5-low` - Cost-effective, suitable for simple tasks

### Summarizer model selection
- **Recommended**: `gpt-5-low` - Sufficient summarization, lower cost
- **Alternative**: `gpt-4o` - More detailed summaries

### Reflection rounds
- **0 rounds**: Quick testing, no retry
- **1-2 rounds**: Recommended; balances success rate and cost
- **3+ rounds**: For complex tasks; costs more

### Action Space mode
- **index mode** (recommended): UI tree index-based, more stable
- **coordinate mode**: Clicks by screen coordinates; for elements that cannot be addressed by index

---

## Debugging Tips

### View detailed logs

```bash
python main.py \
    --agent_name=gui-agent \
    --gui_agent_type=react_star \
    --log_level=DEBUG \
    --enable_llm_logging=True \
    --tasks=ContactsAddContact
```

**Output location**:
```
log/gui_agent_YYYY-MM-DD_HH-MM-SS/
├── TaskType/
│   └── task_0/
│       ├── round_0/                    # First attempt
│       │   ├── step-1_before_planner_prompt.txt
│       │   ├── step-1_before_planner_output.txt
│       │   ├── step-1_after_summarizer_prompt.txt
│       │   ├── step-1_after_summarizer_output.txt
│       │   └── step-1_*.png
│       ├── round_1/                    # Retry after reflection
│       └── standard_trajectory.json
└── output.log                          # Full log
```
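To see at a glance which tasks needed a retry, the log tree can be walked with `pathlib`. The sketch below relies only on the directory layout shown above; `list_rounds` is a hypothetical helper that is not part of the project.

```python
from pathlib import Path
from typing import Dict, List


def _round_index(name: str) -> int:
    """Numeric suffix of a 'round_N' directory name."""
    return int(name.split("_", 1)[1])


def list_rounds(run_dir: Path) -> Dict[str, List[str]]:
    """Map 'TaskType/task_N' to its round directories, ordered by round number."""
    rounds_by_task: Dict[str, List[str]] = {}
    for task_dir in sorted(run_dir.glob("*/task_*")):
        if not task_dir.is_dir():
            continue
        names = [
            p.name
            for p in task_dir.glob("round_*")
            if p.is_dir() and p.name.split("_", 1)[1].isdigit()
        ]
        key = f"{task_dir.parent.name}/{task_dir.name}"
        rounds_by_task[key] = sorted(names, key=_round_index)
    return rounds_by_task


# Usage (the path is a placeholder for an actual run directory):
# for task, rounds in list_rounds(Path("log/gui_agent_YYYY-MM-DD_HH-MM-SS")).items():
#     print(task, "retried" if len(rounds) > 1 else "single attempt", rounds)
```

A task with more than one `round_*` directory went through at least one reflection and retry.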

### FAQ

**Planner keeps failing**
- Check whether the UI info mode (`--react_star_ui_info`) suits the app
- Try a stronger model (e.g., `claude-sonnet-4-5`)
- Increase `--reflection_rounds`

**Action execution errors**
- Check the action space mode (`--react_star_action_space`)
- Review the execution logs produced by `execution/action_execution.py`
- Verify that the target UI elements exist on screen

---

## Design Principles

1. **Independence** - No framework lock-in, usable in any project
2. **Modularity** - Clear module boundaries, easy to understand and modify
3. **Reusability** - Integrates with other systems via standard interfaces
4. **Extensibility** - Easy to add new prompts, tools, or strategies

---

**Related docs**:
- [Main project README](../../README.md)
- [AutoRPA full guide](../../GUIDE.md)
