# Automated Agent Trajectory Rollout System - Task Generation

This directory contains the task generation component of the automated agent trajectory rollout system.

## Overview

The task generation system processes URLs from different categories and generates realistic, solvable tasks for web agents. It includes:

1. **URL Simplification**: Converts detailed URLs to parent pages (e.g., `https://www.bbc.com/news/topics/c4gmdg9ne38t` → `https://www.bbc.com/news`)
2. **Content Analysis**: Uses PageParserTool to extract page content
3. **Visual Analysis**: Captures screenshots using Playwright browser automation
4. **Two-Step Task Generation**: Uses a vision-language model (VLM) in two phases: first to extract information, then to generate tasks
5. **Structured Output**: Generates 5 diverse tasks per URL in line format
6. **Category Selection**: Process specific categories or run multiple categories in parallel
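
The URL simplification step can be illustrated with a minimal sketch. It assumes that "parent page" means the scheme, host, and first path segment, which matches the BBC example above; the function name `simplify_url` and the `keep_segments` parameter are hypothetical and may differ from what `generate_tasks.py` actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def simplify_url(url: str, keep_segments: int = 1) -> str:
    """Reduce a URL to scheme, host, and its first `keep_segments` path segments.

    Hypothetical helper: the real script may use a different rule.
    """
    parts = urlsplit(url)
    # Split the path into non-empty segments, e.g. ["news", "topics", "c4gmdg9ne38t"]
    segments = [segment for segment in parts.path.split("/") if segment]
    path = "/" + "/".join(segments[:keep_segments]) if segments else ""
    # Query string and fragment are dropped on purpose
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(simplify_url("https://www.bbc.com/news/topics/c4gmdg9ne38t"))
# → https://www.bbc.com/news
```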

## Files

- `generate_tasks.py`: Main task generation script with category selection
- `expand_urls.py`: Script that explores the web with a search browser, starting from the original seed task set, to collect an expanded dictionary of URLs
- `refine_tasks.py`: Script that refines and rewrites generated tasks to improve their quality and relevance to agent use
- `categories_dict.json`: Mind2Web training dataset, the seed task set
- `README.md`: This documentation file

## Requirements

- Python 3.8+
- GUI-Agent dependencies (see main project requirements)
- Playwright for browser automation
- VLM model access

## Usage

### Basic Usage

```bash
# Run task generation for all categories (3 URLs per category)
cd /memory_evolution
python generate_tasks.py

# Process specific categories
python generate_tasks.py --categories news shopping

# Process single category
python generate_tasks.py --categories news

# Custom number of URLs per category
python generate_tasks.py --categories news shopping --max-urls-per-category 5
```
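
For readers who want to see how these flags could be wired up, here is a minimal `argparse` sketch. The flag names and the default of 3 URLs per category come from the commands above; the function name `build_parser`, the help strings, and the choice of `None` to mean "all categories" are assumptions rather than the script's actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown above (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Generate web-agent tasks per category")
    parser.add_argument(
        "--categories",
        nargs="+",       # accepts one or more category names
        default=None,    # assumption: None means "process every category"
        help="Categories to process, e.g. news shopping",
    )
    parser.add_argument(
        "--max-urls-per-category",
        type=int,
        default=3,
        help="Maximum number of URLs to process in each category",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible
args = build_parser().parse_args(
    ["--categories", "news", "shopping", "--max-urls-per-category", "5"]
)
print(args.categories, args.max_urls_per_category)
```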

## Configuration

The system uses the following configuration:

- **Input**: `categories_dict_expand.json` (the expanded URL dictionary)
- **Output**: Generated tasks saved as separate JSON files per category in `/generated_tasks`
- **Logging**: Task generation logs saved to `task_generation.log`

## Task Generation Process

1. **URL Processing**:
   - Load URLs from categories dictionary
   - Simplify URLs to parent pages
   - Remove duplicates within each category
   - Filter by selected categories (if specified)

2. **Content Extraction**:
   - Use PageParserTool to extract page content
   - Capture screenshots using Playwright

3. **Two-Step VLM Process**:
   - **Step 1**: Extract comprehensive information about the website
   - **Step 2**: Generate 5 diverse tasks based on the extracted information

4. **Task Parsing**:
   - Parse line-format responses (e.g., "1. Task | Outcome | Difficulty")
   - Convert to structured JSON format
   - Add metadata (source URL, category)

5. **Output**:
   - Save tasks to separate JSON files per category with timestamps
   - Include task description, expected outcome, difficulty level, source URL, category
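
The URL processing step (step 1) can be sketched as follows. This assumes the categories dictionary maps each category name to a list of URLs, which the README does not state explicitly; `select_urls` is a hypothetical helper name, and URL simplification is left out to keep the example short.

```python
from typing import Dict, List, Optional


def select_urls(
    categories: Dict[str, List[str]],
    selected: Optional[List[str]] = None,
    max_urls: int = 3,
) -> Dict[str, List[str]]:
    """Filter categories, drop duplicate URLs, and cap the number per category."""
    result: Dict[str, List[str]] = {}
    for category, urls in categories.items():
        if selected is not None and category not in selected:
            continue  # skip categories the user did not ask for
        # dict.fromkeys removes duplicates while preserving first-seen order
        unique = list(dict.fromkeys(urls))
        result[category] = unique[:max_urls]
    return result


example = {
    "news": ["https://www.bbc.com/news", "https://www.bbc.com/news"],
    "shopping": ["https://example.com/shop"],
}
print(select_urls(example, selected=["news"]))
```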

## Task Format

### VLM Response Format
The VLM generates tasks in this line format:
```
1. Navigate to the search section and search for "latest news" | Should display search results for latest news | Easy
2. Click on the first article in the featured section and read the headline | Should open the article page and display the headline | Easy
3. Find the contact information in the footer and extract the email address | Should locate and extract the contact email | Medium
4. Use the advanced search filters to find articles from the last week | Should apply date filter and show recent articles | Medium
5. Complete the newsletter subscription form with test data | Should fill out the form and submit successfully | Hard
```
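
A parser for this line format might look like the sketch below. It is an illustration under stated assumptions, not the code in `generate_tasks.py`: the name `parse_task_lines` is hypothetical, malformed lines are silently skipped, and only the source URL and category are attached as metadata.

```python
import re
from typing import Dict, List

# Matches a numbered line such as "1. Task | Outcome | Difficulty"
NUMBERED_LINE = re.compile(r"^\s*\d+\.\s*(.+)$")


def parse_task_lines(response: str, source_url: str, category: str) -> List[Dict[str, str]]:
    """Convert a line-format VLM response into a list of task dictionaries."""
    tasks: List[Dict[str, str]] = []
    for line in response.splitlines():
        match = NUMBERED_LINE.match(line)
        if not match:
            continue  # ignore blank lines and any extra text around the list
        # Split from the right so a "|" inside the description does not break parsing
        fields = [field.strip() for field in match.group(1).rsplit("|", 2)]
        if len(fields) != 3 or not all(fields):
            continue  # malformed line: wrong number of fields or an empty field
        description, outcome, difficulty = fields
        tasks.append(
            {
                "task_description": description,
                "expected_outcome": outcome,
                "difficulty": difficulty.lower(),
                "category": category,
                "source_url": source_url,
            }
        )
    return tasks
```

Splitting from the right is a deliberate choice: the difficulty and outcome fields are short and predictable, while the task description is free text that is more likely to contain a stray `|`.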

### Final JSON Structure
Generated tasks follow this JSON structure:

```json
{
  "task_description": "Clear description of what to do",
  "expected_outcome": "What should happen when completed",
  "difficulty": "easy/medium/hard",
  "category": "category_name",
  "url": "source_url",
  "source_url": "original_url"
}
```
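
Writing a category's tasks to a timestamped file could be done roughly as shown here. The file name pattern `<category>_tasks_<timestamp>.json`, the relative `generated_tasks` directory, and the function name `save_tasks` are assumptions made for illustration; the actual naming used by the script is not documented above.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def save_tasks(tasks: List[Dict[str, str]], category: str, output_dir: str = "generated_tasks") -> Path:
    """Write one category's tasks to a timestamped JSON file and return its path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; a timestamp keeps repeated runs from overwriting each other
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"{category}_tasks_{stamp}.json"
    with path.open("w", encoding="utf-8") as handle:
        json.dump(tasks, handle, indent=2, ensure_ascii=False)
    return path
```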

## Categories Supported

The system supports the following categories from the expanded URL dictionary:

- `news`: News websites and articles
- `shopping`: E-commerce and shopping sites
- `social`: Social media platforms
- `services`: Service-oriented websites
- `education`: Educational content and courses
- `tech`: Technology and software sites
- `entertainment`: Entertainment and media sites
- `travel`: Travel and tourism websites
- `health`: Health and medical information
- `food`: Food and recipe websites
- `academic`: Academic and research sites
- `government`: Government and official sites
- `finance`: Financial and investment sites

## Performance Considerations

- **Rate Limiting**: A 2-second delay between URLs avoids overwhelming servers
- **Browser Management**: Headless Playwright mode for efficiency
- **Content Truncation**: Page content limited to 3000 characters for VLM processing
- **Parallel Processing**: Run multiple categories simultaneously in separate terminals
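- **Screenshot Encoding**: Screenshots are Base64-encoded so they can be embedded in text-based payloads such as JSON; note that Base64 increases size by roughly a third, so it helps transport rather than storage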
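
The rate limiting and content truncation described in this list can be sketched as below. The 2-second delay and 3000-character limit come from the list; the function names, the injectable `sleep` argument, and the plain character slice are assumptions, and the real script may truncate or pace requests differently.

```python
import time
from typing import Callable, Iterable, List

MAX_CONTENT_CHARS = 3000  # page content limit stated above
DELAY_SECONDS = 2.0       # delay between URLs stated above


def truncate_content(text: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Keep only the first `limit` characters of the page content."""
    return text[:limit]


def process_with_delay(
    urls: Iterable[str],
    handler: Callable[[str], str],
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `handler` on each URL, pausing `delay` seconds between consecutive URLs."""
    results: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause before the first URL
        results.append(handler(url))
    return results
```

Passing `sleep` as a parameter makes the pacing logic testable without actually waiting.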

## Troubleshooting

### Common Issues

1. **Playwright Initialization Failed**:
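   - Install the Chromium browser binaries if they are missing: `playwright install chromium`
   - Headless mode does not require a display; if the browser still fails to launch, check that the required system libraries are present (for example with `playwright install --with-deps chromium`)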

2. **VLM Response Parsing Errors**:
   - Check VLM model availability and configuration
   - Verify API keys and network connectivity
   - Check response format matches expected line format

3. **Page Content Extraction Failed**:
   - Check if PageParserTool dependencies are available
   - Verify URL accessibility and network connectivity

4. **Task Parsing Issues**:
   - Verify VLM response follows the expected line format
   - Check for malformed responses or extra text

5. **Category Not Found**:
   - Use `python run_parallel_categories.py --all` to see available categories
   - Check spelling and case sensitivity

### Debug Mode

Enable debug logging by modifying the logging level in `generate_tasks.py`:

```python
logging.basicConfig(level=logging.DEBUG, ...)
```
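
A fuller logging setup might look like the following sketch, which logs to both the console and `task_generation.log`. The helper name `configure_logging`, the logger name, and the message format are assumptions; only the log file name and the use of `logging.basicConfig` come from this README.

```python
import logging


def configure_logging(level: int = logging.INFO, log_file: str = "task_generation.log") -> logging.Logger:
    """Send log records to a file and the console at the given level."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[
            logging.FileHandler(log_file, encoding="utf-8"),
            logging.StreamHandler(),
        ],
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    # Hypothetical logger name used only for this example
    return logging.getLogger("task_generation")
```

Calling `configure_logging(logging.DEBUG)` would then enable debug output without editing the format or handlers.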