# Self-Contained Email JSONL Format Documentation

This document describes the self-contained JSONL format used for loading both users and email messages into the DrBench email system.

## Overview

Email environment data is loaded from self-contained JSONL (JSON Lines) files where each line contains a complete JSON object representing either a user definition or an email message. This format allows for:

- **Self-contained scenarios**: Each file includes all users and emails needed
- **Conflict-free loading**: Multiple files can be loaded without user conflicts
- **Bulk operations**: Support for dozens of environment files per task
- **Validation**: Built-in validation ensures data integrity

## Record Types

Self-contained JSONL files contain two types of records identified by the `type` field:

### User Records

User records define email accounts that will be created in the system.

```json
{
  "type": "user",
  "username": "alice.smith",
  "first_name": "Alice",
  "last_name": "Smith",
  "email": "alice.smith@company.com",
  "password": "alice_pwd"
}
```

#### User Record Fields

**Required Fields:**
- **`type`** (string): Must be `"user"`
- **`username`** (string): Unique username for email login (without @domain)
- **`email`** (string): Full email address (username@domain.com)
- **`password`** (string): Plain text password for email authentication

**Optional Fields:**
- **`first_name`** (string): User's first name for display purposes
- **`last_name`** (string): User's last name for display purposes
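
The required fields above can be checked with a few lines of Python. This is a minimal sketch, not the project's validator: the helper name `missing_user_fields` is hypothetical, and treating an empty string as missing is an assumption.

```python
import json

# Required keys taken from the field list above.
REQUIRED_USER_FIELDS = ("type", "username", "email", "password")


def missing_user_fields(record: dict) -> list[str]:
    """Return required user fields that are absent or not non-empty strings.

    Hypothetical helper; rejecting empty strings is an assumption.
    """
    return [
        name
        for name in REQUIRED_USER_FIELDS
        if not isinstance(record.get(name), str) or not record[name]
    ]


# A user line with the password left out.
line = '{"type": "user", "username": "alice.smith", "email": "alice.smith@company.com"}'
record = json.loads(line)
print(missing_user_fields(record))  # ['password']
```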

### Email Records

Email records define email messages that will be created in user mailboxes.

```json
{
  "type": "email",
  "id": "email_001",
  "from": "alice.smith@company.com",
  "from_name": "Alice Smith",
  "to": ["bob.jones@company.com"],
  "cc": ["charlie.brown@company.com"],
  "subject": "Project Kickoff Meeting",
  "date": "2025-01-20T09:00:00-05:00",
  "body": "Hi Bob,\n\nI'd like to schedule our project kickoff meeting...",
  "folder": "inbox",
  "read": false,
  "attachments": []
}
```

#### Email Record Fields

**Required Fields:**
- **`type`** (string): Must be `"email"`
- **`id`** (string): Unique identifier for the email message
- **`from`** (string): Sender's email address
- **`to`** (array of strings): List of recipient email addresses
- **`subject`** (string): Email subject line
- **`body`** (string): Email body content (use `\n` for line breaks)

**Optional Fields:**
- **`from_name`** (string): Display name of the sender
- **`cc`** (array of strings): List of CC recipient email addresses
- **`date`** (string): ISO 8601 formatted timestamp
- **`folder`** (string): Target folder (currently only `"inbox"` is supported)
- **`read`** (boolean): Read status (for documentation only)
- **`attachments`** (array): Attachment metadata (see below)
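
When generating email records from Python, the standard `json` module takes care of the `\n` escaping in `body`. The sketch below builds a record with only the required fields and serializes it to a single JSONL line; the variable names are illustrative.

```python
import json

# An email record with only the required fields; the body holds real newlines.
email = {
    "type": "email",
    "id": "email_001",
    "from": "alice.smith@company.com",
    "to": ["bob.jones@company.com"],
    "subject": "Project Kickoff Meeting",
    "body": "Hi Bob,\n\nI'd like to schedule our project kickoff meeting...",
}

# json.dumps (without indent) escapes the newlines as \n, so the record
# stays on one physical line as JSONL requires.
line = json.dumps(email)
print("\n" in line)  # False

# Parsing the line restores the original multi-line body.
print(json.loads(line)["body"] == email["body"])  # True
```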

#### Attachment Format

```json
{
  "filename": "document.pdf",
  "content_type": "application/pdf",
  "size": 102400
}
```

**Note**: Attachments are metadata only; the actual file content is not stored.

## JSONL File Structure

Each line in the file is a complete JSON object. Users should be defined before emails that reference them:

```jsonl
{"type": "user", "username": "alice.smith", "email": "alice.smith@company.com", "password": "alice_pwd", "first_name": "Alice", "last_name": "Smith"}
{"type": "user", "username": "bob.jones", "email": "bob.jones@company.com", "password": "bob_pwd", "first_name": "Bob", "last_name": "Jones"}
{"type": "email", "id": "email_001", "from": "alice.smith@company.com", "from_name": "Alice Smith", "to": ["bob.jones@company.com"], "subject": "Hello Bob", "body": "Hi Bob!\n\nHow are you doing?"}
{"type": "email", "id": "email_002", "from": "bob.jones@company.com", "from_name": "Bob Jones", "to": ["alice.smith@company.com"], "subject": "Re: Hello Bob", "body": "Hi Alice!\n\nI'm doing great, thanks!"}
```
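
Reading such a file amounts to parsing each line and dispatching on `type`. The following is a rough sketch under stated assumptions: `split_records` is a hypothetical helper, and skipping blank lines and rejecting unknown types are choices made here, not documented loader behavior.

```python
import json


def split_records(lines):
    """Parse JSONL lines into (users, emails) lists, keyed on the `type` field."""
    users, emails = [], []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # assumption: blank lines are ignored
        record = json.loads(raw)
        kind = record.get("type")
        if kind == "user":
            users.append(record)
        elif kind == "email":
            emails.append(record)
        else:
            # assumption: any other type is treated as an error
            raise ValueError(f"line {number}: unknown record type {kind!r}")
    return users, emails


sample = [
    '{"type": "user", "username": "alice.smith", "email": "alice.smith@company.com", "password": "alice_pwd"}',
    '{"type": "email", "id": "email_001", "from": "alice.smith@company.com", "to": ["alice.smith@company.com"], "subject": "Note to self", "body": "Remember the kickoff."}',
]
users, emails = split_records(sample)
print(len(users), len(emails))  # 1 1
```

For a file on disk, the same function can be fed an open file handle, for example `split_records(open(path, encoding="utf-8"))`, since iterating a text file yields its lines.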

## Validation

Use the validation script to check file integrity before loading:

```bash
python src/scripts/validate_email_jsonl.py your_email_file.jsonl
```

The validator checks:
- JSON syntax and structure
- Required fields presence
- Email address format
- User/email consistency
- Date format validity
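
The validator's exact rules are not described here. As an illustration of what a user/email consistency check could look like, the sketch below reports every address used in `from`, `to`, or `cc` that has no matching user record. The function name `unknown_addresses` and the choice of fields to inspect are assumptions.

```python
def unknown_addresses(users, emails):
    """Return sorted addresses referenced by emails but not defined by any user.

    Hypothetical check; the real validator may apply different rules.
    """
    known = {user["email"] for user in users}
    referenced = set()
    for message in emails:
        referenced.add(message["from"])
        referenced.update(message["to"])
        referenced.update(message.get("cc", []))  # cc is optional
    return sorted(referenced - known)


users = [{"type": "user", "username": "alice.smith", "email": "alice.smith@company.com", "password": "alice_pwd"}]
emails = [{
    "type": "email",
    "id": "email_001",
    "from": "alice.smith@company.com",
    "to": ["bob.jones@company.com"],
    "subject": "Hello Bob",
    "body": "Hi Bob!",
}]
print(unknown_addresses(users, emails))  # ['bob.jones@company.com']
```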

## Usage in Tasks

### Task Configuration

Include self-contained email files in task `env_files`:

```json
{
  "env_files": [
    {
      "app": "email",
      "source": "company_scenario.jsonl"
    },
    {
      "app": "email",
      "source": "project_emails.jsonl"
    }
  ]
}
```

### Loading Process

1. Files are validated and copied to the email data directory
2. Users are created or verified (idempotent: users that already exist cause no conflicts)
3. Email messages are created in appropriate mailboxes
4. Permissions are set and Dovecot is restarted

## Key Benefits

### Self-Contained Design
- Each file includes all necessary users and emails
- No external dependencies or shared state
- Easy to version control and distribute

### Conflict Resolution
- Duplicate users across files are handled gracefully
- Existing users are not overwritten
- Multiple files can safely load the same users
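
The behavior described above can be modeled as a "first definition wins" merge. This is only a sketch of the idea, not the loader's implementation: `merge_users` is a hypothetical function, and keying users by `username` in a plain dictionary is an assumption.

```python
def merge_users(existing, incoming):
    """Add users whose username is new; leave existing entries untouched.

    `existing` maps username -> user record and is updated in place.
    Returns the usernames that were actually created.
    """
    created = []
    for user in incoming:
        if user["username"] not in existing:
            existing[user["username"]] = user
            created.append(user["username"])
    return created


accounts = {}
file_a = [{"username": "alice.smith", "password": "alice_pwd"}]
file_b = [
    {"username": "alice.smith", "password": "other_pwd"},  # duplicate of file_a
    {"username": "bob.jones", "password": "bob_pwd"},
]

print(merge_users(accounts, file_a))  # ['alice.smith']
print(merge_users(accounts, file_b))  # ['bob.jones']
print(accounts["alice.smith"]["password"])  # alice_pwd
```

Loading the same file twice creates nothing the second time, which is what makes the operation idempotent.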

### Scalability
- Designed for loading dozens of files per task
- Efficient processing with minimal overhead
- Bulk operations for better performance

### Data Integrity
- Built-in validation prevents malformed data
- Email-to-user mapping is verified
- Consistent error handling and reporting

## Example Scenarios

### Small Team Project
```jsonl
{"type": "user", "username": "manager", "email": "manager@company.com", "password": "mgr_pwd", "first_name": "Project", "last_name": "Manager"}
{"type": "user", "username": "dev1", "email": "dev1@company.com", "password": "dev1_pwd", "first_name": "Developer", "last_name": "One"}
{"type": "user", "username": "dev2", "email": "dev2@company.com", "password": "dev2_pwd", "first_name": "Developer", "last_name": "Two"}
{"type": "email", "id": "kickoff", "from": "manager@company.com", "to": ["dev1@company.com", "dev2@company.com"], "subject": "Project Kickoff", "body": "Welcome to the team!"}
```

### Customer Support Thread
```jsonl
{"type": "user", "username": "customer", "email": "customer@external.com", "password": "cust_pwd", "first_name": "John", "last_name": "Customer"}
{"type": "user", "username": "support", "email": "support@company.com", "password": "supp_pwd", "first_name": "Support", "last_name": "Agent"}
{"type": "email", "id": "ticket_001", "from": "customer@external.com", "to": ["support@company.com"], "subject": "Login Issue", "body": "I can't log into my account."}
{"type": "email", "id": "reply_001", "from": "support@company.com", "to": ["customer@external.com"], "subject": "Re: Login Issue", "body": "I'd be happy to help. Can you provide your username?"}
```

## Best Practices

### File Organization
- Use descriptive filenames that indicate the scenario
- Group related emails in the same file
- Keep files focused on a single use case or story

### User Management
- Define all users before their first email reference
- Use consistent username patterns (e.g., first.last)
- Include meaningful first/last names for readability

### Email Content
- Write realistic email content for better scenarios
- Use proper email threading with Re: and Fwd: prefixes
- Include reasonable timestamps that tell a story

### Validation
- Always validate files before committing
- Test loading in a development environment
- Use the validation script in CI/CD pipelines

## Troubleshooting

### Common Issues

1. **Validation Errors**: Run the validation script (`src/scripts/validate_email_jsonl.py`) to identify issues
2. **Missing Users**: Ensure all email addresses have corresponding user records
3. **Date Format**: Use ISO 8601 format (YYYY-MM-DDTHH:MM:SS±HH:MM)
4. **Encoding**: Ensure files are UTF-8 encoded
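
For the date format issue, a quick local check is possible with the standard library. The sketch below uses `datetime.fromisoformat` and additionally requires a UTC offset, matching the `±HH:MM` pattern in item 3; that requirement and the helper name `has_valid_date` are assumptions, and the project's validator may be stricter or looser.

```python
from datetime import datetime


def has_valid_date(value: str) -> bool:
    """Return True if `value` parses as an ISO 8601 timestamp with a UTC offset."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    # assumption: a timestamp without an offset is rejected
    return parsed.tzinfo is not None


print(has_valid_date("2025-01-20T09:00:00-05:00"))  # True
print(has_valid_date("2025-01-20T09:00:00"))        # False (no offset)
print(has_valid_date("01/20/2025 9:00"))            # False (not ISO 8601)
```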
