# Submit Data - Human-AI Conversation Datasets

This directory contains CSV files from conversation experiments on controversial topics. The files record conversations in which participants discuss political, social, and belief-based topics.

## Directory Structure

```
submit_data/
├── depth/                    # Phase 1: Depth Topics (fewer topics, more conversations each)
│   ├── [topic_name]/        # Individual topic folders
│   │   ├── *.csv           # Conversation data files
│   │   └── ...
│   └── ...
├── breadth/                 # Phase 2: Breadth Topics (many topics, fewer conversations each)
│   ├── [topic_name]/       # Individual topic folders
│   │   ├── *.csv          # Conversation data files
│   │   └── ...
│   └── ...
└── README.md               # This file
```
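
The layout above can be walked with a short script. This is a minimal sketch, assuming the two phase folders are named exactly `depth` and `breadth` and that every CSV sits one level below its topic folder; the function name `index_conversations` is hypothetical and not part of the dataset.

```python
from collections import defaultdict
from pathlib import Path


def index_conversations(root):
    """Map (phase, topic_name) to a sorted list of CSV paths under root."""
    index = defaultdict(list)
    root = Path(root)
    for phase in ("depth", "breadth"):  # folder names taken from the tree above
        phase_dir = root / phase
        if not phase_dir.is_dir():
            continue  # tolerate a partial download
        # Pattern "*/*.csv" matches <topic_name>/<file>.csv
        for csv_path in sorted(phase_dir.glob("*/*.csv")):
            index[(phase, csv_path.parent.name)].append(csv_path)
    return dict(index)
```

Calling `index_conversations("submit_data")` returns a dictionary keyed by `(phase, topic)` pairs, which makes it easy to count conversations per topic before loading any file.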

## Data Organization

### Depth vs Breadth

- **Depth Topics**: Focused exploration of a smaller set of topics with multiple conversation sessions per topic
- **Breadth Topics**: Broad coverage across many different topics with fewer sessions per topic

### Topic Categories

The conversations cover a wide range of controversial and opinion-based topics.

#### Depth Topics (Phase 1)
- See the Appendix for more information on the depth topics.

#### Breadth Topics (Phase 2)
- See the Appendix for more information on the breadth topics.

## File Naming Convention

Each CSV file follows this naming pattern:
```
YYYYMMDD_HHMMSS_TOPIC_NAME_UNIQUE_ID_VERSION.csv
```

Where:
- `YYYYMMDD`: Date (Year/Month/Day)
- `HHMMSS`: Time (Hour/Minute/Second)
- `TOPIC_NAME`: Topic description, with words joined by underscores
- `UNIQUE_ID`: 26-character unique identifier
- `VERSION`: Version number (e.g., "0.0.1")
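
Because `TOPIC_NAME` itself contains underscores, splitting a file name on `_` is ambiguous; anchoring on the fixed-width fields at both ends avoids that. The sketch below is one possible parser, assuming the unique identifier is alphanumeric and the version is dot-separated digits (neither is stated above). `FileInfo` and `parse_filename` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumption: UNIQUE_ID is 26 alphanumeric characters, VERSION looks like "0.0.1".
FILENAME_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<time>\d{6})_(?P<topic>.+)"
    r"_(?P<uid>[0-9A-Za-z]{26})_(?P<version>\d+(?:\.\d+)*)\.csv$"
)


class FileInfo(NamedTuple):
    timestamp: datetime
    topic: str
    unique_id: str
    version: str


def parse_filename(name: str) -> Optional[FileInfo]:
    """Split a file name into its fields; return None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    try:
        timestamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    except ValueError:  # digits present but not a valid date or time
        return None
    return FileInfo(timestamp, match["topic"], match["uid"], match["version"])
```

Returning `None` for non-matching names lets a caller skip stray files instead of failing on them.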

## Data File Structure

Each CSV file contains conversation data with the following key columns:

- **Event tracking**: `event_order`, `event_type`
- **Participants**: `worker_id`, `sender_id`, `recipient_id`
- **Content**: `text` (messages, opinions, slider values)
- **Conversation flow**: `chat_round_order`, `message_id`
- **User interaction**: `is_slider_changed` (opinion rating changes)
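
A file can be loaded with the standard `csv` module and checked against the columns listed above. This is a rough sketch: `KEY_COLUMNS` and `read_events` are hypothetical names, and files may carry additional columns that are simply passed through.

```python
import csv

# Column names copied from the list above; other columns may also be present.
KEY_COLUMNS = [
    "event_order", "event_type",
    "worker_id", "sender_id", "recipient_id",
    "text",
    "chat_round_order", "message_id",
    "is_slider_changed",
]


def read_events(handle):
    """Read all rows from an open CSV handle as dicts, checking key columns."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [col for col in KEY_COLUMNS if col not in present]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)
```

A typical call would be `with open(path, newline="", encoding="utf-8") as f: rows = read_events(f)`; the UTF-8 encoding is an assumption and may need adjusting.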

### Event Types
- `Initial Opinion`: Participant's starting position on the topic
- `tweet`: Short messages during conversation
- `message_sent`/`message_received`: Direct messages between participants
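
Once rows are loaded as dictionaries, they can be grouped by `event_type` to separate initial opinions from messages. The helper below is a small sketch; `group_by_event_type` is a hypothetical name, and it assumes the event type strings appear in the data exactly as written above.

```python
from collections import defaultdict


def group_by_event_type(rows):
    """Group row dicts by their event_type value, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without an event_type are collected under the empty string.
        groups[row.get("event_type", "")].append(row)
    return dict(groups)
```

For example, `group_by_event_type(rows).get("Initial Opinion", [])` would return only the starting-position rows.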

### Special Notation
- `[SLIDER_VALUE=X]`: Indicates the participant's opinion rating (typically on a 1-5 scale)
- `[AUTOSUBMISSION DUE TO TIME LIMIT]`: Marks an entry that the system submitted automatically because the time limit expired
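
Both markers are embedded in the `text` column, so they usually need to be pulled out before analysing the message itself. The following is a minimal sketch, assuming a marker may appear anywhere in the field and that slider values are numeric; `parse_text` is a hypothetical helper.

```python
import re

SLIDER_RE = re.compile(r"\[SLIDER_VALUE=([^\]]+)\]")
AUTOSUBMIT_MARKER = "[AUTOSUBMISSION DUE TO TIME LIMIT]"


def parse_text(text):
    """Return (slider_value, was_autosubmitted, remaining_text) for a text field."""
    slider = None
    match = SLIDER_RE.search(text)
    if match:
        try:
            slider = float(match.group(1))
        except ValueError:
            slider = None  # non-numeric value: leave it unset rather than guess
    autosubmitted = AUTOSUBMIT_MARKER in text
    # Strip both markers so only the participant's own words remain.
    remaining = SLIDER_RE.sub("", text).replace(AUTOSUBMIT_MARKER, "").strip()
    return slider, autosubmitted, remaining
```

Keeping the autosubmission flag separate from the cleaned text makes it possible to filter out timed-out entries later without losing the slider value.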

## Data Usage

This dataset is suitable for research on:
- Opinion dynamics and persuasion
- Human-AI conversation patterns
- Political and social belief systems
- Argumentation and debate analysis
- Consensus building in controversial topics


## Data Quality

- Files contain real human conversation data
- Some conversations may be incomplete due to participant dropout
- Time limits may have caused automatic submissions
- **Processed data may contain empty rows**: Consecutive messages from the same user are concatenated and treated as a single message, which can result in empty rows in the processed dataset
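
The empty rows mentioned above can be filtered out before analysis. This sketch assumes that "empty" means a blank or whitespace-only `text` field, which is an interpretation rather than something stated here; `drop_empty_rows` is a hypothetical name.

```python
def drop_empty_rows(rows, text_key="text"):
    """Keep only row dicts whose text field has non-whitespace content."""
    return [row for row in rows if (row.get(text_key) or "").strip()]
```

Whether to drop these rows depends on the analysis: removing them changes row counts per conversation, so turn-level statistics should be computed after filtering.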