README.txt

This supplementary material accompanies the NeurIPS 2025 Datasets and Benchmarks submission
"Thousand Voices of Trauma: A Synthetic Dataset for Modeling Prolonged Exposure Therapy Conversations."

We include code for generating a small-scale example of the dataset and for scoring emotional intensity.
The full dataset generation and all prompt templates will be released upon acceptance.

--------------------------------------------------------------------------------
1. generate_example.py
--------------------------------------------------------------------------------
This script generates a limited batch of synthetic therapy conversations using one prompt
(Orientation to Imaginal Exposure). It uses realistic client and therapist profiles and
saves both the conversation and metadata to the sample_output/ folder.

- Uses AWS Bedrock to generate therapist-client conversations
- Demonstrates how profiles and prompts are structured
- Generates 2–3 conversations only for demonstration

Output:
    sample_output/sample_conversation.json
    sample_output/sample_metadata.json

To run:
    python generate_example.py

--------------------------------------------------------------------------------
2. gen_emotional_scores.py
--------------------------------------------------------------------------------
This script scores each generated conversation on emotional intensity (1–10) across
six PE protocol stages using a Bedrock LLM.

- Only one prompt is partially shown to preserve our full design
- The scoring model outputs a 6-number JSON array
- Demonstrates the annotation process using a transcript string

Output:
    sample_output/emotional_scores.csv

To run:
    python gen_emotional_scores.py

--------------------------------------------------------------------------------
3. utils.py
--------------------------------------------------------------------------------
A minimal wrapper to invoke the Bedrock API.

- Credentials are assumed to be configured under AWS CLI
- Sensitive model hyperparameters (e.g., maxTokens) are redacted for review
- The call_bedrock(prompt) function is used across scripts

No API keys are exposed in this file.

--------------------------------------------------------------------------------
Dependencies
--------------------------------------------------------------------------------
Install dependencies using:

    pip install boto3 pandas tqdm

Requires:
- AWS credentials with access to Bedrock
- Region: us-east-1
- Bedrock model: anthropic.claude-3-5-sonnet-20240620-v1:0 (or update accordingly)

--------------------------------------------------------------------------------
Notes
--------------------------------------------------------------------------------
- This code demonstrates the generation and scoring logic on a small batch of samples.
- We deliberately include only a subset of variables and prompts in this version.
- The full dataset, prompt suite (P5–P11), and configuration details will be released
  upon acceptance of the paper.

For any questions, please contact the authors.
