Human study pipeline
====================

This folder contains the scripts to build and run the human study from the model summaries.

Overview
- `summarize.py`: generate summaries for question/model pairs with stratified sampling options and a dry run.
- `build_dataset.py`: combine one or more `summaries_brief.jsonl` files into a pooled set of question–model items, with bias-bin sampling and a per-question model cap.
- `assign_questions.py`: assign each question–model item to participants so that each item is answered exactly K times.
- `create_forms.py`: emit per-participant form payloads (and optional API stub) using the description/format files. Inserts a shared attention-check item, shows progress headers (e.g., “Q3 of 15”), and writes completion codes plus form URLs.
- `collect_responses.py`: normalize exported Google Form CSVs, or pull responses directly from the Forms API, and map them back to the participant/question/model IDs (attention check handled automatically).

Prerequisites
- Python environment with standard library plus `google-api-python-client` and `oauth2client` if you plan to call the Forms API directly.
- A Google Cloud project with the Drive and Forms APIs enabled, and OAuth credentials placed in `human_study/client_secrets.json` plus a generated `token.json`.
  - `token.json` is created automatically the first time you run `create_forms.py --perform-api` with a valid `client_secrets.json` in `human_study/`. No need to pre-create it.
- A base Google Form that already contains your consent/intro text. Record its ID as `BASE_FORM_ID` (see below).
- An Apps Script web app URL that writes the completion code into the final confirmation message (`APPS_SCRIPT_URL` placeholder in `create_forms.py`).

Study description and format
- `description.txt` contains the participant-facing study description and scoring rubric.
- `format.txt` contains the task template (question + summaries + rating questions).

# TODO
For Apps Script
- Follow the description in the Apps Script file to create and deploy the web app, and set its access to everyone.
- Afterwards, use the deployed web app URL to replace the `APPS_SCRIPT_URL` placeholder in `create_forms.py`.

For OAuth
- Create a new OAuth key of type Desktop application.
- Download the JSON and save it as `client_secrets.json` in the `human_study` folder.
- Add your own email address as a test user at https://console.cloud.google.com/auth/audience?<Your project here>


1) Generate summaries (or dry-run)
```
python human_study/summarize.py \
  --run_path <RUN_DIR> \
  --attribute race \
  --summary-mode paired \
  --model-cap-per-question 3 \
  --target-question-models 300 \
  --bias-bin-weights 1,1,2,3 \
  --dry-run
```
- `--target-question-models`: optional total question–model pairs to summarize (stratified by bias bins).
- `--model-cap-per-question`: hard cap on models per question (default 3).
- `--bias-bins` / `--bias-bin-weights`: fixed bins (default 1-2,2-3,3-4,4-5) with configurable weights.
- `--dry-run`: prints selection stats only.
- Outputs: `summaries.jsonl`, `summaries_brief.jsonl`, `summaries.txt`.
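
How `--bias-bin-weights` might translate a stratified target into per-bin quotas can be sketched as below. This is only an illustration and not the code in `summarize.py`: the helper name `allocate_bin_quotas` and the largest-remainder rounding are assumptions.

```python
from math import floor


def allocate_bin_quotas(target, bins, weights):
    """Split `target` items across `bins` in proportion to `weights`.

    Hypothetical helper: largest-remainder rounding keeps the quotas
    summing exactly to `target`.
    """
    if len(bins) != len(weights):
        raise ValueError("bins and weights must have the same length")
    total_weight = sum(weights)
    exact = [target * w / total_weight for w in weights]
    quotas = [floor(x) for x in exact]
    leftover = target - sum(quotas)
    # Hand the leftover slots to the bins with the largest fractional parts.
    order = sorted(range(len(bins)), key=lambda i: exact[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return dict(zip(bins, quotas))


# Values taken from the example command above.
quotas = allocate_bin_quotas(300, ["1-2", "2-3", "3-4", "4-5"], [1, 1, 2, 3])
print(quotas)
```

With weights `1,1,2,3`, the highest bin receives roughly three times as many items as each of the two lowest bins, provided enough items exist in that bin.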

2) Build pooled dataset from existing summaries
```
python human_study/build_dataset.py \
  --inputs human_study/race/summaries_brief.jsonl human_study/sex/summaries_brief.jsonl \
  --output-dir human_study/output \
  --target-total 750 \
  --model-cap-per-question 3 \
  --bias-bin-weights 1,1,2,3
```
- Treats each question–model pair as a data point, caps models per question, stratifies by bias bins if `--target-total` is set.
- Outputs: `combined_pool.jsonl`, `selection.jsonl`, `pool_stats.json` in `--output-dir`.
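
The per-question model cap can be pictured with the following minimal sketch. The field names `question_id` and `model` are hypothetical (the actual schema of `summaries_brief.jsonl` is not shown here), and picking the kept models at random is an assumption about how the cap is applied.

```python
import random
from collections import defaultdict


def cap_models_per_question(items, cap=3, seed=0):
    """Keep at most `cap` question-model items per question.

    `items` is a list of dicts with the hypothetical keys
    "question_id" and "model".
    """
    rng = random.Random(seed)
    by_question = defaultdict(list)
    for item in items:
        by_question[item["question_id"]].append(item)
    kept = []
    for question_id in sorted(by_question):
        group = by_question[question_id]
        if len(group) > cap:
            # Assumed policy: sample which models survive the cap.
            group = rng.sample(group, cap)
        kept.extend(group)
    return kept


# Toy pool: two questions, five models each.
pool = [{"question_id": q, "model": f"model_{m}"} for q in ("q1", "q2") for m in range(5)]
capped = cap_models_per_question(pool, cap=3)
print(len(capped))
```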

3) Assign items to participants
```
python human_study/assign_questions.py \
  --selection human_study/output/selection.jsonl \
  --participants 75 \
  --questions-per-participant 30 \
  --k-per-question 3 \
  --output-dir human_study/output
```
- Requires `len(selection) * K == participants * questions_per_participant`.
- Outputs: `participants.jsonl`, `question_assignments.jsonl`, `codes.csv`, `assignment_stats.json`.
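
The constraint above holds for the example values: 750 items × 3 answers = 2250 = 75 participants × 30 questions. One way to satisfy it, shown as a sketch rather than as the algorithm used by `assign_questions.py`, is to repeat one shuffled item order K times and cut it into equal chunks. The function name `assign_items` and this chunking strategy are assumptions.

```python
import random
from collections import Counter


def assign_items(item_ids, participants, per_participant, k, seed=0):
    """Give every item to exactly `k` distinct participants (hypothetical helper)."""
    n = len(item_ids)
    if n * k != participants * per_participant:
        raise ValueError("need len(selection) * K == participants * questions_per_participant")
    if per_participant > n:
        raise ValueError("a participant would have to see the same item twice")
    order = list(item_ids)
    random.Random(seed).shuffle(order)
    # Repeating the same order k times keeps copies of an item exactly n slots
    # apart, so they can never land in the same chunk of size <= n.
    slots = order * k
    return [
        slots[p * per_participant:(p + 1) * per_participant]
        for p in range(participants)
    ]


assignments = assign_items([f"item_{i}" for i in range(750)], 75, 30, 3)
counts = Counter(item for chunk in assignments for item in chunk)
print(len(assignments), set(counts.values()))
```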

4) Prepare Google Forms payloads
```
python human_study/create_forms.py \
  --assignments human_study/output/participants.jsonl \
  --selection human_study/output/selection.jsonl \
  --description human_study/description.txt \
  --format human_study/format.txt \
  --base-form-id REPLACE_ME_BASE_FORM_ID \
  --apps-script-url REPLACE_ME_APPS_SCRIPT_URL \
  --output-dir human_study/output/forms \
  --attention-check human_study/attention_check.json \
  --public-forms \
  --max-forms 10
```
- Default is offline: writes `forms_payload.json` and `forms_preview.txt` (no API calls).
- To create forms automatically, install `google-api-python-client` + `oauth2client`, place `client_secrets.json` in `human_study/`, and run with `--perform-api` after filling `--base-form-id` and `--apps-script-url`. `token.json` is generated on first run. Each form is then processed as follows:
  - It is copied from the base form and named `CAB <n>`, starting at `--start-index` (default 1).
  - It is populated with the questions/ratings.
  - Its completion message is set via the Apps Script URL.
  - It is published via the Forms API.
  - If `--public-forms` (default) is set, Drive permissions are updated so “anyone with the link” can access it.
  - Its responder URL is saved.
- An attention check (default `human_study/attention_check.json`) is inserted just before the halfway point of each form (position stored per participant) and treated like any other question. Section headers show progress (“Qx of N”) automatically.
- Completion codes are one-per-participant (read from `codes.csv` if present). `codes.csv` is rewritten during API runs as `participant_id,form_url,completion_code` so links sit next to their codes.
- `--max-forms` can cap the number of participants/forms for quick testing.
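
The attention-check placement and progress headers described above can be sketched as follows. This is not the logic of `create_forms.py`: the helper name `build_form_sections`, the use of index `len(items) // 2` as one reading of "just before the halfway point", and counting the attention check in the "Qx of N" total are all assumptions.

```python
def build_form_sections(items, attention_check):
    """Insert the attention check and label every section with its progress.

    Hypothetical helper; returns the sections plus the insertion position so
    the position can be stored per participant.
    """
    position = len(items) // 2  # assumed reading of "just before the halfway point"
    ordered = items[:position] + [attention_check] + items[position:]
    total = len(ordered)  # assumption: the attention check counts toward N
    sections = [
        {"header": f"Q{number} of {total}", "item": item}
        for number, item in enumerate(ordered, start=1)
    ]
    return sections, position


sections, position = build_form_sections(
    [f"item_{i}" for i in range(1, 5)], "attention_check"
)
for section in sections:
    print(section["header"], section["item"])
```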

5) Collect responses
```
python human_study/collect_responses.py \
  --responses-csv <exported_forms.csv> \
  --assignments human_study/output/participants.jsonl \
  --selection human_study/output/selection.jsonl \
  --forms-created human_study/output/forms/forms_created.json \
  --use-forms-api \
  --attention-check human_study/attention_check.json \
  --output-dir human_study/output
```
- Expects one CSV row per participant and column names like `Q1 Difference`, `Q1 Difference explanation`, `Q1 Relevance`, `Q1 Relevance explanation`, `Q1 Acknowledgement`, `Q1 Acknowledgement explanation`, `Q1 Refusal`, `Q1 Refusal explanation`. With `--use-forms-api`, the script pulls the latest submissions directly from Google Forms using the `forms_created.json` mapping.
- Attention-check items are parsed at the same position used when the form was created (the position stored per participant), so question numbering stays aligned.
- Outputs: `responses_normalized.jsonl`.
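
Grouping the `Q<n> <Field>` columns of one exported row by question number can be sketched as below. This is an illustration only: the helper name `normalize_row`, the output keys, and the sample CSV contents are made up, and the mapping from question number back to question/model IDs (which the real script does via the assignments file) is left out.

```python
import csv
import io
import re

# Matches the column names listed above, e.g. "Q1 Difference explanation".
COLUMN_RE = re.compile(
    r"^Q(\d+) (Difference|Relevance|Acknowledgement|Refusal)( explanation)?$"
)


def normalize_row(row):
    """Group one CSV row's rating columns by question number (hypothetical helper)."""
    by_question = {}
    for column, value in row.items():
        match = COLUMN_RE.match(column.strip())
        if not match:
            continue  # skip columns that are not ratings, e.g. a timestamp
        number = int(match.group(1))
        field = match.group(2).lower()
        key = f"{field}_explanation" if match.group(3) else field
        by_question.setdefault(number, {})[key] = value
    return by_question


# Made-up sample export with one participant row.
sample_csv = (
    "Timestamp,Q1 Difference,Q1 Difference explanation,Q2 Refusal\n"
    "2024-01-01,4,Tone differs,1\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
normalized = normalize_row(rows[0])
print(normalized)
```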

Notes and configurable defaults
- K (answers per question) default is 3; configurable in `assign_questions.py`.
- Model cap per question default is 3; configurable in both `summarize.py` and `build_dataset.py`.
- Bias bins are fixed (1–2, 2–3, 3–4, 4–5); weights are configurable to up-weight higher bins after inspecting stats.
- Attribute quotas default to even distribution because each selected question includes all values for the chosen attribute; adjust upstream selection as needed.
- Forms-related defaults: `--public-forms` on, `--attention-check human_study/attention_check.json`, progress headers “Qx of N”, and `codes.csv` in `participant_id,form_url,completion_code` order.

6) Analyze results
- Use `human_study/analysis.py` to analyze the collected responses and generate summary statistics and visualizations.

```
python -m human_study.analysis
```

Note that the appropriate paths must be set within `analysis.py` to point to the collected responses and any other necessary data files. In particular:

```
RESPONSES_PATH = Path("human_study/final/responses_normalized.jsonl")
SELECTION_PATH = Path("human_study/final/selection.jsonl")
OUTPUT_DIR = Path("human_study/final/analysis")
```