
# Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations

![Project Overview](overview.png)

## Project structure
```
├── code
│   ├── agents.py
│   ├── main.py
│   ├── methods
│   │   ├── reppopdemo.py
│   │   ├── reppopmapped.py
│   │   ├── kmedoids.py
│   │   ├── random.py
│   │   ├── samplegreedy.py
│   │   └── single.py
│   ├── optimization.py
│   ├── preprocess
│   ├── prompts.py
│   ├── utils.py
│   └── visualize
├── data
│   ├── eedi
│   ├── opinionqa
│   └── wikiarts
├── experiments
├── output
├── overview.png
├── preprocessed_data
├── README.md
└── requirements.txt
```

## 0. Install requirements
```
pip install -r requirements.txt
```


## 1. Preprocess datasets

**[EEDI]** Sample real students and generate their embeddings:
```
python -m code.preprocess.eedi \
    --n_humans 50 \
    --n_questions 40 \
    --data_path data/eedi \
    --output_dir preprocessed_data/eedi \
    --random_seed 1
```

**[OpinionQA]** Sample real humans and generate their embeddings:

```
python -m code.preprocess.opinionqa \
    --n_humans 500 \
    --data_path data/opinionqa/American_Trends_Panel_W92 \
    --output_dir preprocessed_data/opinionqa \
    --random_seed 1
```

**[Wikiarts]** Create LLM annotators and their embeddings:
```
python -m code.preprocess.wikiarts \
    --n_humans 100 \
    --n_questions 20 \
    --syn_human_model google/gemma-3-27b-it \
    --emb_model google/gemma-3-12b-it \
    --output_dir preprocessed_data/wikiarts \
    --random_seed 1
```

## 2. Run methods
Select one domain to run: `wikiarts`, `opinionqa`, or `eedi`.

Choose a method to run: `single`, `random`, `kmedoids`, `samplegreedy`, `reppopdemo`, `reppopmapped_one`, or `reppopmapped_two`.

Use `euclidean` distance for the Wikiarts and OpinionQA domains, and `manhattan` distance for the EEDI domain.
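
For reference, the two metrics compare embedding vectors differently: Euclidean distance takes the square root of the summed squared differences, while Manhattan distance sums the absolute differences. The snippet below is a minimal sketch on toy vectors; the function names are illustrative and the repository's own distance code may be implemented differently.

```python
import math


def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.dist(a, b)


def manhattan(a, b):
    """L1 distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy 2-D vectors (illustrative values, not taken from any dataset).
u, v = [0.0, 0.0], [3.0, 4.0]
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0
```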

Choose a context size K (passed as `--k_examples`), e.g., K=3.

```
python -m code.main \
    --domain {domain} \
    --distance {distance} \
    --method {method} \
    --n_agents 10 \
    --model google/gemma-3-12b-it \
    --k_examples {K} \
    --data_dir preprocessed_data/{domain} \
    --output_dir experiments/{domain} \
    --seed 1
```

The agents constructed by each method are saved in `experiments/{domain}`.
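
To run several methods for one domain with the recommended distance, the command can be assembled programmatically. The following is a minimal sketch and not part of the repository: `DISTANCE_BY_DOMAIN`, `build_command`, and `run_all` are hypothetical names, and the default flag values simply mirror the example command. It assumes the script is launched from the repository root.

```python
import subprocess
import sys

# Recommended distance per domain, as stated in this README.
DISTANCE_BY_DOMAIN = {
    "wikiarts": "euclidean",
    "opinionqa": "euclidean",
    "eedi": "manhattan",
}

# Method names listed in this README.
METHODS = [
    "single", "random", "kmedoids", "samplegreedy",
    "reppopdemo", "reppopmapped_one", "reppopmapped_two",
]


def build_command(domain, method, k_examples=3, n_agents=10,
                  model="google/gemma-3-12b-it", seed=1):
    """Assemble the `code.main` invocation for one domain/method pair."""
    if domain not in DISTANCE_BY_DOMAIN:
        raise ValueError(f"unknown domain: {domain}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return [
        sys.executable, "-m", "code.main",
        "--domain", domain,
        "--distance", DISTANCE_BY_DOMAIN[domain],
        "--method", method,
        "--n_agents", str(n_agents),
        "--model", model,
        "--k_examples", str(k_examples),
        "--data_dir", f"preprocessed_data/{domain}",
        "--output_dir", f"experiments/{domain}",
        "--seed", str(seed),
    ]


def run_all(domain, methods=METHODS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for method in methods:
        cmd = build_command(domain, method)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("eedi", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check the arguments before launching the actual runs.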

## 3. Visualize results

```
python -m code.visualize.main \
    --domain {domain} \
    --data_dir preprocessed_data/{domain} \
    --output_dir experiments/{domain} \
    --model google/gemma-3-12b-it \
    --distance {distance} \
    --dataset_split train \
    --methods_to_plot {method_1} {method_2} {method_3}
```
The plots are saved in `experiments/{domain}`.