# Prompts to Proxies (P2P): Emulating Human Preferences via a Compact LLM Ensemble
![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)
![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)
![Status: Research Prototype](https://img.shields.io/badge/status-research--prototype-yellow)
![LLM: OpenAI Compatible](https://img.shields.io/badge/LLM-OpenAI--Compatible-lightgrey)

This code accompanies our ICLR 2026 submission, *Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble*.
(Links will be added upon publication.)

## Overview

**Prompts to Proxies (P2P)** is a modular system for simulating human-like survey responses using large language models (LLMs). It enables researchers to study behavioral alignment, agent variation, and response diversity across controlled synthetic populations.

The system constructs diverse agent profiles ("endowments"), each representing a distinct persona, and elicits survey responses from LLMs conditioned on those profiles. Through active sampling and entropy-based tracking, P2P iteratively refines the agent pool to cover a wide spectrum of beliefs, demographics, and ideologies. It then uses variable-selection regression (constrained lasso and constrained elastic net) to select the proxy agents that best match observed ground-truth aggregate data. Inspired by revealed preference theory, P2P aims to perform alignment in the latent preference space using observational data, producing a parsimonious ensemble of LLM proxies that represents the unknown ground-truth human population.

**Key capabilities:**
- Endowment generation via open-ended or attribute-driven persona creation
- Adaptive survey simulations using LLM agents (currently OpenAI; other backends such as Anthropic are planned for future updates)
- Diversity tracking based on entropy and response variability
- Binary alignment with real-world survey data for empirical validation

P2P is designed for researchers in both alignment and the social sciences. It supports modular extensions and applies to a wide range of scenarios. For example, a proxy ensemble can stand in for human respondents on newly proposed survey questions, which is useful for extending a survey and for filling in missing data.

## Core Components
### Endowment Generator (`generators/`)

The Endowment Generator dynamically constructs agent profiles (endowments) to simulate a diverse respondent population. It uses an **active sampling loop** that prioritizes **response diversity** based on entropy.

**Core features:**
- Structured sampling from core, thematic, and theoretical attribute modes
- Active expansion of underrepresented perspectives via low-entropy question targeting
- Variability-aware sampling guided by entropy-based diversity scoring
- Parallelized agent simulation via `SurveyConductor`
- Export tools for endowment metadata, agent responses, and diagnostic plots

**Key components:**
- `ActiveEndowmentGenerator`: Orchestrates endowment sampling, survey simulation, and entropy-guided expansion.
- `ThemeVariabilityTracker`: Monitors response entropy and diversity across modes, guiding adaptive sampling.
- `EndowmentModel` interface (e.g., `OpenAIEndowmentModel`): Abstract base for generating persona descriptions; current implementation uses OpenAI's GPT models.

This module is the backbone of P2P's adaptive proxy construction loop. Companion Sphinx documentation will be released when the codebase is made public.
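
The entropy-based diversity scoring described above can be illustrated with a small sketch. It is not the `ThemeVariabilityTracker` implementation: the function names, the toy data, and the choice of Shannon entropy in bits are assumptions made for illustration only.

```python
import math
from collections import Counter


def response_entropy(answers):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lowest_entropy_questions(responses, k=1):
    """Return the k question IDs whose answers vary least across agents.

    `responses` maps question ID -> list of answers, one per agent.
    """
    scored = {qid: response_entropy(ans) for qid, ans in responses.items()}
    return sorted(scored, key=scored.get)[:k]


# Toy data: four agents, three questions (hypothetical IDs and answers).
responses = {
    "Q1": ["Yes", "Yes", "Yes", "Yes"],  # no disagreement
    "Q2": ["Yes", "No", "Yes", "No"],    # even split
    "Q3": ["Yes", "Yes", "Yes", "No"],   # mild disagreement
}
targets = lowest_entropy_questions(responses, k=2)
```

Questions returned by `lowest_entropy_questions` are the ones on which the current agent pool agrees most, so they are natural candidates for targeted expansion with new endowments.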

### Survey Conductor (`modules/survey_conductor.py`)

The `SurveyConductor` manages the survey simulation loop by coordinating endowments, agents, and survey questions. It handles LLM-based response elicitation, logging, and output formatting for downstream use.

**Core features:**
- Loads endowments and prompts agents to answer survey questions
- Supports OpenAI, Claude, and dummy agent backends via `AgentFactory`
- Streams responses into structured logs (JSONL + matrix format)
- Compatible with `ActiveEndowmentGenerator` for adaptive sampling
- Converts raw responses into agent-level records for analysis

**Key methods:**
- `run()`: Executes the full survey simulation and logs results
- `to_agent_records()`: Outputs responses in agent-mode format
- `save_to_csv()` and `save_to_json()`: Export utilities

This module is the runtime engine of survey execution, used across simulation, diagnostics, and regression pipelines.
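
As a rough illustration of the loop this module runs, the sketch below asks every agent every question in parallel and emits one record per agent-question pair. `DummyAgent`, `run_survey`, and the record fields are hypothetical names for this example; they are not the `SurveyConductor` or `AgentFactory` API.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class DummyAgent:
    """Stand-in for an LLM-backed agent; always picks the first option."""

    def __init__(self, agent_id, endowment):
        self.agent_id = agent_id
        self.endowment = endowment

    def answer(self, question):
        return question["options"][0]


def run_survey(agents, questions, max_workers=4):
    """Ask every agent every question and return one record per pair."""

    def ask_all(agent):
        return [
            {"agent_id": agent.agent_id, "qid": q["qid"], "answer": agent.answer(q)}
            for q in questions
        ]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `agents` in its results.
        per_agent = list(pool.map(ask_all, agents))
    return [record for records in per_agent for record in records]


questions = [{"qid": "Q1", "text": "Do you trust the news?", "options": ["Yes", "No"]}]
agents = [DummyAgent(i, f"persona {i}") for i in range(3)]
records = run_survey(agents, questions, max_workers=2)
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
```

Threads suit this workload because a real agent spends most of its time waiting on network calls; lowering `max_workers` trades runtime for fewer concurrent API requests.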

### Endowment Manager (`modules/endowment_manager.py`)

The `EndowmentManager` module provides structured tools for managing a pool of synthetic agent profiles ("endowments"). These profiles condition LLM agents for simulating survey responses or approximating population distributions.

**Core features:**
- Load, validate, and save endowments from CSV files
- Supports role assignment (`proxy` vs. `ground_truth`) and weight initialization
- Allows partial weight updates and dynamic role reassignment
- `ActiveEndowments` subclass enables mode-aware expansion (e.g., by themes or theories)

**Key classes:**
- `Endowments`: Manages basic endowment metadata, roles, and weights
- `ActiveEndowments`: Extends functionality with attribute/mode grouping, supports adaptive experiments

Used by the `SurveyConductor` and `ActiveEndowmentGenerator` to initialize and track agent profiles.

### Survey Converter (`modules/survey_converter.py`)

The Survey Converter module provides a unified interface for loading, managing, and transforming survey questions from configurable CSV + YAML inputs.

**Core features:**
- Parses survey questions and answer mappings based on a YAML schema
- Supports flexible splitting into train/valid/test sets
- Formats questions into LLM-ready prompts
- Converts multi-option questions into binary format for alignment and analysis

**Key classes:**
- `Survey`: Loads questions, parses answer mappings, supports flexible splits and CSV I/O
- `BinaryExtendedSurvey`: Expands multi-choice questions into binary (1-vs-rest) format for consistency in modeling

Used by `SurveyConductor` and `ThemeVariabilityTracker` for prompt generation and response analysis.
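
The 1-vs-rest expansion can be sketched as follows. This is an illustration only: the function name, the `<qid>__<option>` ID scheme, the prompt wording, and the choice to leave two-option questions unexpanded are assumptions, not the behavior of `BinaryExtendedSurvey`.

```python
def expand_to_binary(qid, text, options):
    """Turn one multi-option question into one yes/no item per option.

    Questions with at most two options are kept as a single item whose
    positive class is the first option (an assumption of this sketch).
    """
    if len(options) <= 2:
        return [{"qid": qid, "text": text, "positive": options[0]}]
    return [
        {
            "qid": f"{qid}__{option}",  # hypothetical ID scheme
            "text": f"{text} Is the answer '{option}'?",
            "positive": option,
        }
        for option in options
    ]


items = expand_to_binary("Q7", "How often do you vote?", ["Always", "Sometimes", "Never"])
binary_ids = [item["qid"] for item in items]
```

Expanding every question to the same binary shape lets downstream regression treat all items uniformly, at the cost of introducing correlated items for a single original question.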

### Response Converter (`modules/response_converter.py`)

The Response Converter manages the loading and transformation of survey responses into a matrix format for analysis. It aligns responses with survey metadata and supports binary expansion for multiclass questions.

**Core features:**
- Ingests agent responses from CSV or list-of-dicts format
- Converts answer text into numerical codes using `Survey` mappings
- Saves response matrices in question-by-agent CSV format
- Supports reverse mapping and binary expansion for multiclass inputs

**Key classes:**
- `Responses`: Loads and stores question-agent response matrices using code or answer format
- `BinaryExtendedResponses`: Transforms multiclass responses into multiple binary judgments aligned with `BinaryExtendedSurvey`

This module is essential for preparing data for downstream analysis and entropy-based tracking.
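
A minimal sketch of the conversion from answer text to a question-by-agent matrix of codes is shown below. The function name, record fields, and the use of `None` for missing or unmapped answers are assumptions for illustration; the `Responses` class may handle these cases differently.

```python
def build_response_matrix(records, answer_codes):
    """Build a question-by-agent matrix of numeric codes.

    records: list of {"agent_id", "qid", "answer"} dicts.
    answer_codes: {qid: {answer_text: code}}.
    Returns (question_ids, agent_ids, matrix), where matrix[i][j] is the code
    agent j gave to question i, or None if missing or unmapped.
    """
    qids = sorted({r["qid"] for r in records})
    agent_ids = sorted({r["agent_id"] for r in records})
    row = {q: i for i, q in enumerate(qids)}
    col = {a: j for j, a in enumerate(agent_ids)}
    matrix = [[None] * len(agent_ids) for _ in qids]
    for r in records:
        code = answer_codes.get(r["qid"], {}).get(r["answer"])
        matrix[row[r["qid"]]][col[r["agent_id"]]] = code
    return qids, agent_ids, matrix


# Toy input with hypothetical IDs; agent "a2" gives an unmapped answer to Q2.
answer_codes = {"Q1": {"Yes": 1, "No": 0}, "Q2": {"Agree": 1, "Disagree": 0}}
records = [
    {"agent_id": "a1", "qid": "Q1", "answer": "Yes"},
    {"agent_id": "a2", "qid": "Q1", "answer": "No"},
    {"agent_id": "a1", "qid": "Q2", "answer": "Disagree"},
    {"agent_id": "a2", "qid": "Q2", "answer": "Not sure"},
]
qids, agent_ids, matrix = build_response_matrix(records, answer_codes)
```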

### Aggregate Response Loader (`modules/aggregate_responses.py`)

This utility class supports **empirical experiments** by converting aggregate human response data into a binary format that aligns with questions from a `BinaryExtendedSurvey`.

**Core features:**
- Loads aggregate population-level responses from a `dict` or `.json` file
- Converts categorical distributions into expected binary values for each question
- Aligns transformed values with the binary-expanded question IDs used in simulations

**Key class:**
- `AggregateResponses`: Converts `{qid: {label: proportion}}` mappings into `{binary_qid: expected value}` format, allowing direct comparison with agent outputs.

This class bridges empirical datasets and synthetic agent simulations, enabling benchmarking and model calibration.
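
The mapping from `{qid: {label: proportion}}` to `{binary_qid: expected value}` can be sketched as below. The function name, the `<qid>__<label>` ID scheme, and the renormalization step are assumptions for this example rather than the behavior of `AggregateResponses`.

```python
def aggregate_to_binary(aggregate):
    """Map {qid: {label: proportion}} to {binary_qid: expected value}.

    Each label becomes one 1-vs-rest item whose expected value is the share
    of respondents who chose that label. Shares are renormalized so that each
    question sums to 1 (an assumption of this sketch).
    """
    expected = {}
    for qid, dist in aggregate.items():
        total = sum(dist.values())
        if total <= 0:
            continue  # skip questions with no usable responses
        for label, share in dist.items():
            expected[f"{qid}__{label}"] = share / total  # hypothetical ID scheme
    return expected


aggregate = {"Q1": {"Agree": 0.5, "Neutral": 0.25, "Disagree": 0.25}}
expected = aggregate_to_binary(aggregate)
```

Because a binary item takes the value 1 exactly when its label is chosen, its expected value over the population equals that label's proportion, which is what makes the aggregate data directly comparable with weighted agent outputs.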

### Attribute Learner (`modules/attribute_learner.py`)

The `AttributeLearner` is a language model-powered component that infers which human attributes (e.g., demographics, beliefs, ideologies) are likely to influence how individuals respond to survey questions.

It can operate at two levels:
- **Global**: Infers a list of attributes across the full training set of survey questions.
- **Local**: Infers attributes relevant to a single question (e.g., `"religiosity"`, `"trust in government"`).

This module is particularly useful in `ActiveEndowmentGenerator`, where it helps identify salient traits for conditioning new endowments. It can also be used standalone for survey diagnostics and attribute tagging.

**Core features:**
- Uses GPT-based LLMs to analyze survey question framing and semantics
- Extracts structured attribute lists in Python syntax (e.g., `["age", "political ideology"]`)
- Supports attribute caps, logging, and safe parsing with fallback
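
One way to parse an LLM reply safely, with a cap and a fallback, is sketched below using the standard library's `ast.literal_eval`. The function name, the lowercasing and deduplication, and the default cap are assumptions for illustration, not the `AttributeLearner` implementation.

```python
import ast


def parse_attribute_list(raw, max_attributes=5, fallback=None):
    """Extract a list of attribute strings from raw LLM output.

    Returns `fallback` (default: empty list) if no valid Python list literal
    is found. Never uses eval(), so arbitrary code in the reply is not run.
    """
    fallback = [] if fallback is None else fallback
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return fallback
    try:
        parsed = ast.literal_eval(raw[start:end + 1])
    except (ValueError, SyntaxError, TypeError):
        return fallback
    if not isinstance(parsed, list):
        return fallback
    cleaned = []
    for item in parsed:
        if not isinstance(item, str):
            continue  # ignore non-string entries
        name = item.strip().lower()
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned[:max_attributes]


reply = 'Sure! Here are the attributes: ["Age", "political ideology", "age"]'
attributes = parse_attribute_list(reply)
```

Slicing from the first `[` to the last `]` tolerates chatty replies that wrap the list in extra prose, while malformed output such as `[age, income]` (unquoted names) falls back instead of raising.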

## Quick Start

### 1. Installation

Clone the repository and set up a virtual environment:

```bash
# The repository will be made public on GitHub once the paper is published.
git clone https://github.com/anonymous/P2P.git
cd P2P
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Make sure you have:
- Python 3.10+
- An OpenAI API key exported in your environment as `OPENAI_API_KEY`

### 2. Run the Full Pipeline

P2P provides a unified script to run both stages of the experiment:

- **Stage 1**: Active generation of synthetic agent endowments and survey responses.
- **Stage 2**: Lasso regression to identify which endowments predict response variation.

To run the full pipeline using a sample configuration:

```bash
python scripts/run_full_pipeline.py --config config/full_pipelines/example.yaml
```
This script will:

- Launch `run_endowment_generator.py` to generate agent profiles and LLM responses
- Create a `lasso_config.yaml` on the fly, linking outputs from the previous step
- Run `run_regression_experiment.py` using the generated config to perform lasso/elastic net selection
- Save all results, logs, and configuration copies into a timestamped subdirectory under `outputs/`
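
To give a feel for the selection step in Stage 2, the sketch below fits a non-negative lasso by coordinate descent in plain Python. It is a stand-in, not the method in `run_regression_experiment.py`: the exact constraints and solver P2P uses are not described here, and the function name, penalty value, and toy data are assumptions.

```python
def nonneg_lasso(X, y, alpha=0.01, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha * sum(w) subject to w >= 0.

    X is a list of n rows (questions), each with p entries (agents);
    y is the list of n observed aggregate values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes agent j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = max(0.0, rho - alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data: 4 binary questions x 3 agents; the target mixes agents 0 and 2.
X = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y = [0.5, 0.5, 1.0, 0.0]
weights = nonneg_lasso(X, y)
selected = [j for j, wj in enumerate(weights) if wj > 1e-8]
```

On this toy input the penalty drives the weight of agent 1 to exactly zero, leaving agents 0 and 2 as the selected proxies; the surviving weights are shrunk slightly below 0.5 by the penalty.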

### 3. Example Output

Upon completion, you will find:

- `outputs/ATP_w36_<timestamp>/responses.csv`: Matrix of survey responses  
- `outputs/ATP_w36_<timestamp>/endowments.csv`: Generated endowment metadata  
- `outputs/ATP_w36_<timestamp>/plots/*.png`: Diagnostic plots (e.g., entropy, diversity)  
- `outputs/ATP_w36_<timestamp>/reports/*.html`: Optional summary reports  
- `outputs/ATP_w36_<timestamp>/pipeline.log`: Execution log for debugging

To customize experiments, modify:

- The full pipeline config: `config/full_pipelines/example.yaml`  
- Component configs under:  
  - `config/surveys/`  
  - `config/attribute_banks/`  
  - `config/experiments/`

To run modular stages separately (e.g., for debugging or custom workflows), you may use:

- `scripts/run_endowment_generator.py`: For generating agent endowments and synthetic responses  
- `scripts/run_regression_experiment.py`: For running the lasso/elastic net regression on generated data

### Full Pipeline Config Example

Below is a minimal example excerpt from a full pipeline configuration file (`config/full_pipelines/example.yaml`):

```yaml
metadata:
  name: "ATP_w36"
  description: "Generate 300 active endowments for ATP Wave 36 using GPT-4o agents."
  seed: 101

paths:
  survey_csv: "data/w36/info.csv"
  survey_yaml: "config/surveys/american_trend_panel.yaml"
  attribute_bank: "config/attribute_banks/attribute_bank.yaml"
  aggregate_json: "data/w36/W36_aggregate_results.json"

generation:
  target_n: 300
  initial_n: 10
  num_update_steps: 10
  parallel: true
  max_workers: 30
```
See [`example.yaml`](config/full_pipelines/example.yaml) for the full configuration and documentation.

> **Note:** The endowment generation pipeline requires a valid OpenAI API key. Export it in your terminal before running the Python scripts:
```bash
export OPENAI_API_KEY=your-api-key-here
```
> At this time, the system supports only OpenAI as the backend. We will extend the agent module to support other backends in future updates.


## Paper Results
To inspect the results reported in the paper:
1. Navigate to the `outputs/` directory.
2. For the empirical results (Section 4 and Appendix B), refer to:
   - `outputs/panel_study/multi_round/wave_W42_repeat_1/` for the specific run featured in the main paper.
   - `outputs/panel_study/multi_round/` for the full panel study across 14 ATP waves.
3. For the simulation results (Appendix B), refer to:
   - `outputs/round_sweep/` for Simulation Study 1.
   - `outputs/custom_sweep_new/` for Simulation Study 2.
   - `outputs/entropy_sweep/`, `outputs/entropy_sweep_low_gt/`, and `outputs/entropy_sweep_mid_gt/` for Simulation Study 3. These three folders correspond to constructing the ground-truth agent population from high-entropy, low-entropy, and mid-entropy modes, respectively.

To reproduce the key figures in the paper and appendices, refer to the `.ipynb` files under the `notebooks/` directory:
  - `notebooks/main_paper_plots.ipynb` for the main figures from the empirical experiment featured in the main paper.
  - `notebooks/panel_study_plots.ipynb` for the panel study figure in the appendices.
  - `notebooks/simulation_study_*.ipynb` for the figures used in the simulation studies.
  - Experiment snapshots are provided under `outputs/`: see the relevant `outputs/*_plots` directories, and `outputs/fraction_sweep` for snapshots related to Simulation Study 1.

### Reproducing Experiment Results

The simulation study results can be reproduced by running the following commands:
```bash
# --- Simulation Study 1 ---
python scripts/run_multi_round_sweep.py --config config/simulations/run_multi_round_sweep.yaml

# --- Simulation Study 2 ---
python scripts/run_customized_multi_round_sweep.py --config config/simulations/run_customized_multi_round_sweep.yaml

# --- Simulation Study 3 ---

# High-entropy modes as ground truth
python scripts/run_multi_round_entropy_sweep.py --config config/simulations/run_multi_round_entropy_sweep.yaml

# Low-entropy modes as ground truth
python scripts/run_customized_entropy_sweep.py --config config/simulations/run_customized_entropy_sweep.yaml

# Mid-entropy modes as ground truth
python scripts/run_customized_entropy_sweep.py --config config/simulations/run_customized_entropy_sweep_mid.yaml
```
For the simulation studies, we use the endowments and responses generated in a dedicated W42 run, `outputs/ATP_w42_20250726_181405/` (with 10 update steps).

The regression results for the empirical experiments can be reproduced by running `run_regression_experiment.py` with the `regression_config.yaml` of each experiment run, using the saved endowments and responses. Because the OpenAI API does not support full seed control, regenerating endowments and responses may produce different outputs across runs. To stay within the 100 MB supplementary material limit, we retain only the endowments and responses for wave W42 across the three repeats. Interested readers may contact the authors to obtain the complete set of generated endowments and responses.

To re-run the full panel study, use the following commands (note: this process takes several hours to complete):
```bash
# --- Panel Study ---
export OPENAI_API_KEY=your-api-key-here
python scripts/run_panel_sweep.py --config config/panels/panel_study.yaml
```
> **Note:** Reproducing the results incurs moderate API usage. Depending on your OpenAI rate limit, you may want to reduce the `max_workers` setting under the `generation` block of the YAML config to avoid throttling. Lowering this value may increase the total runtime.


## Project Structure

```text
code/
├── agents/                 # Agent wrappers for interfacing with LLM APIs (OpenAI, Anthropic, etc.)
├── config/                 # YAML configuration files for controlling all pipeline stages
│   ├── attribute_banks/      # Thematic attribute templates used to construct agent persona endowments
│   ├── endowments/           # Endowment generation configurations
│   ├── experiments/          # Lasso experiment configurations
│   ├── full_pipelines/       # Unified configurations for full pipeline runs
│   └── surveys/              # Survey schema definitions for parsing raw CSVs
├── data/                   # Survey info data and aggregate data
├── experiments/            # Experiment runner logic and analysis utilities
├── generators/             # Endowment generator and variability tracking system
├── modules/                # Core functional modules (survey conductor, converters, manager classes)
├── notebooks/              # Interactive notebooks for plots
├── outputs/                # Outputs from script runs (endowments, responses, diagnostics, plots)
├── scripts/                # Entry-point scripts for launching generators, pipelines, and regressions
├── styles/                 # CSS stylesheets for HTML diagnostics and summary reports
├── .env                    # Environment variable overrides (e.g., API keys)
├── README.md               # This file
├── LICENSE                 # MIT License specifying terms of use and distribution  
└── requirements.txt        # Project dependencies (auto-generated)
```
## License

Released under the [MIT License](LICENSE). The author name has been redacted to preserve the anonymity of the submission.