# Adaptive Query with AI Persona

A framework for **Bayesian adaptive querying** using persona-induced latent variable models. Given a set of AI personas with known response distributions, this system selects which questions to ask a user in order to best predict their responses to held-out target questions.

## Overview

### Problem Setting

- **Personas**: A set of `n` AI personas. For every question, each persona has a known probability distribution over `K` response categories.
- **User**: A real user whose responses follow an unknown mixture of personas
- **Goal**: Ask a sequence of questions (within a budget) to predict the user's responses to held-out target questions

### Key Insight

Each observed answer updates our belief about which persona(s) the user resembles. Those personas' response distributions, weighted by that belief, then predict the user's unobserved responses.
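
The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the package's implementation: the function names `update_posterior` and `predict` and the toy numbers are hypothetical, and the sketch assumes the user matches one latent persona with uncertainty tracked by a posterior.

```python
import numpy as np

def update_posterior(prior, persona_probs, answer):
    """Bayes update of the belief over personas after one observed answer.

    prior:         shape (n,), current belief over personas (sums to 1)
    persona_probs: shape (n, K), each persona's distribution for the asked question
    answer:        int in [0, K), the user's observed response
    """
    unnormalized = prior * persona_probs[:, answer]
    # Assumes at least one persona gives the answer non-zero probability.
    return unnormalized / unnormalized.sum()

def predict(posterior, target_probs):
    """Posterior-weighted mixture of persona distributions on a target question."""
    return posterior @ target_probs

# Two hypothetical personas, K = 2 response categories
prior = np.array([0.5, 0.5])
asked = np.array([[0.9, 0.1],    # persona 0 usually answers 0
                  [0.2, 0.8]])   # persona 1 usually answers 1
target = np.array([[0.7, 0.3],
                   [0.1, 0.9]])

posterior = update_posterior(prior, asked, answer=0)
prediction = predict(posterior, target)
```

After the user answers `0`, the belief shifts toward persona 0, and the prediction on the target question moves toward that persona's distribution.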

## Features

### Querying Methods

| Method | Description | Use Case |
|--------|-------------|----------|
| **Greedy** | Myopically minimize expected posterior cost | Best single-step performance |
| **Random** | Uniform random question selection | Baseline comparison |
| **Non-adaptive** | Fixed question set for all users | When personalization isn't possible |
| **Full** | Ask every available feasible question | Full-information reference |
| **CAT** | Graded Response Model + adaptive selection | Standard psychometric approach |
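
To make the **Greedy** row concrete, here is a hedged sketch of one myopic selection step. The table does not say which cost is minimized, so this example assumes the cost is the Shannon entropy of the posterior over personas; `greedy_select` and `entropy` are hypothetical names, not functions from this repository.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def greedy_select(posterior, probs, candidates):
    """Pick the candidate question with the lowest expected posterior entropy.

    posterior:  shape (n,), current belief over personas
    probs:      shape (n, Q, K), persona response distributions
    candidates: iterable of question indices still available
    """
    best_q, best_cost = None, np.inf
    for q in candidates:
        joint = posterior[:, None] * probs[:, q, :]   # (n, K): P(persona, answer)
        marginal = joint.sum(axis=0)                  # (K,): P(answer)
        cost = 0.0
        for k in range(joint.shape[1]):
            if marginal[k] > 0:
                # Entropy of the updated posterior, weighted by P(answer = k)
                cost += marginal[k] * entropy(joint[:, k] / marginal[k])
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q, best_cost

# Hypothetical example: question 0 is uninformative, question 1 separates the personas
probs = np.array([
    [[0.5, 0.5], [0.9, 0.1]],   # persona 0
    [[0.5, 0.5], [0.1, 0.9]],   # persona 1
])
posterior = np.array([0.5, 0.5])
best_q, best_cost = greedy_select(posterior, probs, candidates=[0, 1])
```

Here question 1 is selected because both personas answer question 0 identically, so asking it cannot change the posterior. A cost defined on the target questions (for example, expected prediction loss) would slot into the same loop.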

### Advanced Features

- **Empirical Bayes Prior Learning**: Learn prior over personas from training data
- **Persona Clustering**: Reduce dimensionality by clustering similar personas
- **Temperature Scaling**: Calibrate prediction confidence
- **JIT Compilation**: Numba-accelerated core functions
- **Parallel Processing**: Joblib parallelization for batch evaluation
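
As an illustration of the empirical Bayes feature, the sketch below estimates prior weights over personas with a standard EM loop on training responses. It assumes each training user is drawn from a single latent persona and that `-1` marks a missing answer; `learn_prior` is a hypothetical name, and the repository's actual estimator may differ.

```python
import numpy as np

def learn_prior(probs, responses, n_iter=100):
    """Estimate prior weights over personas by EM on training users.

    probs:     shape (n, Q, K), persona response distributions
    responses: shape (U, Q) int array, -1 marks a missing answer
    """
    n, Q, _ = probs.shape
    observed = responses >= 0
    safe = np.where(observed, responses, 0)   # placeholder index for missing cells
    # Probability each persona assigns to each recorded answer -> (n, U, Q)
    picked = probs[:, np.arange(Q), safe]
    log_p = np.log(np.clip(picked, 1e-12, None))
    # Sum over answered questions only -> log-likelihood of shape (U, n)
    log_lik = np.where(observed[None], log_p, 0.0).sum(axis=2).T

    prior = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # E-step: responsibility of each persona for each user
        log_post = log_lik + np.log(np.clip(prior, 1e-300, None))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average responsibility
        prior = resp.mean(axis=0)
    return prior

# Hypothetical training set: one question, K = 2, five users (the last one did not answer)
probs = np.array([[[0.9, 0.1]],    # persona 0
                  [[0.1, 0.9]]])   # persona 1
responses = np.array([[0], [0], [0], [1], [-1]])
prior = learn_prior(probs, responses)
```

Three of the four answering users chose response `0`, so the learned prior puts most of its weight on persona 0. The user with no answers carries no evidence and leaves the estimate unchanged.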

## Installation

```bash
# Install dependencies with uv
uv sync

# Or with pip
pip install -e .
```

### Dependencies

- Python 3.13+
- NumPy, Pandas, SciPy
- Numba (JIT compilation)
- Joblib (parallelization)
- tqdm (progress bars)
- scikit-learn (PCA, clustering)

## Quick Start

### Running Experiments

```bash
# Run the real-data experiment with the default config
uv run adaptive-query/main.py

# Run with custom configuration
uv run adaptive-query/main.py --config experiments/custom.yaml

# Run synthetic experiments
uv run adaptive-query-synthetic/main.py
```

## Configuration

See `adaptive-query/config.yaml` for all options:

```yaml
dataset:
  name: "WorldValuesBench"    # Dataset name
  n_categories: 4             # K response categories

budget: 20                    # Max questions per user

empirical_bayes:
  enabled: true               # Learn prior from training data

clustering:
  enabled: false              # Cluster personas into prototypes
  n_clusters: null            # null = auto-select

methods:
  greedy: true
  random: true
  cat: true
  full: true
```

## Output

Each experiment writes its configuration, summaries, per-user results, analysis tables, and figures to its own output directory:

```
output/{experiment_id}/
├── config.yaml              # Configuration used
├── summary.txt              # Human-readable summary
├── summary.csv              # Method comparison table
├── detailed/                # Per-user results
│   ├── greedy.json
│   ├── random.json
│   └── ...
├── analysis/                # Analysis tables
│   ├── question_frequency_*.csv
│   ├── performance_by_budget.csv
│   └── ...
└── figures/                 # Visualizations
    ├── metrics_comparison.pdf
    ├── performance_by_budget.pdf
    └── ...
```

## Data Format

### Persona Responses

```python
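import pandas as pd

# Rows are personas, columns are questions; each cell is a length-K
# probability vector over the response categories (K = 4 here).
persona_responses = pd.DataFrame({
    'q1': [[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]],
    'q2': [[0.5, 0.3, 0.15, 0.05], [0.1, 0.2, 0.3, 0.4]],
    # ... one column per question
}, index=['persona_0', 'persona_1'])  # ... one row per persona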
```
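
For numerical work it is often convenient to stack the list-valued cells into a single array. The helper below is a sketch of one way to do that and to validate the input; `to_tensor` is a hypothetical name and is not part of the package's documented API.

```python
import numpy as np
import pandas as pd

def to_tensor(persona_responses, atol=1e-6):
    """Stack list-valued cells into an array of shape (n_personas, n_questions, K)."""
    tensor = np.array(persona_responses.values.tolist(), dtype=float)
    if tensor.ndim != 3:
        raise ValueError("every cell must hold a probability vector of the same length K")
    if (tensor < 0).any() or not np.allclose(tensor.sum(axis=2), 1.0, atol=atol):
        raise ValueError("each cell must be a non-negative vector summing to 1")
    return tensor

# Small example in the format described above
example = pd.DataFrame({
    'q1': [[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]],
    'q2': [[0.5, 0.3, 0.15, 0.05], [0.1, 0.2, 0.3, 0.4]],
}, index=['persona_0', 'persona_1'])

tensor = to_tensor(example)   # indexed as tensor[persona, question, category]
```

Validating up front catches cells that do not sum to 1 before they silently distort later likelihood computations.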

### User Responses

```python
import pandas as pd

# Rows are users, columns are questions; each cell is a response
# index in 0..K-1, or -1 if the answer is missing.
user_responses = pd.DataFrame({
    'q1': [2, 0, -1, 3],
    'q2': [1, 1, 2, -1],
    # ... one column per question
}, index=['user_0', 'user_1', 'user_2', 'user_3'])  # ... one row per user
```
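
Because `-1` marks a missing answer, code that consumes this table usually needs a mask of answered cells. The sketch below builds that mask and lists, for each user, the questions that have an answer. Treating these as the user's "feasible" questions is an assumption made for this example; the variable names are illustrative only.

```python
import pandas as pd

user_responses = pd.DataFrame({
    'q1': [2, 0, -1, 3],
    'q2': [1, 1, 2, -1],
}, index=['user_0', 'user_1', 'user_2', 'user_3'])

answered = user_responses.ne(-1)      # True where an answer exists
n_answered = answered.sum(axis=1)     # number of answers available per user

# Questions with a recorded answer, per user
feasible = {
    user: answered.columns[row.to_numpy()].tolist()
    for user, row in answered.iterrows()
}
```

In this example `user_2` can only be asked `q2` and `user_3` can only be asked `q1`, since their other answers are missing.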
