# Behavioral Fingerprinting of Large Language Models

A reproducible framework to build multi-dimensional "behavioral fingerprints" of LLMs using a diagnostic prompt suite and an automated evaluator. The pipeline collects model responses, scores them against detailed rubrics via a separate evaluator model, and generates visual summaries.

## Highlights
- Diagnostic Prompt Suite across reasoning, world model, bias, personality, and robustness
- Automated evaluation using a separate, strong LLM as judge, with scores emitted as JSON
- Visualizations: radar profiles and category comparison charts
- Narrative reports summarizing each model's qualitative fingerprint
- Fully file-based artifacts checked into the repo (`results/`, `evaluations/`, `charts/`, `reports/`)

## Repository structure
- `src/` — scripts to run the end-to-end pipeline
  - `run_experiment.py` — parse prompts and collect model responses into `results/`
  - `run_evaluation.py` — construct meta-prompts with rubrics and score into `evaluations/`
  - `visualize_results.py` — aggregate scores, generate charts in `charts/`, and write per-model reports in `reports/`
  - `requirements.txt` — Python dependencies
- `AI-comm-records/` — LaTeX records of the prompt suite and evaluation protocol, plus cached `prompts.json`
  - `prompt_suite.tex`, `evaluation_protocol.tex`, `idea.tex`, `prompts.json`
- `results/` — raw model responses (per model directory, per prompt `.txt`)
- `evaluations/` — evaluator JSON outputs mirroring `results/` prompt IDs
- `charts/` — generated figures (radar profiles and category comparisons)
- `reports/` — generated narrative reports (one per model)

## Installation
1. Python 3.10+
2. Create a virtual environment and install dependencies:
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r src/requirements.txt
```
3. Configure OpenRouter access (used for both the target models and the evaluator) by creating a `.env` file at the repo root with:
```
OPENROUTER_API_KEY=your_key_here
```

Note: If no key is present, scripts run in simulation mode and still write placeholder outputs so the pipeline can be exercised end-to-end.
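
The fallback described in the note can be pictured roughly as follows. This is a minimal sketch, not the scripts' actual code: `resolve_mode` and `placeholder_response` are hypothetical names, the placeholder text format is invented for illustration, and the sketch assumes the contents of `.env` have already been loaded into the process environment.

```python
import os


def resolve_mode(env=None):
    """Return ("live", key) when an API key is set, else ("simulation", None).

    Hypothetical helper: the real scripts may detect the key differently.
    """
    env = os.environ if env is None else env
    key = (env.get("OPENROUTER_API_KEY") or "").strip()
    return ("live", key) if key else ("simulation", None)


def placeholder_response(model_id, prompt_id):
    # Assumed placeholder format, used only so downstream steps have a file to read.
    return f"[SIMULATED] model={model_id} prompt={prompt_id}"


if __name__ == "__main__":
    mode, _ = resolve_mode()
    print(f"Running in {mode} mode")
```

Treating a blank or whitespace-only key the same as a missing one avoids sending unauthenticated requests when `.env` contains an empty assignment.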

## Usage
### 1) Collect model responses
Edit `TARGET_MODELS` in `src/run_experiment.py` to include the OpenRouter identifiers you wish to evaluate, then run:
```bash
python src/run_experiment.py
```
- Prompts are read from `AI-comm-records/prompts.json` (cached) or parsed from `AI-comm-records/prompt_suite.tex` on first run.
- Outputs are written per model to `results/<provider>/<model>/<prompt_id>.txt` or `results/<model_id>/<prompt_id>.txt`, depending on the naming you choose. The current repo uses the OpenRouter ID directly as the directory path, e.g. `results/openai/gpt-5/`.
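
As an illustration of the `results/<provider>/<model>/<prompt_id>.txt` layout, the sketch below maps an OpenRouter-style ID to an output path. The helper names `response_path` and `write_response` are hypothetical and are not taken from `run_experiment.py`.

```python
from pathlib import Path


def response_path(results_root, model_id, prompt_id):
    """Build results/<provider>/<model>/<prompt_id>.txt from an ID like "openai/gpt-5"."""
    parts = model_id.split("/")  # "openai/gpt-5" -> ["openai", "gpt-5"]
    # Append ".txt" by hand: prompt IDs contain dots, so with_suffix() would mangle them.
    return Path(results_root).joinpath(*parts, f"{prompt_id}.txt")


def write_response(results_root, model_id, prompt_id, text):
    """Write one response, creating the per-model directory if needed."""
    path = response_path(results_root, model_id, prompt_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `response_path("results", "openai/gpt-5", "1.1.1")` yields `results/openai/gpt-5/1.1.1.txt`.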

### 2) Score responses with the evaluator
Set the `TARGET_MODELS` list in `src/run_evaluation.py` to match the result folders you want scored. Optionally set `EVALUATOR_MODEL` to choose the judge, then run:
```bash
python src/run_evaluation.py
```
- Produces JSON files in `evaluations/<provider>/<model>/<prompt_id>.json`.
- Robustness pairs (e.g., `4.1.1A/B`) are evaluated jointly, and each pair is saved under its shared ID (e.g., `4.1.1.json`, `4.1.2.json`).
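
One way to picture the joint handling of robustness pairs is to group prompt IDs by their shared base before building evaluator prompts. The sketch below assumes that paired prompts are identified purely by a trailing `A` or `B` on a dotted numeric ID; `group_prompt_ids` is a hypothetical helper, and the actual script may pair prompts differently.

```python
import re
from collections import defaultdict

# Assumed ID shape: dotted digits plus a trailing A or B, e.g. "4.1.1A".
PAIR_RE = re.compile(r"^(?P<base>\d+(?:\.\d+)*)(?P<variant>[AB])$")


def group_prompt_ids(prompt_ids):
    """Split IDs into standalone prompts and complete A/B pairs keyed by base ID."""
    singles = []
    variants = defaultdict(dict)
    for pid in prompt_ids:
        match = PAIR_RE.match(pid)
        if match:
            variants[match["base"]][match["variant"]] = pid
        else:
            singles.append(pid)

    pairs = {}
    for base, found in variants.items():
        if set(found) == {"A", "B"}:
            pairs[base] = (found["A"], found["B"])  # scored together, saved as <base>.json
        else:
            singles.extend(found.values())  # incomplete pair: fall back to solo scoring
    return singles, pairs
```

With this grouping, `4.1.1A` and `4.1.1B` map to the single key `4.1.1`, which matches the `4.1.1.json` output name.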

### 3) Aggregate, visualize, and report
```bash
python src/visualize_results.py
```
- Aggregates numeric scores, normalizes by category maxima, and emits:
  - Radar charts per model in `charts/` (e.g., `gpt-5_radar.png`)
  - Comparison bar charts per category in `charts/large/` or `charts/mid/`
  - Narrative reports per model in `reports/` (e.g., `gpt-5_report.txt`)
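
The normalization step can be sketched as below. This assumes that "category maximum" means the highest score attainable under the rubric for that category, and that the per-category value is the mean raw score divided by that maximum. Both are assumptions, as are the function name and the example category maxima.

```python
from statistics import mean


def normalize_by_category(scores, category_max):
    """Map {category: [raw scores]} to {category: mean score / category maximum}.

    category_max holds the assumed maximum attainable score per category.
    """
    normalized = {}
    for category, values in scores.items():
        if not values:
            continue  # nothing scored in this category; leave it out of the profile
        normalized[category] = mean(values) / category_max[category]
    return normalized


if __name__ == "__main__":
    raw = {"Reasoning": [3, 4], "Robustness": [1, 2, 3]}
    maxima = {"Reasoning": 4, "Robustness": 3}  # illustrative values only
    print(normalize_by_category(raw, maxima))
```

Values normalized this way fall between 0 and 1, which puts categories with different rubric scales on a common radar axis.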

## Example artifacts
- Radar: `charts/gpt-5_radar.png`, `charts/gemini-2.5-pro_radar.png`
- Comparisons: `charts/large/Robustness_comparison.png` or `charts/mid/Causal_Chain_comparison.png`
- Reports: `reports/gpt-5_report.txt`, `reports/claude-opus-4.1_report.txt`

## Prompt suite and evaluation protocol
- Prompts defined in `AI-comm-records/prompt_suite.tex` (cached JSON in `AI-comm-records/prompts.json`).
- Rubrics and procedures in `AI-comm-records/evaluation_protocol.tex`.
- Research narrative and scoping in `AI-comm-records/idea.tex`, `discussion_points.tex`, and `literature_review.tex`.
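
The relationship between the `.tex` source and the cached JSON can be sketched as a load-or-parse step. This is only an illustration: `load_prompts` is a hypothetical name, the parser is passed in as a callable because the structure of `prompt_suite.tex` is not described here, and the cached JSON is assumed to hold whatever that parser returns.

```python
import json
from pathlib import Path


def load_prompts(cache_path, tex_path, parse_tex):
    """Return cached prompts if present; otherwise parse the .tex file and cache the result.

    parse_tex is a caller-supplied function (hypothetical) that turns LaTeX source
    into a JSON-serializable structure.
    """
    cache = Path(cache_path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))

    prompts = parse_tex(Path(tex_path).read_text(encoding="utf-8"))
    cache.write_text(json.dumps(prompts, indent=2, ensure_ascii=False), encoding="utf-8")
    return prompts
```

Under this scheme, deleting the cache file forces a fresh parse on the next run, which is useful after editing the prompt suite.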

## Notes and tips
- Model identifiers: scripts assume OpenRouter-style IDs (e.g., `openai/gpt-5`). Adjust paths or names consistently if you change the layout.
- Simulation mode: without an API key, the system writes placeholder responses/evaluations so you can test downstream steps.
- Personality classification prompts (3.3.x) yield non-numeric scores (e.g., E/I/S/N). Visualization code treats these separately and excludes them from numeric averages.
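
The separation of numeric and non-numeric scores might look like the sketch below. It assumes each evaluation reduces to a single score value per prompt ID; the real evaluator JSON schema may differ, and `split_scores` is a hypothetical helper.

```python
def split_scores(evaluations):
    """Split {prompt_id: score} into numeric scores and categorical labels."""
    numeric, categorical = {}, {}
    for prompt_id, score in evaluations.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, (int, float)) and not isinstance(score, bool):
            numeric[prompt_id] = float(score)
        else:
            categorical[prompt_id] = score  # e.g. "E", "I", "S", "N" from 3.3.x prompts
    return numeric, categorical


if __name__ == "__main__":
    numeric, labels = split_scores({"1.1.1": 3, "2.1.1": 2.5, "3.3.1": "E", "3.3.2": "N"})
    print(numeric)
    print(labels)
```

Only the numeric dictionary would feed averages and radar charts; the labels can be reported verbatim.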

## Citing

