# The Crowded Embedding Space: A Mean-Field Mechanism for Emergent Unfairness in Retrieval-Augmented Agents

This repository contains the code to reproduce all figures in the paper. The codebase is organized into two main directories:

- **`simulations/`**: Synthetic experiments validating theoretical predictions (Theorem 2.2)
- **`empirical/`**: Real-world dataset experiments across text, vision, and narrative domains

---

## Table of Contents

1. [Installation](#installation)
2. [Figure-to-Script Mapping](#figure-to-script-mapping)
3. [Quick Start](#quick-start)
4. [License](#license)

---

## Installation

### Requirements

- Python 3.8+
- GPU optional (speeds up sentence-transformers encoding)

### Setup

```bash
# Extract the zip file, then enter the project directory
cd rag-mft

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
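
To confirm the environment meets the requirements above before running anything, a small check like the following may help. It is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper, and it assumes the GPU would be used through PyTorch, which `sentence-transformers` runs on.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8)):
    """Report whether the interpreter is new enough and whether a GPU is visible.

    Hypothetical helper, not shipped with the repository.
    """
    report = {
        # The README requires Python 3.8 or newer.
        "python_ok": sys.version_info[:2] >= min_version,
        # Look for PyTorch without importing it, so a missing install is not an error.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is treated the same as "no GPU".
            pass
    return report


if __name__ == "__main__":
    print(check_environment())
```

A `cuda_available` value of `False` is not a problem: the GPU is optional and only speeds up encoding.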

---

## Figure-to-Script Mapping

| Figure | Description | Script |
|--------|-------------|--------|
| **Figure 2** | Phase Transition (Theory vs Empirical) | `simulations/main.py` |
| **Figure 3** | Reranking Cannot Rescue Geometric Collapse | `empirical/run_shortlist_effect.py` |
| **Figure 4a** | CIFAR-100 Visual Retrieval | `empirical/run_clip.py` |
| **Figure 4b** | Wikipedia Movie Plots (Film Noir) | `empirical/run_noir.py` |
| **Figure 4c** | 20 Newsgroups Topic Retrieval | `empirical/run_news.py` |
| **Figure 4d** | Quora Question Pairs | `empirical/run_quora.py` |
| **Figure 5** | Metastable Collapse (Dynamic Marginalization) | `empirical/run_thm3_metastability_noir.py` |
| **Figure 6a** | Geometric Intuition Visualization | `simulations/main.py` |
| **Figure 6b** | Shortlist Size Effect | `simulations/main.py` |
| **Figure 6c** | Clustered vs Uniform (PPP Assumption) | `empirical/run_ppp_assumption.py` |

---

## Quick Start

### Reproduce All Figures

```bash
# From repository root

# 1. Synthetic experiments (Figures 2, 6a, 6b)
cd simulations
python main.py
cd ..

# 2. Empirical experiments (Figures 3, 4a-4d, 5, 6c)
cd empirical
python run_clip.py                     # Figure 4a
python run_noir.py                     # Figure 4b
python run_news.py                     # Figure 4c
python run_quora.py                    # Figure 4d
python run_shortlist_effect.py         # Figure 3
python run_thm3_metastability_noir.py  # Figure 5
python run_ppp_assumption.py           # Figure 6c
```
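
To regenerate a single figure rather than all of them, the figure-to-script mapping can be wrapped in a small driver. The sketch below is not part of the repository: `FIGURE_SCRIPTS`, `command_for`, and `run_figure` are hypothetical names. It assumes each script is launched from inside its own directory, as in the commands above, and that no script needs command-line arguments.

```python
import subprocess
import sys
from pathlib import Path

# Figure label -> (directory, script), copied from the Figure-to-Script Mapping table.
FIGURE_SCRIPTS = {
    "2": ("simulations", "main.py"),
    "3": ("empirical", "run_shortlist_effect.py"),
    "4a": ("empirical", "run_clip.py"),
    "4b": ("empirical", "run_noir.py"),
    "4c": ("empirical", "run_news.py"),
    "4d": ("empirical", "run_quora.py"),
    "5": ("empirical", "run_thm3_metastability_noir.py"),
    "6a": ("simulations", "main.py"),
    "6b": ("simulations", "main.py"),
    "6c": ("empirical", "run_ppp_assumption.py"),
}


def command_for(figure, python=sys.executable):
    """Return (directory, argv) for a label such as '4b' or 'Figure 4b'."""
    key = figure.lower().replace("figure", "").strip()
    directory, script = FIGURE_SCRIPTS[key]
    return directory, [python, script]


def run_figure(figure, repo_root=".", dry_run=False):
    """Run the script for one figure from inside its own directory."""
    directory, cmd = command_for(figure)
    cwd = Path(repo_root) / directory
    if dry_run:
        # Only show what would be executed.
        print(f"(cd {cwd} && {' '.join(cmd)})")
        return 0
    return subprocess.run(cmd, cwd=cwd, check=True).returncode


if __name__ == "__main__":
    # Example: python run_figure.py 4b
    run_figure(sys.argv[1] if len(sys.argv) > 1 else "2", dry_run=True)
```

Figures 2, 6a, and 6b all map to `simulations/main.py`, so requesting any one of them runs the same script.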

---

## License

This code is provided for academic review and research purposes.