# Reasoning Analysis Pipeline

An end-to-end pipeline for analyzing reasoning patterns in large language models (LLMs) across three math reasoning datasets: GSM8K, ASDiv, and SVAMP.

## Overview

This pipeline downloads datasets, generates reasoning responses from LLMs, analyzes failure patterns, clusters reasoning steps, and performs statistical significance testing.

## Available Scripts

### 1. download_datasets.py
Downloads GSM8K, ASDiv, and SVAMP datasets (1000 samples each).

### 2. generate_responses.py
Generates step-by-step reasoning responses from OpenAI models (`gpt-4o-mini`, `gpt-3.5-turbo-1106`).
**Note:** Requires the `OPENAI_API_KEY` environment variable to be set.

### 3. failure_statistics.py
Computes accuracy statistics for each model-dataset combination.
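
As a rough illustration of per-combination accuracy, here is a minimal sketch. The record layout (`model`, `dataset`, `is_correct` keys) and the sample rows are assumptions made for this example, not the format the script actually reads.

```python
from collections import defaultdict


def accuracy_by_combination(records):
    """Return {(model, dataset): (correct, total, accuracy)}.

    The record keys used here are hypothetical; adapt them to the real files.
    """
    counts = defaultdict(lambda: [0, 0])  # [correct, total] per combination
    for rec in records:
        key = (rec["model"], rec["dataset"])
        counts[key][0] += int(rec["is_correct"])
        counts[key][1] += 1
    return {key: (c, n, c / n) for key, (c, n) in counts.items()}


# Illustrative rows only, not real results.
records = [
    {"model": "gpt-4o-mini", "dataset": "GSM8K", "is_correct": True},
    {"model": "gpt-4o-mini", "dataset": "GSM8K", "is_correct": False},
    {"model": "gpt-3.5-turbo-1106", "dataset": "SVAMP", "is_correct": True},
]

stats = accuracy_by_combination(records)
for (model, dataset), (correct, total, acc) in sorted(stats.items()):
    print(f"{model} / {dataset}: {correct}/{total} = {acc:.1%}")
```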

### 4. cluster_per_combination.py
Clusters reasoning sentences separately for each model-dataset combination using HDBSCAN.

### 5. analyze_reasoning_failures.py
Identifies the first error in each failed reasoning trace and categorizes error types.

### 6. export_combined_table_percentages.py
Creates tables showing error type distributions.

### 7. analyze_best_worst_clusters.py
Shows the two highest-accuracy and two lowest-accuracy clusters for each model-dataset combination.
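
The selection itself can be sketched as a simple ranking. The mapping from cluster ID to accuracy and the values below are made up for illustration; how the script handles ties or HDBSCAN noise points is not covered here.

```python
def best_and_worst_clusters(cluster_accuracy, k=2):
    """Return (best, worst) lists of (cluster_id, accuracy) pairs.

    `cluster_accuracy` is a hypothetical {cluster_id: accuracy} mapping.
    """
    ranked = sorted(cluster_accuracy.items(), key=lambda item: item[1], reverse=True)
    best = ranked[:k]          # highest accuracy first
    worst = ranked[-k:][::-1]  # lowest accuracy first
    return best, worst


# Illustrative values only, not real results.
cluster_accuracy = {0: 0.91, 1: 0.42, 2: 0.77, 3: 0.58, 4: 0.30}
best, worst = best_and_worst_clusters(cluster_accuracy)
print("best:", best)
print("worst:", worst)
```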

### 8. save_best_worst_clusters.py
Saves best/worst cluster information to files.

### 9. export_best_worst_table.py
Exports best/worst cluster information to table format.

### 10. run_statistical_tests.py
Performs Fisher's exact test to compare each cluster against the baseline.

---

## Suggested Execution Order

```bash
# Step 1: Setup
export OPENAI_API_KEY="your-api-key-here"

# Step 2: Data Collection
python download_datasets.py
python generate_responses.py

# Step 3: Basic Analysis
python failure_statistics.py

# Step 4: Clustering
python cluster_per_combination.py

# Step 5: Failure Analysis
python analyze_reasoning_failures.py

# Step 6: Error Distribution Tables
python export_combined_table_percentages.py

# Step 7: Best/Worst Cluster Analysis
python analyze_best_worst_clusters.py
python save_best_worst_clusters.py
python export_best_worst_table.py

# Step 8: Statistical Testing
python run_statistical_tests.py
```
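
The same order can be driven from a single Python script. This is a sketch only: the `run_pipeline` helper is hypothetical and not part of the repository, and it assumes every script runs without arguments from the repository root.

```python
import subprocess
import sys

# Scripts in the suggested execution order.
PIPELINE = [
    "download_datasets.py",
    "generate_responses.py",
    "failure_statistics.py",
    "cluster_per_combination.py",
    "analyze_reasoning_failures.py",
    "export_combined_table_percentages.py",
    "analyze_best_worst_clusters.py",
    "save_best_worst_clusters.py",
    "export_best_worst_table.py",
    "run_statistical_tests.py",
]


def run_pipeline(scripts=PIPELINE):
    """Run each script in order and stop at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete outputs.
        subprocess.run([sys.executable, script], check=True)


# Call run_pipeline() to execute all steps; it is not invoked here.
```

Stopping at the first failure matters because each step presumably reads files written by the previous one.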

---

## Requirements

See [requirements.txt](requirements.txt) for dependencies.

```bash
pip install -r requirements.txt
```

### Environment Variables
- `OPENAI_API_KEY` - Required for generating responses
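
A quick pre-flight check can catch a missing key before any API calls are made. The `require_api_key` helper below is hypothetical and shown only as a sketch; the scripts may handle this differently.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key, or raise a clear error if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running generate_responses.py"
        )
    return key
```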

---

## Citation

If you use this pipeline in your research, please cite appropriately.

---

## License

MIT License
