
## Adaptive Test-Time Compute Allocation via Training-Free Difficulty Proxies
This repository contains scripts for running DIPA experiments on mathematical reasoning datasets: generating model responses, computing and comparing difficulty proxies, and comparing compute-allocation strategies across models.

## Prerequisites

- Python 3.12+
- CUDA Version: 12.4
- Install packages by creating the conda environment from `environment.yml`:
  ```bash
  conda env create -f environment.yml
  ```

## File Structure

- `generate_samples.py` - Generate model responses for datasets
- `generate_samples.sh` - Shell script to run sample generation
- `run_adaptive_allocation_mp_new.py` - Multi-GPU proxy computation for all responses
- `run_adaptive_allocation_mathQwen.sh` - Shell script to run proxy computation
- `compare_different_proxies.py` - Compare different difficulty proxy metrics
- `compare_allocation_strategies.py` - Compare allocation strategies (uniform, DIPA, oracle)

## Usage

### 1. Generate Model Samples

First, generate model responses for your dataset:

```bash
# Edit generate_samples.sh to set your parameters
./generate_samples.sh
```

Or run directly:
```bash
python generate_samples.py <exp_name> <dataset_name> <model_name>
```

**Parameters:**
- `exp_name`: Output directory name
- `dataset_name`: Either "math500" or "gsm8k"
- `model_name`: HuggingFace model path (e.g., "Qwen/Qwen2.5-1.5B-Instruct")
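
To generate samples for several datasets in one go, the direct command above can be wrapped in a small Python driver. The sketch below is illustrative only: the helper names `build_generate_command` and `run_all` and the `<dataset>_run` experiment naming are assumptions, not part of this repository. It only relies on the positional argument order shown above.

```python
import subprocess

# The two dataset names documented above.
SUPPORTED_DATASETS = {"math500", "gsm8k"}


def build_generate_command(exp_name: str, dataset_name: str, model_name: str) -> list[str]:
    """Build the argv list for one generate_samples.py run (hypothetical helper)."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset_name!r}")
    return ["python", "generate_samples.py", exp_name, dataset_name, model_name]


def run_all(model_name: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally execute) one command per supported dataset."""
    commands = [
        # "<dataset>_run" is an assumed naming scheme for the output directory.
        build_generate_command(f"{dataset}_run", dataset, model_name)
        for dataset in sorted(SUPPORTED_DATASETS)
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    for cmd in run_all("Qwen/Qwen2.5-1.5B-Instruct", dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to check the arguments before committing GPU time.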

### 2. Calculate Difficulty Proxies

Compute gradients and the other difficulty proxies for the generated responses:

```bash
# Edit run_adaptive_allocation_mathQwen.sh to set your parameters
./run_adaptive_allocation_mathQwen.sh
```

**Parameters** (set inside the shell script):
- `exp_name`: Output directory name
- `dataset_name`: Either "math500" or "gsm8k"
- `model_name`: HuggingFace model path
- `sample_path`: Path to generated samples JSON file
- `batch_size`: Batch size for processing (e.g., 10)

### 3. Compare Different Proxy Metrics

Compare various difficulty proxy metrics (VoG, GradNorm, Entropy, Length, Consistency):

```bash
python compare_different_proxies.py
```

**Note:** Update the file paths in the script to point to your generated samples and allocation results.
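
One common way to judge a difficulty proxy is to check how well it ranks problems against an empirical difficulty signal such as the per-problem error rate. The sketch below computes a Spearman rank correlation in plain Python. It is an assumption about how such a comparison could be done, not a description of what `compare_different_proxies.py` implements, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(proxy_scores, error_rates):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    if len(proxy_scores) != len(error_rates):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(proxy_scores), average_ranks(error_rates))
```

A proxy whose scores rise with the error rate yields a correlation near 1, while a proxy unrelated to difficulty yields a value near 0.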

### 4. Compare Allocation Strategies

Compare different allocation strategies (uniform, adaptive, oracle, easy-to-hard, etc.):

```bash
python compare_allocation_strategies.py
```

**Note:** Update the file paths in the script to point to your data files.
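
To make the contrast between strategies concrete, the sketch below splits a fixed sampling budget either uniformly or in proportion to a per-problem difficulty score. The proportional rule is a generic stand-in chosen for illustration: the actual DIPA allocation rule is not described in this README, and the function names and the `min_samples` parameter are assumptions.

```python
import math
from fractions import Fraction


def uniform_allocation(num_problems: int, total_budget: int) -> list[int]:
    """Split the budget evenly; the first few problems absorb the remainder."""
    base, extra = divmod(total_budget, num_problems)
    return [base + (1 if i < extra else 0) for i in range(num_problems)]


def proportional_allocation(scores, total_budget: int, min_samples: int = 1) -> list[int]:
    """Give each problem min_samples, then split the rest in proportion to its score.

    Assumes non-negative scores. Uses largest-remainder rounding so the
    allocation sums exactly to total_budget.
    """
    n = len(scores)
    if total_budget < n * min_samples:
        raise ValueError("budget too small for the requested minimum per problem")
    remaining = total_budget - n * min_samples
    exact = [Fraction(s) for s in scores]  # exact arithmetic avoids rounding drift
    total = sum(exact)
    if total <= 0:
        # No usable signal: fall back to a uniform split of the remainder.
        return [min_samples + e for e in uniform_allocation(n, remaining)]
    shares = [remaining * s / total for s in exact]
    alloc = [math.floor(share) for share in shares]
    leftover = remaining - sum(alloc)
    # Hand leftover samples to the problems with the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return [min_samples + a for a in alloc]
```

For example, with scores `[1.0, 3.0]` and a budget of 10, each problem first receives one sample and the remaining eight are split 2 to 6, whereas the uniform baseline assigns five samples to each.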

## Configuration

### Dataset Support
- **MATH500**: Mathematical reasoning problems from the MATH dataset
- **GSM8K**: Grade school math word problems

### Models Tested
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-Math-1.5B-Instruct

### Metrics Available
- **VoG**: Variance of Gradients
- **GradNorm**: Gradient Norm
- **Entropy**: Cross-entropy loss
- **Length**: Generation Length
- **CS**: Consensus Score (Generation Consistency)
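
The two gradient-free metrics can be illustrated without a model. The sketch below is a minimal interpretation under stated assumptions, not the repository's implementation: it takes the consensus score to be the fraction of sampled answers that match the majority answer, and it measures length in whitespace-separated tokens rather than model tokens. Both function names are hypothetical.

```python
from collections import Counter


def consensus_score(answers: list[str]) -> float:
    """Fraction of sampled final answers that agree with the most common one."""
    if not answers:
        raise ValueError("need at least one answer")
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def mean_length(responses: list[str]) -> float:
    """Average response length in whitespace-separated tokens (a rough stand-in)."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(len(r.split()) for r in responses) / len(responses)


if __name__ == "__main__":
    print(consensus_score(["4", "4", "5", "4"]))  # 3 of 4 samples agree
    print(mean_length(["a b c", "d"]))
```

Under this reading, a low consensus score or a long average response would suggest a harder problem.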


## Example Workflow

```bash
# 1. Generate samples for MATH500 with Qwen2.5-1.5B
./generate_samples.sh

# 2. Calculate difficulty proxies
./run_adaptive_allocation_mathQwen.sh

# 3. Compare proxy metrics
python compare_different_proxies.py

# 4. Compare allocation strategies
python compare_allocation_strategies.py
```