# Self-Reflective Generation at Test Time (SRGen)

SRGen is a framework for implementing Test-time Training (TNOT) functionality on Transformer-based language models. It provides a universal decorator that can enhance any CausalLM model with self-reflective generation capabilities during inference.

## Features

- **Universal TNOT Decorator**: Apply TNOT functionality to any Transformers CausalLM model without separate modeling files
- **Multi-GPU Parallel Processing**: Support for parallel evaluation across multiple GPUs
- **Multiple Dataset Support**: Built-in evaluators for AIME, MATH, and other mathematical reasoning datasets
- **Flexible Configuration**: Extensive hyperparameter tuning options for entropy control, sampling, and adaptive strategies

## Installation

### Prerequisites

- Python 3.10+
- CUDA-compatible GPU (recommended)
- PyTorch with CUDA support

### Install Dependencies

1. **Install requirements:**
```bash
pip install -r requirements.txt
```

## Quick Start

### Basic Usage

1. **Run AIME evaluation with parallel processing:**
```bash
bash scripts/parallel_aime_distill_qwen.sh
```

2. **Custom model evaluation:**
```bash
python -m srgen.aime_evaluator \
    --model_path your_model_path \
    --parallel \
    --max_parallel_gpus 4 \
    --average 5 \
    --split train \
    --version 2024 \
    --times 3
```

### Using TNOT Decorator in Your Code

```python
from srgen.tnot_decorator import enable_tnot
from transformers import AutoModelForCausalLM, LlamaForCausalLM

# Apply TNOT decorator to any model class
TNOTModelClass = enable_tnot(AutoModelForCausalLM)
model = TNOTModelClass.from_pretrained("model_name")

# Or apply to specific model classes
@enable_tnot
class MyCustomModel(LlamaForCausalLM):
    pass
```
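
Both forms work because a class decorator is simply a callable that takes a class and returns a class. The sketch below illustrates that general pattern with a stand-in model. The names `enable_reflection`, `ToyModel`, and the hook inside `generate` are hypothetical and do not reflect SRGen's actual implementation.

```python
# Illustrative sketch only: a generic class decorator that returns a subclass
# overriding `generate`. All names here are hypothetical, not SRGen's API.

def enable_reflection(model_cls):
    """Return a subclass of `model_cls` whose `generate` runs an extra hook."""

    class Wrapped(model_cls):
        def generate(self, prompt, **kwargs):
            # A real implementation would adapt the model here before decoding;
            # this stand-in only counts how often the hook ran.
            self.hook_calls = getattr(self, "hook_calls", 0) + 1
            return super().generate(prompt, **kwargs)

    # Keep a recognizable class name for debugging output.
    Wrapped.__name__ = f"Reflective{model_cls.__name__}"
    Wrapped.__qualname__ = Wrapped.__name__
    return Wrapped


class ToyModel:
    """Stand-in for a CausalLM class; it simply echoes the prompt."""

    def generate(self, prompt, **kwargs):
        return f"{prompt} ..."


# Function-call form, mirroring `enable_tnot(AutoModelForCausalLM)`.
ReflectiveToy = enable_reflection(ToyModel)


# Decorator form, mirroring `@enable_tnot` on a custom class.
@enable_reflection
class MyToy(ToyModel):
    pass


model = ReflectiveToy()
output = model.generate("2 + 2 =")
print(output, model.hook_calls)
```

Because the decorator returns a subclass, methods inherited from the original class stay available on the result.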

## Configuration

### Key Parameters

- `--model_path`: Path to the model (Hugging Face model ID or local path)
- `--parallel`: Enable parallel processing across multiple GPUs
- `--max_parallel_gpus`: Maximum number of GPUs to use
- `--average`: Number of runs over which results are averaged
- `--lr`: Learning rate for test-time training
- `--entropy_threshold`: Threshold for entropy-based control
- `--entropy_weight`: Weight for entropy regularization
- `--use_entropy_control`: Enable entropy-based control
- `--adaptive_entropy`: Enable adaptive entropy control
- `--max_retries`: Maximum number of retry attempts
- `--temperature`: Sampling temperature
- `--max_new_tokens`: Maximum number of new tokens to generate
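
Several of the parameters above refer to entropy. As background, the Shannon entropy of a next-token distribution measures how uncertain the model is at that decoding step. The sketch below computes it from raw logits in pure Python and compares it with a threshold. How SRGen actually applies `--entropy_threshold` is not described here, so the `is_uncertain` helper and its comparison are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the distribution defined by `logits`."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_uncertain(logits, entropy_threshold):
    """Hypothetical check: flag a step whose entropy exceeds the threshold."""
    return token_entropy(logits) > entropy_threshold


confident = [10.0, 0.0, 0.0, 0.0]  # one token dominates
uniform = [0.0, 0.0, 0.0, 0.0]     # all four tokens equally likely

print(token_entropy(confident))    # close to zero
print(token_entropy(uniform))      # equals ln(4) for four equally likely tokens
```

Entropy is bounded above by the natural log of the vocabulary size, which gives a rough scale for choosing a threshold value.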

### Dataset Options

- `--split`: Dataset split (train/test/validation)
- `--version`: Dataset version (2024/2025 for AIME)
- `--eval_samples`: Number of samples to evaluate (optional)

## Available Evaluators

- **AIME Evaluator** (`srgen.aime_evaluator`): American Invitational Mathematics Examination
- **MATH Evaluator** (`srgen.math_evaluator`): Mathematical reasoning problems
- **Base Evaluator** (`srgen.base_evaluator`): Base class for custom evaluators

## Examples

### Single GPU Evaluation
```bash
python -m srgen.aime_evaluator \
    --model_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --split train \
    --version 2024 \
    --times 1 \
    --lr 0.01
```

### Multi-GPU Parallel Evaluation
```bash
python -m srgen.aime_evaluator \
    --model_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --parallel \
    --max_parallel_gpus 4 \
    --average 5 \
    --split train \
    --version 2024 \
    --times 3 \
    --lr 0.01 \
    --entropy_threshold 3.0 \
    --entropy_weight 0.05 \
    --use_entropy_control
```
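
For hyperparameter sweeps, it can be convenient to assemble the command above programmatically. The sketch below builds an argument list from a dictionary using only flags documented in this README. The `build_command` helper is hypothetical and not part of SRGen.

```python
import shlex
import sys


def build_command(module, options):
    """Build an argv list for `python -m <module>` from a dict of options.

    A value of True becomes a bare flag (e.g. --parallel), False or None
    values are skipped, and anything else becomes `--name value`.
    """
    argv = [sys.executable, "-m", module]
    for name, value in options.items():
        if value is None or value is False:
            continue
        argv.append(f"--{name}")
        if value is not True:
            argv.append(str(value))
    return argv


cmd = build_command(
    "srgen.aime_evaluator",
    {
        "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
        "parallel": True,
        "max_parallel_gpus": 4,
        "average": 5,
        "split": "train",
        "version": 2024,
        "times": 3,
        "lr": 0.01,
        "use_entropy_control": True,
        "adaptive_entropy": False,  # skipped: flag is not emitted
    },
)

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```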

## Troubleshooting

### Common Issues

1. **CUDA Out of Memory**: Reduce `--max_parallel_gpus` or use smaller batch sizes
2. **Model Loading Errors**: Ensure the model path is correct and accessible
3. **Dataset Download Issues**: Check your internet connection or use a Hugging Face mirror via `HF_ENDPOINT`

### Environment Variables

```bash
export HF_HOME=~/.cache/huggingface  # Hugging Face cache directory
export HF_ENDPOINT=https://hf-mirror.com  # Optional: use mirror
```
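
The same variables can also be set from Python, for example in a notebook. They should be set before `transformers` or `huggingface_hub` is imported, since those libraries typically read them at import time. A minimal sketch using the same values as above:

```python
import os

# Set only if not already defined, so values exported in the shell take precedence.
os.environ.setdefault("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")  # optional mirror

# Import Hugging Face libraries only after the environment is configured, e.g.:
# from transformers import AutoModelForCausalLM
```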

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.