# SmolTraces Evaluation Datasets

This directory contains the evaluation datasets used for testing SmolTraces models.

## Datasets

| Dataset | Domain | Description | Size | Source |
|---------|--------|-------------|------|--------|
| MATH500 | math | Competition math problems from the MATH dataset, with step-by-step solutions | 500 | HuggingFaceH4/MATH-500 |
| AIME2024 | math | Problems from the 2024 American Invitational Mathematics Examination (AIME) | 30 | Maxwell-Jia/AIME_2024 |
| GPQA | science | Graduate-level science questions in biology, physics, and chemistry (Diamond subset) | 198 | hendrydong/gpqa_diamond |

## Download

Download all three datasets with the provided script:

```bash
python -m data_generation.download_eval_datasets
```

## Usage

Each dataset is normalized to contain `question` and `answer` fields, so evaluation code can treat all three datasets the same way.
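As a rough illustration of what this normalization could look like, the sketch below maps source columns onto the shared schema. It is not the download script's actual code: `FIELD_MAP`, its dataset keys, the source column names, and both helper functions are hypothetical and only show the general idea.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical mapping from source column names to the normalized schema.
# The real column names depend on each upstream dataset and are assumed here.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "example_a": {"question": "problem", "answer": "answer"},
    "example_b": {"question": "Problem", "answer": "Answer"},
}


def normalize_record(
    record: Dict[str, Any], question_key: str, answer_key: str
) -> Dict[str, str]:
    """Return a record holding only the 'question' and 'answer' fields."""
    return {
        "question": str(record[question_key]).strip(),
        "answer": str(record[answer_key]).strip(),
    }


def normalize_dataset(
    records: Iterable[Dict[str, Any]], dataset_name: str
) -> List[Dict[str, str]]:
    """Normalize every record of one dataset using its entry in FIELD_MAP."""
    keys = FIELD_MAP[dataset_name]
    return [normalize_record(r, keys["question"], keys["answer"]) for r in records]
```

Answers are converted to strings so that numeric answers and free-form answers can be compared the same way downstream.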

## Evaluation

These datasets are used to evaluate SmolTraces models on challenging reasoning tasks in mathematics and science. Together they cover competition-level math and difficult science questions, all of which require multi-step reasoning.
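Because every record exposes the same two fields, a single scoring loop can in principle cover all datasets. The following is a minimal sketch that assumes exact string matching; the function `exact_match_accuracy` and the `predict` callable are hypothetical, and the project's real scoring may differ (math answers in particular often need format normalization before comparison).

```python
from typing import Callable, Dict, Iterable


def exact_match_accuracy(
    examples: Iterable[Dict[str, str]], predict: Callable[[str], str]
) -> float:
    """Fraction of examples whose predicted answer equals the reference exactly.

    `predict` is a stand-in for a model call: it takes a question string and
    returns the model's final answer as a string.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(
        1
        for ex in examples
        if predict(ex["question"]).strip() == ex["answer"].strip()
    )
    return correct / len(examples)
```

A usage example with a dummy predictor: `exact_match_accuracy([{"question": "1+1", "answer": "2"}], lambda q: "2")` returns `1.0`.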
