# Datasets Used in *SynAdapt*

## Training datasets

### 1. Base Training dataset derived from DeepMath-103K:
**Path: ./Train/DeepMath.json.**  
**Total Count: 9660 samples.**
```
Key Fields:
> Question;
> Difficulty;
> Topic: The topic of the question;
> Answer_Content: The correct final answer, without the CoT;
> Split_COT_Content: The list of discrete CoT segments;
...
```
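
As a quick sanity check, the file can be loaded and compared against the fields listed above. The sketch below is only illustrative: it assumes the file is a single top-level JSON list of sample objects (the layout is not stated here), and `load_json_records` and `missing_fields` are hypothetical helper names, not part of the repository.

```python
import json
from pathlib import Path

# Field names copied from the "Key Fields" list; the real file may hold more.
EXPECTED_FIELDS = (
    "Question",
    "Difficulty",
    "Topic",
    "Answer_Content",
    "Split_COT_Content",
)


def load_json_records(path):
    """Load a JSON file that is ASSUMED to hold a top-level list of samples."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(
            f"expected a JSON list in {path}, got {type(records).__name__}"
        )
    return records


def missing_fields(record, expected=EXPECTED_FIELDS):
    """Return the expected field names that are absent from one record."""
    return [name for name in expected if name not in record]


if __name__ == "__main__":
    path = Path("./Train/DeepMath.json")
    if path.exists():  # only runs when the dataset is actually present
        samples = load_json_records(path)
        print("samples loaded:", len(samples))
        print("fields missing in first sample:", missing_fields(samples[0]))
```

If the reported sample count differs from the total stated above, the local copy is probably incomplete or the layout assumption does not hold.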

### 2. Binary dataset for difficulty classifier training:
**Path: ./Train/Binary_diff_data.json.**  
**Total Count: 10214 samples.**
```
Key Fields:
> win_question: The harder question;
> win_difficulty: The difficulty level of the harder question;
> win_raw_id: The ID of the harder question in the DeepMath dataset;
> lose_question: The simpler question;
> lose_difficulty: The difficulty level of the simpler question;
> lose_raw_id: The ID of the simpler question in the DeepMath dataset;
```
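
Each record pairs a harder ("win") question with a simpler ("lose") one, so a natural check is that the two difficulty values are ordered as the field names imply. The sketch below assumes the difficulty values can be converted to `float`; the helper names and the example record are hypothetical and only illustrate the field layout.

```python
def split_pair(record):
    """Return (harder_question, simpler_question) from one pair record."""
    return record["win_question"], record["lose_question"]


def is_ordered(record):
    """True when the 'win' difficulty is strictly above the 'lose' difficulty.

    Assumption: both difficulty values are numeric or numeric strings.
    """
    return float(record["win_difficulty"]) > float(record["lose_difficulty"])


def count_misordered(records):
    """Count records whose difficulty ordering contradicts the win/lose labels."""
    return sum(1 for record in records if not is_ordered(record))


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the dataset.
    example = {
        "win_question": "harder question text",
        "win_difficulty": 8,
        "win_raw_id": 0,
        "lose_question": "simpler question text",
        "lose_difficulty": 3,
        "lose_raw_id": 1,
    }
    harder, simpler = split_pair(example)
    print(harder, "|", simpler, "| ordered:", is_ordered(example))
```

A non-zero result from `count_misordered` on the full file would suggest either a labeling problem or that the numeric-difficulty assumption is wrong.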


## Evaluation datasets for the accuracy-efficiency trade-off

### 1. AIME25:
**Path: ./eval_trade_off/aime25/test.jsonl.**  
**Total Count: 30 samples.**

### 2. AIME24:
**Path: ./eval_trade_off/aime24/test.jsonl.**  
**Total Count: 30 samples.**

### 3. AMC23:
**Path: ./eval_trade_off/amc23/test.jsonl.**  
**Total Count: 40 samples.**

### 4. MATH500:
**Path: ./eval_trade_off/math500/test.jsonl.**  
**Total Count: 500 samples.**

### 5. GSM8K:
**Path: ./eval_trade_off/gsm8k/test.jsonl.**  
**Total Count: 1319 samples.**
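
The `.jsonl` extension suggests one JSON object per line, so the sample counts listed above can be verified with a small reader. This is a minimal sketch under that assumption; `read_jsonl` and `count_matches` are hypothetical helpers, and the per-record field names are deliberately not assumed.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def count_matches(path, expected):
    """Return True when the file holds exactly `expected` records."""
    return sum(1 for _ in read_jsonl(path)) == expected


if __name__ == "__main__":
    path = Path("./eval_trade_off/aime25/test.jsonl")
    if path.exists():  # only runs when the dataset is actually present
        print("count matches:", count_matches(path, 30))
```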


## Evaluation datasets for Difficulty Classifier

### 1. MATH500:
**Path: ./eval_difficulty_classifier/MATH500/test.json.**  
**Total Count: 500 samples.**

### 2. MixD:
**Path: ./eval_difficulty_classifier/MixD/test.json.**  
**Total Count: 363 samples.**
