 Provides example output from our evaluation on four datasets (AIME excluded), generated with the `Qwen2.5-Math-1.5B-Instruct` model.
