# Experiments
- Exp1: run `scripts/train/run_grpo_qwen2.5_full_curriculumlearning.sh`, varying the model size.
- Exp2: run `scripts/train/run_grpo_qwen2.5_data_constraint.sh` with different training files from `data/math_curriculum_sampled`.
- Exp3: run `scripts/train/run_grpo_qwen2.5_G_num.sh` with different values of `n_resp_per_prompt`.
## Parameters that Need to be Customized
- TRAINING_GPUS
- AUTHOR_NAME
- WANDB_DIR
- WANDB_MODE (set to "online" if your cluster has internet access; otherwise set to "offline")
- WANDB_API_KEY
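
As a rough sketch, the parameters above could be set as follows before launching a script. This assumes the scripts read them from the environment; if they are defined at the top of each script instead, edit them there. All values are placeholders, and the comma-separated format for `TRAINING_GPUS` is an assumption, so check the script for the format it expects.

```shell
# Placeholder values only; replace each one with your own setup.
export TRAINING_GPUS="0,1,2,3"          # assumed format: comma-separated GPU indices
export AUTHOR_NAME="your_name"          # hypothetical name used to tag runs
export WANDB_DIR="${HOME}/wandb_logs"   # directory where W&B writes run files
export WANDB_MODE="offline"             # use "online" if the cluster has internet access
export WANDB_API_KEY="<your-api-key>"   # only needed when syncing runs to W&B

# Make sure the log directory exists before training starts.
mkdir -p "$WANDB_DIR"
```

Runs logged with `WANDB_MODE="offline"` stay in `WANDB_DIR` and can be uploaded later with `wandb sync` from a machine that has internet access.
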
## If Your GPU Memory is Insufficient, Reduce the Values of the Following Parameters
- actor_seq_multiplier
- rollout_seq_multiplier
- gpu_memory_utilization (0.5-0.7 is recommended)
## If a Lot of GPU Memory Remains Free, Increase the Values of the Following Parameters
- actor_seq_multiplier
- rollout_seq_multiplier
## Notes
- Use `nvitop` to monitor GPU utilization during training.
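
A minimal sketch for choosing a GPU monitor, assuming `nvitop` may not be installed yet on the cluster. The `monitor_cmd` variable is a hypothetical name used only in this example, and falling back to `nvidia-smi` is a suggestion rather than part of the original setup.

```shell
# Prefer nvitop if it is on PATH; otherwise fall back to nvidia-smi.
if command -v nvitop >/dev/null 2>&1; then
  monitor_cmd="nvitop"
else
  monitor_cmd="nvidia-smi"
  echo "nvitop not found; install it with: pip install nvitop" >&2
fi

# Run the chosen command in a second terminal while training is in progress.
echo "GPU monitor to use: $monitor_cmd"
```
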
