# Code for 'Probing RLVR Training Instability through the Lens of Objective-Level Hacking'

-----------

## Dependencies

All experiments are based on the Docker image `verl0.5-vllm0.10.0-mcore0.13.0`. To build the image and start a container:
```bash
docker build -f verl/docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.vllm.mcore0.13 \
  -t verl0.5-vllm0.10.0-mcore0.13.0 .
docker run --gpus all -it verl0.5-vllm0.10.0-mcore0.13.0 bash
```

-----------

## Scripts

The `recipe` files include `bash` scripts for reproducing the results and conclusions presented in the paper. By default, all experiments run on 4 × 8 NVIDIA A100 (80 GB) GPUs, i.e. 32 GPUs in total. Other hardware configurations may require adjusting the parallelization strategy accordingly.
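As a rough illustration of what adjusting the setup for other hardware can involve, the sketch below derives the total GPU count from a cluster shape and checks that a tensor-parallel size divides it. It assumes that "4 × 8" means 4 nodes with 8 GPUs each. `NNODES`, `GPUS_PER_NODE`, and `TP_SIZE` are hypothetical variable names, and the `trainer.nnodes` / `trainer.n_gpus_per_node` overrides follow the usual `VERL` convention; check them against the actual recipe scripts before relying on them.

```bash
#!/usr/bin/env bash
# Sketch only: sanity-check a cluster shape before editing a recipe script.
# All variable names here are illustrative, not taken from the recipes.
set -euo pipefail

NNODES="${NNODES:-4}"                # assumed meaning of "4 x" in the default setup
GPUS_PER_NODE="${GPUS_PER_NODE:-8}"  # assumed 8 GPUs per node
TP_SIZE="${TP_SIZE:-2}"              # hypothetical tensor-parallel size

TOTAL_GPUS=$(( NNODES * GPUS_PER_NODE ))

# The total GPU count must be divisible by the tensor-parallel size.
if (( TOTAL_GPUS % TP_SIZE != 0 )); then
  echo "TP_SIZE=${TP_SIZE} does not divide ${TOTAL_GPUS} GPUs" >&2
  exit 1
fi

echo "total GPUs: ${TOTAL_GPUS}"
# Overrides one would typically pass to the trainer (assumed key names).
echo "trainer.nnodes=${NNODES} trainer.n_gpus_per_node=${GPUS_PER_NODE}"
```

For example, running it with `NNODES=1 GPUS_PER_NODE=8` describes a single 8-GPU node; batch sizes and other parallelism settings in the recipe would likely need to be scaled down as well.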

Our implementation builds on the open-source `VERL` framework, and we thank its developers for making it publicly available. Our modifications mainly add metric monitoring, storage of training experience, and extensions that support proactive weight distortion.