# README

The folder `experiment-slurm-scripts` contains the sbatch scripts and code used to run the RL experiments for the paper *Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards*.

This repository is a fork of SkyRL, modified for the experiments in the paper.

# Installation

## SLURM Cluster Setup and Installation
Load the required environment modules:

```bash
module load release/24.04 GCCcore/12.3.0
module load CUDA/12.1.1
module load NCCL/2.18.3-CUDA-12.1.1
```

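Set `WORK` to your workspace directory, then download and install Miniconda there and load its shell hook:

```bash
export WORK=/path/to/your/workspace/
cd "$WORK"
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$WORK/miniconda3"

export PATH="$WORK/miniconda3/condabin:$PATH"
source "$WORK/miniconda3/etc/profile.d/conda.sh"
```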
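Every later step expands `$WORK`, so a job script that reuses this setup can fail early and clearly if the variable is missing. The sketch below assumes a hypothetical helper named `check_workdir` (it is not part of SkyRL) that also confirms the Miniconda install created above exists:

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of SkyRL): verify $WORK before any step expands it.
check_workdir() {
  # ${WORK:-} expands to "" when WORK is unset, so this also works under `set -u`
  if [ -z "${WORK:-}" ]; then
    echo "error: WORK is not set; export WORK=/path/to/your/workspace/ first" >&2
    return 1
  fi
  if [ ! -d "$WORK" ]; then
    echo "error: WORK=$WORK is not a directory" >&2
    return 1
  fi
  # conda.sh is the shell hook sourced in the step above
  if [ ! -f "$WORK/miniconda3/etc/profile.d/conda.sh" ]; then
    echo "error: no Miniconda install found under $WORK/miniconda3" >&2
    return 1
  fi
  echo "using WORK=$WORK"
}
```

For example, an sbatch script could run `check_workdir || exit 1` before sourcing `conda.sh`.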

Create and activate a Python 3.12 environment:

```bash
conda create -y -p "$WORK/miniconda3/envs/skyrl-train" python=3.12
conda activate skyrl-train
```

Upgrade pip, then remove any conflicting packages:

```bash
pip install --upgrade pip
pip uninstall vllm torch torchvision flash-attn -y
```

Install PyTorch 2.7.1 built against CUDA 12.6 (PyTorch 2.7.x wheels are published for CUDA 12.6 but not for CUDA 12.1):

```bash
pip install torch==2.7.1 torchvision --index-url https://download.pytorch.org/whl/cu126
```

Install the versions required by `pyproject.toml`. Quote specifiers such as `>=`; unquoted, the shell treats `>` as an output redirection and pip installs the latest release instead:

```bash
pip install ray==2.48.0
pip install "transformers>=4.51.0"
pip install accelerate
pip install hydra-core==1.3.2
pip install omegaconf
pip install "datasets>=3.6.0"
pip install loguru tqdm tensorboard func_timeout
pip install torchdata peft debugpy==1.8.0
pip install hf_transfer wandb tensordict jaxtyping polars
```

Install flash-attention (compatible with CUDA 12.1/12.6 and PyTorch 2.7.1):

```bash
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
```


Install vLLM (exact version from `pyproject.toml`):

```bash
pip install vllm==0.10.1.1
```

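Finally, install SkyRL in editable mode. The path below assumes this repository was cloned to `$HOME/SkyRL`; adjust it if yours lives elsewhere:

```bash
cd "$HOME/SkyRL/skyrl-train"
pip install -e .
```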
