# From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

## Overview
This repository provides the core implementation of **RLVRR**, a framework for reinforcement learning with verifiable reference-based rewards, designed for open-ended generation tasks.

Our code is based on OpenRLHF 0.8.2. To set up the environment, follow the official OpenRLHF installation guidelines for that version.
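Since the code targets a specific OpenRLHF release, it can help to confirm which version is installed before running anything. The snippet below is a minimal sketch and not part of the released code: the helper `check_openrlhf` is hypothetical, and it assumes the package is installed under the distribution name `openrlhf`.

```python
from importlib import metadata

# Version named in this README.
EXPECTED_VERSION = "0.8.2"


def check_openrlhf(expected: str = EXPECTED_VERSION, package: str = "openrlhf") -> str:
    """Return a short status message about the installed package version.

    `package` is the distribution name to look up; "openrlhf" is an
    assumption, adjust it if your installation uses a different name.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed (expected {expected})"
    if installed != expected:
        return f"{package} {installed} is installed, but this code targets {expected}"
    return f"{package} {expected} is installed"


if __name__ == "__main__":
    print(check_openrlhf())
```

A mismatch message does not necessarily mean the code will fail; it only flags that the environment differs from the one this repository was developed against.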

Due to the large size of the original dataset, we release only **sample data** here. The sample data is sufficient to run the provided code and verify the workflow.

## Contents
- `scripts/` : Core runnable code files for reproducing the main experiments.  
- `data/` : Example data for demonstration and verification.  
- `README.md` : This document. 

## Notes
- The released code and data allow you to reproduce the essential pipeline and validate the training/inference process.  
- Additional experiment code (e.g., extended ablations, scaling experiments) will be made fully available on GitHub **after the paper is accepted**.  