<h1 align="center">
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
</h1>

<div align="center">

<em>Author and affiliation information removed for double-blind review.</em>

</div>

<div align="center">

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
![Python](https://img.shields.io/badge/python-3.9%2B-blue)

</div>

## Does Reinforcement Learning Truly Extend Reasoning?
- **The "Capability Refiner" View**: Some work characterizes RL as a refiner of existing skills that amplifies abilities already learned during pre-training. Under this view, RL shows limited ability to improve pass@k over the base model.

- **The "New Competency" View**: Some studies present evidence of substantial reasoning gains beyond pre-training, suggesting RL can induce genuinely new compositional skills.
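The "Capability Refiner" view is phrased in terms of pass@k, the probability that at least one of k sampled answers is correct. A common way to compute it is the unbiased estimator introduced with the HumanEval benchmark: with n samples per problem, of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The function below is a minimal sketch of that estimator; the name `pass_at_k` is illustrative, and this repository's evaluation code may compute the metric differently.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct.

    Equals 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n samples (drawn without replacement)
    contains at least one correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Example: 16 samples for one problem, 2 of them correct.
print(round(pass_at_k(16, 2, 1), 4))  # 0.125
print(round(pass_at_k(16, 2, 8), 4))  # 0.7667
```

A benchmark-level pass@k is then the mean of this per-problem estimate over all problems.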


This discrepancy arises from **a lack of control**. With models trained on **opaque internet corpora**, we cannot know what reasoning primitives the base model has already internalized. **Our work's goal is to resolve this conflict through controlled analysis.**


## 🔍 Overview

Our paper builds a **fully controlled experimental framework** to analyze how **pre-training**, **mid-training**, and **RL-based post-training** jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with **explicit atomic operations** and **process-verifiable reasoning traces**, we study:

- **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
- **Contextual generalization** across diverse surface forms and linguistic contexts.
- How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
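To make "explicit atomic operations" and "process-verifiable reasoning traces" concrete, here is a minimal, hypothetical sketch. It is not the repository's data generator: the `Node` structure, the `add`/`sub`/`mul` operation set, and the functions `depth` and `verify_trace` are illustrative assumptions. A problem is a dependency graph in which each variable is defined by one atomic operation. Its depth (the longest chain of dependent operations) is the kind of knob that extrapolative generalization varies, and a trace is verified step by step rather than only by its final answer.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical atomic operations; the paper's actual operation set may differ.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

Operand = Union[str, int]  # a variable name or an integer constant


@dataclass(frozen=True)
class Node:
    name: str
    op: str
    args: Tuple[Operand, Operand]


def depth(nodes: Dict[str, Node], name: str) -> int:
    """Length of the longest chain of dependent operations ending at `name`."""
    node = nodes[name]
    child_depths = [depth(nodes, a) for a in node.args if isinstance(a, str)]
    return 1 + max(child_depths, default=0)


def verify_trace(nodes: Dict[str, Node], trace: List[Tuple[str, int]]) -> bool:
    """Check each step: its inputs must already be derived and its value correct."""
    known: Dict[str, int] = {}
    for name, value in trace:
        node = nodes.get(name)
        if node is None:
            return False  # step refers to a variable that is not in the graph
        try:
            inputs = [known[a] if isinstance(a, str) else a for a in node.args]
        except KeyError:
            return False  # step uses a variable that has not been derived yet
        if OPS[node.op](*inputs) != value:
            return False  # arithmetic error in this step
        known[name] = value
    return True


graph = {
    "a": Node("a", "add", (3, 4)),      # a = 7
    "b": Node("b", "mul", ("a", 2)),    # b = 14
    "c": Node("c", "sub", ("b", "a")),  # c = 7
}
print(depth(graph, "c"))                                     # 3
print(verify_trace(graph, [("a", 7), ("b", 14), ("c", 7)]))  # True
print(verify_trace(graph, [("b", 14), ("a", 7), ("c", 7)]))  # False
```

The second trace reaches the right values but derives `b` before `a`, so a process-level check rejects it even though an answer-only check on `c` would not.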

## Code

- **Environment (reviewer-friendly)**  
  - Python ≥ 3.9  
  - Optional but recommended: create an isolated environment:
    ```bash
    conda create -n interplay-lm python=3.9 -y
    conda activate interplay-lm
    ```
  - Install the core library in editable mode (this installs most Python dependencies):
    ```bash
    pip install -e ./verl
    ```

- **Repository layout (high level)**  
  - `verl/`: lightweight RL + SFT training / evaluation framework used in the experiments.  
  - `scripts/`: experiment utilities (data generation, tokenizer building, evaluation, plotting).  
  - `gsm_infinite/`: standalone long-context benchmark code (vendored third-party directory; not modified here).  
  - `data/`: metadata describing the reasoning datasets used in our experiments.

- **Minimal code run example (sanity check for reviewers)**  
  This snippet only uses the Python standard library and inspects the bundled dataset metadata:
  ```bash
  cd /path/to/Interplay-LM-Reasoning-main
  python - << 'PY'
  import json, pathlib

  # __file__ is undefined when Python reads the script from stdin (`python -`),
  # so resolve paths from the current directory (the repository root).
  root = pathlib.Path.cwd()
  data_dir = root / "data"

  with open(data_dir / "dataset_info.json", "r", encoding="utf-8") as f:
      info = json.load(f)

  print("Available splits in dataset_info.json (first 5):")
  for i, (name, meta) in enumerate(info.items()):
      print(f"  - {name}: keys={list(meta.keys())[:5]}")
      if i >= 4:
          break

  with open(data_dir / "PRESET.json", "r", encoding="utf-8") as f:
      preset = json.load(f)

  print("\nExample preset keys:", list(preset.keys())[:10])
  PY
  ```
  Running this command should finish within a few seconds and confirm that the repository's data files are present and readable in your environment.
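The environment steps above assume Python 3.9 or newer and an editable install of `./verl`. As a hedged sketch, the script below checks both from inside Python. It assumes the editable install exposes a top-level package named `verl`; the function name `check_environment` is hypothetical and not part of the repository.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), packages=("verl",)):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info[:2] < tuple(min_version):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found; "
            f"need >= {min_version[0]}.{min_version[1]}"
        )
    for pkg in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"package '{pkg}' is not importable")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Saved under any name (for example a hypothetical `check_env.py`) and run with `python check_env.py` from the activated environment, it prints either `Environment OK` or one line per problem found.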

## 🗂️ Data

Each reasoning example used in our experiments includes the following conceptual fields:
- `problem`, `question`, `solution`: natural-language problem statement, query, and step-by-step solution with a final answer.
- `op`: operation count level used to control difficulty.
- `depth`: dependency depth of the reasoning graph.
- `id`, `template`: identifiers for the context and template; `mode`: forward or reverse reasoning mode.
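The fields listed above can be given a typed shape for loading and filtering. This is a minimal sketch under stated assumptions: the class `ReasoningExample`, the helper `split_by_op`, the field types, and the threshold `max_train_op=4` are all hypothetical, and the authoritative schema is the one described in `data/dataset_info.json`. It shows how the `op` difficulty level could separate in-distribution examples from harder ones when probing generalization to more complex compositions.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Tuple


@dataclass
class ReasoningExample:
    """Hypothetical typed view of one example; field types are assumptions."""
    id: str
    template: str
    mode: str      # assumed to hold "forward" or "reverse"
    op: int        # operation-count level (assumed stored as an integer)
    depth: int     # dependency depth of the reasoning graph
    problem: str
    question: str
    solution: str

    @classmethod
    def from_dict(cls, record: Dict[str, Any]) -> "ReasoningExample":
        """Build an example from a raw dict, failing loudly on missing fields."""
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in record]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        # Extra keys in the record are ignored.
        return cls(**{n: record[n] for n in names})


def split_by_op(
    examples: List[ReasoningExample], max_train_op: int
) -> Tuple[List[ReasoningExample], List[ReasoningExample]]:
    """Partition into in-distribution (op <= max_train_op) and harder (op > max_train_op) sets."""
    in_dist = [e for e in examples if e.op <= max_train_op]
    harder = [e for e in examples if e.op > max_train_op]
    return in_dist, harder


rows = [
    {"id": "0", "template": "t1", "mode": "forward", "op": 2, "depth": 2,
     "problem": "toy", "question": "toy", "solution": "toy"},
    {"id": "1", "template": "t2", "mode": "reverse", "op": 7, "depth": 5,
     "problem": "toy", "question": "toy", "solution": "toy"},
]
examples = [ReasoningExample.from_dict(r) for r in rows]
in_dist, harder = split_by_op(examples, max_train_op=4)
print(len(in_dist), len(harder))  # 1 1
```

The same filter could be keyed on `depth` instead of `op` if the graph depth, rather than the operation count, is the difficulty axis of interest.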



For anonymity, we do not link to any external dataset hosting pages here; all information necessary to understand the schema is contained in this repository (`data/dataset_info.json` and `data/PRESET.json`).


## 📝 License

This project is released under the MIT License.
See [LICENSE](LICENSE) for details.
