# Recipe
The examples under `recipes/` are representative extensions to verl for specific end-to-end RL training recipes.
To help the community reproduce experiments, the verl team provides a snapshot of the codebase when each recipe is initially PR'ed to verl main. You can find them via [github branches](XXXX).

# Awesome work using verl

- [Logic-RL](XXXX): a reproduction of DeepSeek R1 Zero on a 2K Tiny Logic Puzzle Dataset. XXXX
- [Seed-Coder](XXXX): RL training of Seed-Coder boosts performance on competitive programming XXXX
- [all-hands/openhands-lm-32b-v0.1](XXXX): A strong, open coding agent model, trained with [multi-turn fine-tuning](XXXX)
- [s3](XXXX): **Efficient Yet Effective** Search Agent Training via RL XXXX
- [Rec-R1](XXXX): Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning
- [Explore RL Data Scaling](XXXX): Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
- [FIRE](XXXX): Flaming-hot initiation with regular execution sampling for large language models
- [DQO](XXXX): Enhancing multi-step reasoning abilities of language models through direct Q-function optimization
- [ProRL](XXXX): Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
- [cognition-engineering](XXXX): Test time scaling drives cognition engineering. XXXX
- [Trust Region Preference Approximation](XXXX): A simple and stable **reinforcement learning algorithm** for LLM reasoning. XXXX
- [AdaRFT](XXXX): Efficient Reinforcement Finetuning via **Adaptive Curriculum Learning** XXXX
- [critic-rl](XXXX): LLM critics for code generation XXXX
- [self-rewarding-reasoning-LLM](XXXX): self-rewarding and correction with **generative reward models** XXXX
- [DeepEnlighten](XXXX): Reproduce R1 with **social reasoning** tasks and analyze key findings XXXX
- [MetaSpatial](XXXX): Reinforcing **3D Spatial Reasoning** in **VLMs** for the **Metaverse** XXXX
- [PURE](XXXX): **Credit assignment** is the key to successful reinforcement fine-tuning using **process reward model** XXXX
- [cognitive-behaviors](XXXX): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs XXXX
- [deepscaler](XXXX): iterative context scaling with GRPO XXXX
- [DAPO](XXXX): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B XXXX
- [NoisyRollout](XXXX): Reinforcing Visual Reasoning with Data Augmentation XXXX
