# Language Models That Think, Chat Better

This GitHub repository contains the code for the ICLR 2026 submission "Language Models That Think, Chat Better" by <redacted>.

## Setup
Please find the necessary dependencies inside `requirements.txt`. We recommend installing [PyTorch](https://pytorch.org/) first when creating your environment, followed by the other dependencies.
For FlashAttention, you may need to pass the `--no-build-isolation` flag to `pip` when installing it (refer to `https://github.com/Dao-AILab/flash-attention` for more installation help).
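
The install order described above can be sketched as follows. This is a minimal example under assumptions, not the authors' exact procedure: the virtual environment and its name `.venv` are illustrative, `torch` and `flash-attn` are the PyPI package names for PyTorch and FlashAttention, and the right PyTorch build depends on your CUDA version. If `requirements.txt` already installs FlashAttention successfully, the last command is unnecessary.

```shell
# Create and activate an isolated environment (the name `.venv` is an arbitrary choice).
python -m venv .venv
source .venv/bin/activate

# Install PyTorch first; see https://pytorch.org/ for the build matching your CUDA version.
pip install torch

# Install the remaining dependencies listed in this repository.
pip install -r requirements.txt

# If the FlashAttention build fails under pip's default build isolation,
# retry with isolation disabled (pip spells the flag with hyphens).
pip install flash-attn --no-build-isolation
```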

## Benchmarking
Our benchmarking code is self-contained and designed to be easy to use and extend to other benchmarks.
It is contained in the `eval` directory, where we include a README file with more instructions.

## Training
For SFT and DPO, we rely on `trl`.
For PPO and GRPO, we use `verl` for its efficiency and scalability.
We provide the SFT/DPO code inside `training/sft_dpo` and the PPO/GRPO code inside `training/ppo_grpo`.
There are other utilities inside `training/sft_dpo/utils` for the preparation of SFT and DPO datasets.
For more details, please refer to:
- The README inside `training` for how to launch SFT, DPO, PPO, and GRPO.
- The README inside `training/utils` for (1) how we prepare the "zero" versions of models for seamless training, (2) how we prompt models and API endpoints to create SFT and DPO datasets, and (3) how the datasets are formatted for SFT/DPO.

## Data release

We release the training data and model checkpoints at <redacted>.

> **NOTE for ICLR submission:** Some of our training code is based on the verl training framework and therefore contains several copyright notices from the developers of verl. These should not be construed as de-anonymization, as they do not refer to the authors of this paper.