# TempLLM: Temperature and Top-p Learning for Large Language Models

TempLLM is a framework for training large language models (LLMs) with dynamic temperature and top-p sampling. It trains dedicated temperature and top-p heads on top of an LLM architecture and currently supports Qwen and Llama models. During inference, TempLLM automatically outputs the learned temperature and top-p values, which shows how the model adjusts its generation strategy.

## 🌟 Key Features

*   **Independent Head Training**: Supports the separate training of a temperature head (`temp_head`) and a top-p head (`top_p_head`).
*   🎯 **Customizable Training**: Easily configure your training runs to focus on either the temperature or top-p head by specifying `train_temp` or `train_top_p` in the training script.
*   🚀 **High-Performance Inference**: Uses the vLLM inference and serving library for fast generation while recording the temperature and top-p values.
*   🔧 **Broad Model Support**: Out-of-the-box implementations for the `Qwen2.5` and `Llama-3.1` model architectures.
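
To make the two learned quantities concrete, the sketch below shows how a temperature and a top-p value are conventionally applied to a vector of logits before sampling. This is the standard temperature-scaling and nucleus-filtering recipe, not code taken from TempLLM; the function name `apply_temperature_and_top_p` is hypothetical, and in TempLLM the two values would come from `temp_head` and `top_p_head` rather than being fixed by hand.

```python
import math

def apply_temperature_and_top_p(logits, temperature, top_p):
    """Return token probabilities after temperature scaling and top-p filtering.

    Illustrative only: `temperature` and `top_p` are passed in by hand here,
    whereas a model with learned heads would predict them.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; all others get probability zero.
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered
```

A lower temperature sharpens the distribution and a lower top-p truncates its tail, so predicting both per step lets the sampling strategy vary across a generation.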

## 📂 Project Structure

```
TempLLM/
├── model/                    # Model definitions
│   ├── templlm_qwen2_5.py   # TempLLM implementation for Qwen2.5
│   ├── templlm_llama.py     # TempLLM implementation for Llama
│   └── __init__.py
├── trainer/                  # Trainers
│   ├── trl_Temp.py         # Main trainer
│   ├── dft.py              # DFT trainer (deprecated)
│   └── TempLLM.py          # Base trainer (deprecated)
├── script/                  # Scripts
│   ├── trl_train.sh        # Training script
│   ├── test_generation_tp4.sh # Test script
│   └── ...
├── trl_train.py            # Training entry point
└── README.md
```


## 🚀 Getting Started

### 1. Model Training

Start training by running the provided training script:

```bash
bash ./script/trl_train.sh
```

#### Preparing Data

Datasets in the TRL prompt-completion format can be used for training directly.

Example:
{"prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nCalculate the limit of the series...",
"completion": "<think>\nOkay, so I need to find the... "
}
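
As a sketch of how such records might be prepared, the snippet below checks that each record has string `prompt` and `completion` fields and serializes it as one JSON line. The helper name `to_jsonl_lines` and the sample text are hypothetical, and JSON Lines is an assumption: this README does not state which file formats `trl_train.py` accepts.

```python
import json

REQUIRED_KEYS = ("prompt", "completion")

def to_jsonl_lines(records):
    """Validate prompt-completion records and return one JSON string per record."""
    lines = []
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if not isinstance(record.get(key), str):
                raise ValueError(f"record {index} needs a string '{key}' field")
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines

# Hypothetical sample record; real prompts would carry the chat template
# tokens of the target model, as in the example above.
records = [
    {"prompt": "What is 2 + 2?", "completion": "2 + 2 = 4."},
]
for line in to_jsonl_lines(records):
    print(line)
```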

#### Training Script

In `script/trl_train.sh`, set `train_temp` and `train_top_p` to choose which head to train:

```bash
# Train the top-p head
--train_temp false \
--train_top_p true

# Train the temperature head
--train_temp true \
--train_top_p false
```
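
One plausible reading of these flags is that they select which head's parameters receive gradient updates. The sketch below illustrates that idea on plain parameter names. It is an assumption rather than a description of `trl_Temp.py`: the helper `select_trainable` and the example parameter names are hypothetical, and whether the base model stays frozen is not stated in this README.

```python
def select_trainable(param_names, train_temp, train_top_p):
    """Return the parameter names that belong to the selected head(s).

    Assumes (hypothetically) that head parameters contain `temp_head` or
    `top_p_head` as one dot-separated component of their name.
    """
    selected_heads = []
    if train_temp:
        selected_heads.append("temp_head")
    if train_top_p:
        selected_heads.append("top_p_head")
    return [
        name
        for name in param_names
        if any(head in name.split(".") for head in selected_heads)
    ]

# Hypothetical parameter names for illustration.
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "temp_head.weight",
    "top_p_head.weight",
]
print(select_trainable(names, train_temp=True, train_top_p=False))
```

With a PyTorch model, the same selection could be used to set `requires_grad` to `True` only for the returned parameters.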

### 2. Inference

Run inference with the provided test script:

```bash
bash ./script/test_generation_tp4.sh
```
