# Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

## Directory Structure

```
soft_thinking/
├── datasets/
│   ├── aime2024.json
│   └── ... (other datasets)
├── models/
│   └── download.py
├── scripts/
│   ├── baseline/
│   └── st/
├── sglang_soft_thinking_pkg/
│   └── (sglang files)
├── config.sh
├── codeeval.py
├── convert_livecodebench.py
├── humanevaleval.py
├── mbppeval.py
├── matheval.py
├── run_sglang_softthinking.py
├── run_sglang_nothinking.py
└── ... (other files)
```

## Environment Setup

To set up the virtual environment for SGLang Soft Thinking inference, run the commands in `config.sh` one line at a time.

## Reproduction Instructions

### 1. Baseline

Once the model is downloaded (see "Download the Model" below), run the baseline script:

```
bash scripts/baseline/qwq32b.sh
```

#### Download the Model

First, download the model to the `models/` directory:

```
python ./models/download.py --model_name "Qwen/QwQ-32B"
```

#### Run Inference

Then run baseline inference directly. The baseline uses the same entry point as Soft Thinking, without `--enable_soft_thinking`:

```
python run_sglang_softthinking.py \
    --dataset "aime2024" \
    --model_name "./models/Qwen/QwQ-32B" \
    --max_generated_tokens 32768 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 30 \
    --min_p 0.0 \
    --mem_fraction_static 0.8 \
    --start_idx 0 \
    --end_idx 10000 \
    --num_gpus 8 \
    --num_samples 16 \
    --use_llm_judge \
    --api_base "<replace it>" \
    --deployment_name "<replace it>" \
    --api_version "<replace it>" \
    --api_key "<replace it>" \
    --push_results_to_hf \
    --hf_repo_id "<replace it>" \
    --hf_token "<replace it>"
```
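
The judge and upload flags in the command above carry credentials, so it can help to assemble the command from environment variables instead of pasting secrets into a script. The following is a minimal sketch, not part of the repository: the `ST_*` environment variable names and the helper functions are hypothetical, while the command-line flags are the ones shown above. It assumes that `--use_llm_judge` and `--push_results_to_hf` can each be left out together with their arguments.

```python
import os
import shlex

# Hypothetical environment variable names; the repository does not define them.
JUDGE_ENV = {
    "--api_base": "ST_API_BASE",
    "--deployment_name": "ST_DEPLOYMENT_NAME",
    "--api_version": "ST_API_VERSION",
    "--api_key": "ST_API_KEY",
}
HF_ENV = {
    "--hf_repo_id": "ST_HF_REPO_ID",
    "--hf_token": "ST_HF_TOKEN",
}


def optional_flags(env=None):
    """Return judge / upload flags only when all of their values are set."""
    env = os.environ if env is None else env
    args = []
    for switch, mapping in (("--use_llm_judge", JUDGE_ENV),
                            ("--push_results_to_hf", HF_ENV)):
        values = {flag: env.get(name) for flag, name in mapping.items()}
        if all(values.values()):
            args.append(switch)
            for flag, value in values.items():
                args += [flag, value]
    return args


def baseline_command(env=None):
    """Build the baseline command shown above as an argument list."""
    base = [
        "python", "run_sglang_softthinking.py",
        "--dataset", "aime2024",
        "--model_name", "./models/Qwen/QwQ-32B",
        "--max_generated_tokens", "32768",
        "--temperature", "0.6",
        "--top_p", "0.95",
        "--top_k", "30",
        "--min_p", "0.0",
        "--mem_fraction_static", "0.8",
        "--start_idx", "0",
        "--end_idx", "10000",
        "--num_gpus", "8",
        "--num_samples", "16",
    ]
    return base + optional_flags(env)


if __name__ == "__main__":
    # The printed command includes the API key and token when they are set,
    # so avoid writing this output to shared logs.
    print(shlex.join(baseline_command()))
```

The argument list can also be passed to `subprocess.run` directly, which avoids shell quoting altogether.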

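> **Note:**  
> - `--use_llm_judge` needs the API arguments (`--api_base`, `--deployment_name`, `--api_version`, `--api_key`), and `--push_results_to_hf` needs `--hf_repo_id` and `--hf_token`. Replace each `<replace it>` placeholder before running.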

---

### 2. Soft Thinking

Run the Soft Thinking script:

```
bash scripts/st/qwq32b.sh
```

Or run the command directly:

```
python run_sglang_softthinking.py \
    --dataset "aime2024" \
    --model_name "./models/Qwen/QwQ-32B" \
    --max_topk 15 \
    --max_generated_tokens 32768 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 30 \
    --min_p 0.0 \
    --after_thinking_temperature 0.6 \
    --after_thinking_top_p 0.95 \
    --after_thinking_top_k 30 \
    --after_thinking_min_p 0.0 \
    --early_stopping_entropy_threshold 0.1 \
    --early_stopping_length_threshold 256 \
    --mem_fraction_static 0.8 \
    --start_idx 0 \
    --end_idx 10000 \
    --num_gpus 8 \
    --num_samples 1 \
    --enable_soft_thinking \
    --use_llm_judge \
    --api_base "<replace it>" \
    --deployment_name "<replace it>" \
    --api_version "<replace it>" \
    --api_key "<replace it>" \
    --push_results_to_hf \
    --hf_repo_id "<replace it>" \
    --hf_token "<replace it>"
```

#### Hyperparameter Search

For best results, search over the following hyperparameter values:

- `max_topk`: {5, 10, 15, 20}
- `min_p`: {0.005, 0.01, 0.02}
- `early_stopping_entropy_threshold`: {0.01, 0.05, 0.1}
- `early_stopping_length_threshold`: {128, 256, 512, 1024}

> **Note:**  
> - Results may vary across devices even with identical hyperparameters, due to differences in numerical precision.
> - You can change the model (`--model_name`) and dataset (`--dataset`) to experiment with other configurations.
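
The search space above is a full grid of 4 × 3 × 3 × 4 = 144 configurations. The sketch below enumerates one command per configuration; it is an illustration, not a script shipped with the repository. `GRID`, `FIXED`, and `grid_commands` are hypothetical names, the fixed arguments are copied from the Soft Thinking command with the judge and upload flags omitted, and `--after_thinking_min_p` is assumed to stay at 0.0 while `--min_p` is searched.

```python
import itertools
import shlex

# Search space copied from the hyperparameter list above.
GRID = {
    "max_topk": [5, 10, 15, 20],
    "min_p": [0.005, 0.01, 0.02],
    "early_stopping_entropy_threshold": [0.01, 0.05, 0.1],
    "early_stopping_length_threshold": [128, 256, 512, 1024],
}

# Arguments held constant across the search (assumption: same as the
# Soft Thinking command, minus the judge and Hugging Face upload flags).
FIXED = [
    "python", "run_sglang_softthinking.py",
    "--dataset", "aime2024",
    "--model_name", "./models/Qwen/QwQ-32B",
    "--max_generated_tokens", "32768",
    "--temperature", "0.6",
    "--top_p", "0.95",
    "--top_k", "30",
    "--after_thinking_temperature", "0.6",
    "--after_thinking_top_p", "0.95",
    "--after_thinking_top_k", "30",
    "--after_thinking_min_p", "0.0",
    "--mem_fraction_static", "0.8",
    "--start_idx", "0",
    "--end_idx", "10000",
    "--num_gpus", "8",
    "--num_samples", "1",
    "--enable_soft_thinking",
]


def grid_commands(grid=GRID):
    """Yield one argument list per point in the Cartesian product of the grid."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        cmd = list(FIXED)
        for name, value in zip(names, values):
            cmd += [f"--{name}", str(value)]
        yield cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed
    # or handed to a job scheduler.
    for cmd in grid_commands():
        print(shlex.join(cmd))
```

Each configuration is a full inference run, so in practice it may be cheaper to pass a smaller `grid` dictionary and sweep one or two hyperparameters at a time.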

