# Reproducing Libra experiments

Step-by-step instructions to exactly reproduce the main results in Libra.

## Hardware Configurations
| Specification | Value |
|---------------|-------|
| **CPU** | Intel Xeon Platinum 8580 |
| **CPU Sockets** | 2 (128 cores) |
| **GPU** | NVIDIA H200 |
| **GPU per Node** | 8 |
| **System Memory** | 32 × 64 GB DDR5-5600 (2,048 GB total) |
| **Memory per GPU** | 141 GB HBM3e |
| **Interconnect** | NVSwitch (900 GB/s bandwidth) |

## System Configurations
- **OS**: Ubuntu 24.04.4 LTS
- **Python**: 3.10
- **CUDA Version**: 12.8

## Prerequisites
- 3 custom SGLang builds (sglang_libra, sglang_lina, and sglang_libra_internal)
    - See README.md for environment setup
- 3 patch files (sglang_libra.diff, sglang_lina.diff, sglang_libra_internal.diff)
- Hugging Face datasets
    - [BookCorpus](https://huggingface.co/datasets/rojagtap/bookcorpus)
    - [Codeforces](https://huggingface.co/datasets/open-r1/codeforces)
    - [DeepSeek-Prover-V1](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1)
    - [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (Config: sample-10BT)
    - [GSM8K](https://huggingface.co/datasets/openai/gsm8k)
    - [HellaSwag](https://huggingface.co/datasets/Rowan/hellaswag)
    - [HumanEvalPlus](https://huggingface.co/datasets/evalplus/humanevalplus)
    - [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
- Models
    - [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B)
    - [GLM-4.5](https://huggingface.co/zai-org/GLM-4.5)

## Reproducing Figure 7

### Building Token to Expert Mapping Table (TEMT)
TEMT is a mapping table that records, for each dataset's build split, the top-k expert IDs that each scheme assigns to every token. It can be generated by running the scripts below in sglang_libra_internal.

```bash
# Generate TEMT for Qwen3MoE
bash generate_temt_qwen3.sh

# Generate TEMT for GLM-4.5
bash generate_temt_glm45.sh
```
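To make the shape of a TEMT concrete, here is a minimal sketch of a table that stores top-k expert IDs per token. It assumes a scheme's assignment reduces to picking the k highest router scores. The function names, the `hypothetical_scheme` key, and the example scores are illustrative only and are not taken from the generation scripts, whose actual on-disk format may differ.

```python
from typing import Dict, List, Sequence


def top_k_experts(router_scores: Sequence[float], k: int) -> List[int]:
    """Return the IDs of the k highest-scoring experts, best first."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda expert_id: router_scores[expert_id],
        reverse=True,
    )
    return ranked[:k]


def build_temt(scores_per_token: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Build one table: row i holds the top-k expert IDs for token i."""
    return [top_k_experts(scores, k) for scores in scores_per_token]


# Hypothetical router scores for 3 tokens over 4 experts.
scores = [
    [0.10, 0.70, 0.15, 0.05],
    [0.40, 0.10, 0.30, 0.20],
    [0.05, 0.05, 0.20, 0.70],
]

# One entry per scheme; the scheme name is a placeholder.
temt: Dict[str, List[List[int]]] = {"hypothetical_scheme": build_temt(scores, k=2)}
print(temt)
```

Keeping one row per token makes it straightforward to replay the same token stream under different schemes later, for example when computing per-GPU load.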

### SGLang
The throughput of SGLang can be evaluated by running the scripts below in sglang_libra. Confirm that is_ori = True before running them.
```bash
# in sglang_libra
# Qwen3MoE
bash qwen3_throughput_sglang.sh

# GLM-4.5
bash glm45_throughput_sglang.sh
```

### EPLB
EPLB requires offline profiling to find hot experts and cold GPUs, and then maps the hot experts to the cold GPUs. The expert placement, which is the result of this offline profiling, can be generated by running the scripts below in sglang_libra. Confirm that is_ori = True before running them.
```bash
# in sglang_libra
cd eplb

# Find hot experts and cold GPUs
bash find_hot_experts_and_cold_gpus_qwen3.sh
bash find_hot_experts_and_cold_gpus_glm45.sh

# Create expert placement
bash run_all_cold_gpu_qwen3.sh
bash run_all_cold_gpu_glm45.sh

cd ..
```
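The profiling idea can be sketched as follows: count the tokens routed to each expert, sum them per GPU, then pair the most loaded experts with the least loaded GPUs. This is a simplified illustration under assumed inputs, not the logic of the scripts above. The function names, the `num_replicas` parameter, and the token counts are all hypothetical.

```python
from typing import Dict, List


def gpu_loads(expert_tokens: Dict[int, int],
              expert_to_gpu: Dict[int, int],
              num_gpus: int) -> List[int]:
    """Sum the profiled token counts of the experts hosted on each GPU."""
    loads = [0] * num_gpus
    for expert_id, tokens in expert_tokens.items():
        loads[expert_to_gpu[expert_id]] += tokens
    return loads


def plan_placement(expert_tokens: Dict[int, int],
                   expert_to_gpu: Dict[int, int],
                   num_gpus: int,
                   num_replicas: int) -> Dict[int, int]:
    """Pair the hottest experts with the coldest GPUs (hot expert -> cold GPU)."""
    loads = gpu_loads(expert_tokens, expert_to_gpu, num_gpus)
    hot_experts = sorted(expert_tokens, key=expert_tokens.get, reverse=True)[:num_replicas]
    cold_gpus = sorted(range(num_gpus), key=lambda gpu: loads[gpu])[:num_replicas]
    return dict(zip(hot_experts, cold_gpus))


# Hypothetical profile: 8 experts, 4 GPUs, 2 experts per GPU.
expert_tokens = {0: 900, 1: 100, 2: 50, 3: 60, 4: 400, 5: 300, 6: 20, 7: 30}
expert_to_gpu = {expert_id: expert_id // 2 for expert_id in expert_tokens}

profiled_loads = gpu_loads(expert_tokens, expert_to_gpu, num_gpus=4)
placement = plan_placement(expert_tokens, expert_to_gpu, num_gpus=4, num_replicas=2)
print(profiled_loads, placement)
```

In this toy profile, expert 0 is the hottest and GPU 3 is the coldest, so they are paired first.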

After generating the expert placement, the throughput of EPLB can be evaluated by running the scripts below in sglang_libra.

```bash
# Qwen3MoE
bash qwen3_throughput_eplb.sh

# GLM-4.5
bash glm45_throughput_eplb.sh
```

### Lina
Lina requires an expert-selection-path table, a lookup table that predicts the next expert for a token from the path of experts previously selected for it. The table can be generated by running the scripts below in sglang_libra_internal. Confirm that is_ori = False before running them.

```bash
cd lina

# Generate expert-selection-path table
bash run_make_expert_path_table_qwen3.sh
bash run_make_expert_path_table_glm45.sh

cd ..
```
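As a rough illustration of such a lookup table, the sketch below counts which expert most often follows each recent path of experts and uses the most frequent one as the prediction. It assumes one selected expert per layer (top-1) and a fixed path length. The function names, the `path_len` parameter, and the traces are hypothetical, and the real table built by the scripts above may be keyed differently.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple

PathKey = Tuple[int, Tuple[int, ...]]  # (layer index, experts chosen in the preceding layers)


def build_path_table(traces: Sequence[Sequence[int]], path_len: int) -> Dict[PathKey, int]:
    """Map each (layer, preceding path) to the expert that most often followed it."""
    counts: Dict[PathKey, Counter] = defaultdict(Counter)
    for trace in traces:
        for layer in range(path_len, len(trace)):
            path = tuple(trace[layer - path_len:layer])
            counts[(layer, path)][trace[layer]] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def predict_next(table: Dict[PathKey, int], layer: int, path: Sequence[int]) -> Optional[int]:
    """Return the predicted expert, or None if this path was never observed."""
    return table.get((layer, tuple(path)))


# Hypothetical per-token traces: the expert chosen at layers 0, 1, and 2.
traces = [[3, 1, 4], [3, 1, 4], [3, 1, 2], [0, 1, 2]]
table = build_path_table(traces, path_len=2)

seen = predict_next(table, layer=2, path=[3, 1])
unseen = predict_next(table, layer=2, path=[2, 2])
print(table, seen, unseen)
```

Returning `None` for unseen paths leaves the fallback policy to the caller, for example deferring to the router's actual choice.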

After generating the expert-selection-path table, the throughput of Lina can be evaluated by running the scripts below in sglang_lina.

```bash
# Qwen3MoE
bash qwen3_throughput_lina.sh

# GLM-4.5
bash glm45_throughput_lina.sh
```

### Libra
The throughput of Libra can be evaluated by running the scripts below in sglang_libra. Confirm that is_ori = False before running them.

```bash
# Qwen3MoE
bash qwen3_throughput_libra.sh

# GLM-4.5
bash glm45_throughput_libra.sh
```

## Reproducing Figure 8

### Throughput
The throughput of SGLang, Lina, and Libra was already evaluated when reproducing Figure 7.

### Imbalance Ratio
Using the TEMT, we can evaluate the imbalance ratio, defined as the load of the most heavily loaded GPU divided by the average load across all GPUs. The imbalance ratio of each scheme can be evaluated by running the scripts below in sglang_libra_internal. The hot-expert and cold-GPU data and Lina's expert-selection-path table, which were used in the throughput evaluation, are also required to calculate the imbalance ratio.

```bash
# For EPLB
cd eplb
cp path/to/sglang_libra/eplb/hot_experts . # replace path/to/sglang_libra with your sglang_libra path

bash evaluate_fluctuation_eplb_2048.sh
bash evaluate_fluctuation_eplb_4096.sh

cd ..

# For Lina
cd lina

bash evaluate_fluctuation_lina_2048.sh
bash evaluate_fluctuation_lina_4096.sh

cd ..

# For SGLang and Libra

cd prefetch_rebalance

bash evaluate_fluctuation_libra_2048.sh
bash evaluate_fluctuation_libra_4096.sh

cd ..
```
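The imbalance ratio defined above (maximum GPU load divided by mean GPU load) can be sketched in a few lines. This example assumes that load is measured as the number of token-to-expert assignments landing on each GPU. The function names, the assignments, and the expert-to-GPU mapping are hypothetical, and the evaluation scripts may measure load differently.

```python
from typing import Dict, List, Sequence


def gpu_loads_from_assignments(token_experts: Sequence[Sequence[int]],
                               expert_to_gpu: Dict[int, int],
                               num_gpus: int) -> List[int]:
    """Count how many token-to-expert assignments land on each GPU."""
    loads = [0] * num_gpus
    for experts in token_experts:
        for expert_id in experts:
            loads[expert_to_gpu[expert_id]] += 1
    return loads


def imbalance_ratio(loads: Sequence[float]) -> float:
    """Load of the most heavily loaded GPU divided by the mean load."""
    if not loads:
        raise ValueError("loads must not be empty")
    mean_load = sum(loads) / len(loads)
    if mean_load == 0:
        raise ValueError("mean load is zero; the ratio is undefined")
    return max(loads) / mean_load


# Hypothetical top-2 assignments for 4 tokens, with one expert per GPU.
token_experts = [[0, 1], [0, 2], [0, 3], [1, 0]]
expert_to_gpu = {0: 0, 1: 1, 2: 2, 3: 3}

loads = gpu_loads_from_assignments(token_experts, expert_to_gpu, num_gpus=4)
ratio = imbalance_ratio(loads)
print(loads, ratio)
```

A ratio of 1.0 means perfectly even load. In this example, GPU 0 carries twice the average load.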

### Combine into Throughput/Imbalance Ratio Fluctuation Data

The throughput fluctuation on the shuffled dataset can be combined by running the Python file below in sglang_libra. Lina's throughput result files must first be copied into sglang_libra.

```bash
# Copy Lina's results; replace path/to/sglang_lina with your sglang_lina path
cp path/to/sglang_lina/latency_throughput/* latency_throughput
python get_throughput_fluctuation.py
```

The imbalance ratio fluctuation on the shuffled dataset can be combined by running the Python file below in sglang_libra_internal.

```bash
python get_imbalance_ratio_fluctuation.py
```