# Selective Deferred Routing: Enabling Cost-Efficient Collaboration between Local SLMs and Remote LLMs

This repository contains the code and data for reproducing the results in the paper.

## Step 0: Environment Setup

First, install the required dependencies:

```bash
pip install -r requirements.txt
```

## Step 1: Get SLM Hidden States
For training efficiency, we run the SLM to produce hidden states offline:
```bash
python utils/get_slm_embeddings.py --model-name <slm_name> --input-files <files_storing_prompts_and_slm_responses> --output-file <path_to_store_hiddens> --last-token-only
```


- `--model-name`: SLM name or path, e.g. `meta-llama/Llama-3.1-8B-Instruct` or a local path to the model.
- `--input-files`: Files storing the prompts and SLM responses for a specific dataset, all of which are provided under `data/*`, e.g. `data/llama-8b/mmlu-test.parquet`. *NOTE: GSM8K needs two files here, e.g.* `data/llama-8b/gsm8k-train.parquet data/llama-8b/gsm8k-test.parquet`.
- `--output-file`: Local path to store the hidden states, e.g. `llama-mmlu-hidden.pt`.
- `--last-token-only`: For convenience, we recommend starting with the MLP-only decision module, which only requires the SLM hidden state of the last token of each input sequence. Its performance is reported in Appendix E.2. *To train the transformer-based decision module, omit this flag to store the hidden states of all tokens (which requires more storage and training-time memory), or generate hidden states online during training instead (skip Step 1 and see Step 2 for details).*
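The only dataset-specific part of Step 1 is the list of input files, since GSM8K takes a train and a test file while the other datasets take one. The sketch below prints one Step 1 command per dataset instead of running it. It assumes every answer file follows the `data/<model>/<dataset>-<split>.parquet` pattern seen in the examples above; in particular, the `squad-test.parquet` name and the helper `slm_input_files` are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: print the --input-files value(s) for Step 1.
# ASSUMPTION: files follow data/<model>/<dataset>-<split>.parquet;
# only GSM8K needs both a train and a test file.
# slm_input_files is a hypothetical helper, not part of this repository.
slm_input_files() {
  local slm="$1" dataset="$2"
  if [ "$dataset" = "gsm8k" ]; then
    echo "data/${slm}/gsm8k-train.parquet data/${slm}/gsm8k-test.parquet"
  else
    echo "data/${slm}/${dataset}-test.parquet"
  fi
}

# Print (rather than run) one Step 1 command per dataset.
# The substitution is left unquoted on purpose so GSM8K yields two arguments.
# Remove the leading "echo" to actually execute the commands.
for dataset in mmlu squad gsm8k; do
  echo python utils/get_slm_embeddings.py \
    --model-name meta-llama/Llama-3.1-8B-Instruct \
    --input-files $(slm_input_files llama-8b "$dataset") \
    --output-file "llama-${dataset}-hidden.pt" \
    --last-token-only
done
```

Printing first makes it easy to check the file lists before launching a long embedding run.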

## Step 2: Run Training in Single-Remote Scenario
First, set the following arguments in `sdr/run_train_{dataset_name}.sh`:
- `--model_path`: SLM name or path, e.g. `meta-llama/Llama-3.1-8B-Instruct` or a local path to the model.
- `--slm_name`: e.g. `llama-8b`
- `--llm_names`: e.g. `gpt-4o`
- `--local_answer_paths`: Answer file(s) of the specified SLM; the folder name must match the SLM, e.g. `data/llama-8b/mmlu-test.parquet`.
- `--remote_answer_paths`: Answer file(s) of the specified LLM; the folder name must match the LLM, e.g. `data/gpt-4o/mmlu-test.parquet`.
- `--embed_path`: The `*.pt` file produced in Step 1, or simply remove this argument to generate hidden states online.
- `--output_dir`: Local path to store the training results.
- `--head_model`: Architecture of the decision module, one of `linear`, `mlp`, or `transformer`. *NOTE: The transformer-based decision module needs the hidden states of all tokens, either stored in Step 1 (without `--last-token-only`) or generated online.*
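As a concrete reference, the sketch below collects example values for single-remote training on MMLU with `llama-8b` as the SLM and `gpt-4o` as the remote LLM. The flag names and example values come from the list above; the `TRAIN_ARGS` array, the `outputs/...` directory, and the overall layout are assumptions, since the actual `sdr/run_train_mmlu.sh` may be structured differently.

```bash
#!/usr/bin/env bash
# Sketch: example argument values for single-remote training on MMLU.
# ASSUMPTION: the real sdr/run_train_mmlu.sh may be laid out differently;
# only the flag names and example values come from this README.
SLM=llama-8b
LLM=gpt-4o
DATASET=mmlu
# mlp works with hidden states stored via --last-token-only in Step 1.
HEAD_MODEL=mlp

# Reject values other than the three supported decision-module types.
case "$HEAD_MODEL" in
  linear|mlp|transformer) ;;
  *) echo "unsupported --head_model: $HEAD_MODEL" >&2; exit 1 ;;
esac

# Drop the --embed_path pair to compute hidden states online instead.
# The --output_dir value is a hypothetical example.
TRAIN_ARGS=(
  --model_path meta-llama/Llama-3.1-8B-Instruct
  --slm_name "$SLM"
  --llm_names "$LLM"
  --local_answer_paths "data/${SLM}/${DATASET}-test.parquet"
  --remote_answer_paths "data/${LLM}/${DATASET}-test.parquet"
  --embed_path "llama-${DATASET}-hidden.pt"
  --output_dir "outputs/${SLM}-${LLM}-${DATASET}"
  --head_model "$HEAD_MODEL"
)

# Print one token per line for a quick visual check.
printf '%s\n' "${TRAIN_ARGS[@]}"
```

Switching `HEAD_MODEL` to `transformer` also requires an embedding file with all-token hidden states, or removing `--embed_path`.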

Then run the training:
```bash
chmod +x ./sdr/run_train_{dataset_name}.sh
./sdr/run_train_{dataset_name}.sh
```

## Step 3: Run Training and Evaluation in Multi-Remote Scenario
First, update the following arguments in `sdr/run_train_{dataset_name}.sh` to cover all remote LLMs:
- `--num_remote`: number of remote LLMs, e.g. 4
- `--llm_names`: e.g. `llama-4 deepseek-v3 gpt-4o o4-mini`
- `--remote_answer_paths`: e.g. `data/llama-4/mmlu-test.parquet data/deepseek-v3/mmlu-test.parquet data/gpt-4o/mmlu-test.parquet data/o4-mini/mmlu-test.parquet`. *NOTE: GSM8K needs two files per LLM; list the train-split files of all LLMs first, then the test-split files.*
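Keeping `--num_remote`, `--llm_names`, and `--remote_answer_paths` consistent by hand is error-prone, especially with the GSM8K ordering rule. The sketch below derives all three from one list of LLM names. It assumes the `data/<llm>/<dataset>-<split>.parquet` naming used in the examples; the helper `remote_answer_paths` is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: derive the multi-remote arguments from a single list of LLM names.
# ASSUMPTION: files follow data/<llm>/<dataset>-<split>.parquet.
# remote_answer_paths is a hypothetical helper, not part of this repository.
remote_answer_paths() {
  local dataset="$1"; shift
  local llm paths=()
  if [ "$dataset" = "gsm8k" ]; then
    # GSM8K: all train-split files first, then all test-split files.
    for llm in "$@"; do paths+=("data/${llm}/gsm8k-train.parquet"); done
    for llm in "$@"; do paths+=("data/${llm}/gsm8k-test.parquet"); done
  else
    for llm in "$@"; do paths+=("data/${llm}/${dataset}-test.parquet"); done
  fi
  # Join the paths with single spaces.
  echo "${paths[*]}"
}

LLMS=(llama-4 deepseek-v3 gpt-4o o4-mini)
printf '%s\n' "--num_remote ${#LLMS[@]}"
printf '%s\n' "--llm_names ${LLMS[*]}"
printf '%s\n' "--remote_answer_paths $(remote_answer_paths mmlu "${LLMS[@]}")"
```

For `mmlu` this reproduces the example value of `--remote_answer_paths` above; passing `gsm8k` instead yields the train-then-test ordering.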

Then run the training:
```bash
./sdr/run_train_{dataset_name}.sh
```

For evaluation, first compute cost-performance data:
```bash
python utils/multi_remote_evaluate.py --remote-answer-paths <remote_answer_paths> --model-names <model_names> --dataset-name <dataset_name> --score-paths <score_paths> --output-path <output_path>
```
- `--remote-answer-paths`: Same as in `sdr/run_train_{dataset_name}.sh`.
- `--model-names`: The SLM name followed by the LLM names, e.g. `llama-8b llama-4 deepseek-v3 gpt-4o o4-mini`.
- `--dataset-name`: One of `mmlu`, `squad`, or `gsm8k`.
- `--score-paths`: The test result file produced by training, `<output_dir>/test_result.parquet`.
- `--output-path`: `*.json` file to store the cost-performance data, e.g. `llama-multiremote-mmlu.json`.
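Putting these arguments together for the four-remote MMLU example, the sketch below prints the evaluation command rather than running it. The `OUTPUT_DIR` value is a hypothetical placeholder for whatever `--output_dir` was set during training, and the `data/<llm>/mmlu-test.parquet` paths assume the naming used in the examples above.

```bash
#!/usr/bin/env bash
# Sketch: assemble the evaluation command for the 4-remote MMLU example.
# ASSUMPTION: OUTPUT_DIR is a placeholder for the --output_dir used in training.
SLM=llama-8b
LLMS=(llama-4 deepseek-v3 gpt-4o o4-mini)
DATASET=mmlu
OUTPUT_DIR=outputs/llama-multiremote-mmlu   # hypothetical

# The SLM name comes first, followed by the remote LLM names.
MODEL_NAMES=("$SLM" "${LLMS[@]}")

# Same remote answer files as in the training script.
REMOTE_PATHS=()
for llm in "${LLMS[@]}"; do
  REMOTE_PATHS+=("data/${llm}/${DATASET}-test.parquet")
done

# Print the command; remove the leading "echo" to run it.
echo python utils/multi_remote_evaluate.py \
  --remote-answer-paths "${REMOTE_PATHS[@]}" \
  --model-names "${MODEL_NAMES[@]}" \
  --dataset-name "$DATASET" \
  --score-paths "${OUTPUT_DIR}/test_result.parquet" \
  --output-path "llama-multiremote-${DATASET}.json"
```

Building `MODEL_NAMES` from the same `LLMS` list keeps the order of model names aligned with the order of the answer files.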

Finally, plot the cost-performance curve:
```bash
python utils/plot_cost_performance.py # first set the input *.json path inside plot_cost_performance.py
```