### Dataset Generation

```bash
OUT=alpaca_data           # output directory
NBEGIN=0                  # passed to --n_begin
NEND=1000000000           # passed to --n_end
TARGET=70b                # target model
DRAFT=7b                  # draft model
DATASET=tatsu-lab/alpaca  # one of: tatsu-lab/alpaca, openai_humaneval, gsm8k_test

mkdir -p "${OUT}"
# 1. Generate responses with the target model.
python3 gen_dataset.py --dataset_name ${DATASET} --model_name ${TARGET} --mode vllm --do_sample --output_dir ${OUT} --n_begin ${NBEGIN} --n_end ${NEND}
# 2. Generate the draft model's next tokens, conditioned on the target model's responses.
python3 gen_assistant.py --model_name ${DRAFT} --do_sample --input_file ${OUT}/dataset${NBEGIN}to${NEND}_vllm${TARGET}.json
# 3. Run gen_log_p.py with the draft model.
python3 gen_log_p.py --model_name ${DRAFT} --input_file ${OUT}/dataset${NBEGIN}to${NEND}_vllm${TARGET}_${DRAFT}stochastic.json
# 4. Run gen_log_p.py with the target model.
python3 gen_log_p.py --model_name ${TARGET} --input_file ${OUT}/dataset${NBEGIN}to${NEND}_vllm${TARGET}_${DRAFT}stochastic_${DRAFT}logP.json
# 5. Compute the acceptance probabilities.
python3 gen_acceptance.py --target_name ${TARGET} --draft_name ${DRAFT} --input_file ${OUT}/dataset${NBEGIN}to${NEND}_vllm${TARGET}_${DRAFT}stochastic_${DRAFT}logP_${TARGET}logP.json
```
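Each stage reads the file written by the previous one, with a suffix appended to the file stem. The sketch below only reconstructs the `--input_file` names that appear in the commands above; `stage_files` is a hypothetical helper and not part of the repository.

```python
def stage_files(out, n_begin, n_end, target, draft):
    """Return the --input_file paths used by stages 2-5, in pipeline order."""
    stem = f"{out}/dataset{n_begin}to{n_end}_vllm{target}"
    # Suffix accumulated on the stem before each stage reads its input.
    suffixes = ["", f"_{draft}stochastic", f"_{draft}logP", f"_{target}logP"]
    paths = []
    for suffix in suffixes:
        stem += suffix
        paths.append(stem + ".json")
    return paths


if __name__ == "__main__":
    for path in stage_files("alpaca_data", 0, 1000000000, "70b", "7b"):
        print(path)
```

This can help when resuming a partially completed run, since the presence of a given file shows which stages have already finished.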

The generated dataset records contain the following fields:

```
prompt: str, the prompt (wrapped with [INST] and [/INST])
prefix: list[int], the tokenized prompt
continuation: str, the response generated by the target model (Llama-2-chat 70B)
tokens: list[int], the tokenized continuation
stochastic_7b: list[int], next tokens generated from the draft model, conditioned on the target model's generation
p_acc: list[float], the acceptance probabilities of the current tokens
```
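How `p_acc` is derived is not stated here. Under the standard speculative sampling rule, a draft token `x` is accepted with probability `min(1, p(x) / q(x))`, where `p` is the target distribution and `q` is the draft distribution. The sketch below assumes `gen_acceptance.py` follows that rule and that per-token log-probabilities from both models are available; the function name and inputs are hypothetical.

```python
import math


def acceptance_probs(target_logp, draft_logp):
    """Per-token min(1, p/q), computed from log-probabilities.

    Assumption: this mirrors the standard speculative sampling acceptance
    rule; it is not taken from gen_acceptance.py.
    """
    if len(target_logp) != len(draft_logp):
        raise ValueError("log-probability lists must have the same length")
    # min(1, p/q) == exp(min(0, log p - log q)); the clamp avoids overflow.
    return [math.exp(min(0.0, t - d)) for t, d in zip(target_logp, draft_logp)]


if __name__ == "__main__":
    target = [math.log(0.5), math.log(0.2)]
    draft = [math.log(0.25), math.log(0.4)]
    print(acceptance_probs(target, draft))
```

A token the target model likes at least as much as the draft model is always accepted; otherwise the acceptance probability is the ratio of the two probabilities.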

### Sample Training Code

```bash
layer=3   # passed to --resnet_num_layers
weight=6  # passed to --weight_mismatch

WANDB_PROJECT=rl-decode python3 train.py --data_path alpaca_data/train40k.json --output_dir exp-weight${weight}-layer${layer} \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf --bf16 True --per_device_train_batch_size 4 \
    --num_train_epochs 3 --gradient_accumulation_steps 8 \
    --logging_steps 5 --eval_data_path alpaca_data/dev10k.json --evaluation_strategy epoch --per_device_eval_batch_size 4 \
    --weight_mismatch ${weight} --save_strategy no --warmup_ratio 0.03 --lr_scheduler_type cosine --resnet_num_layers ${layer}
```
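The batch-size and warmup flags above interact, so it can help to work out the resulting schedule before launching a run. The sketch below is a rough estimate only: it assumes `train40k.json` holds 40,000 examples (inferred from the file name), a single device, and that warmup steps are rounded up. None of these assumptions is confirmed by the repository, and `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_examples, per_device_batch=4, grad_accum=8,
                      num_devices=1, epochs=3, warmup_ratio=0.03):
    """Estimate effective batch size, optimizer steps, and warmup steps."""
    # Examples consumed per optimizer step across all devices.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup length is the ratio of total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


if __name__ == "__main__":
    # 40,000 examples is an assumption based on the file name train40k.json.
    print(training_schedule(40_000))
```

Running on more devices multiplies the effective batch size and shortens the schedule accordingly, so `num_devices` should match the actual launch configuration.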

### Sample Benchmarking Script

```bash
layer=3 # sweep over 0, 1, 2, 3, 4
weight=6 # sweep over 1, 3, 6, 12
thres=0.3 # sweep over 0.1, 0.3, 0.5, 0.7, 0.9

ckpt=exp-weight${weight}-layer${layer}  # --output_dir of the training run above

llm=70b  # target model
ssm=7b   # draft (assistant) model
data=alpaca_data/test2k.json
SAVEPATH=test-results-main/weight${weight}-layer${layer}-thres${thres}-bound2_20/

python3 main.py --model_name ${llm} --assistant_name ${ssm} --num_assistant_tokens_schedule ada \
    --data_path ${data} --assist_acc_head_dir ${ckpt} --do_sample --random_seed 42 \
    --save_path ${SAVEPATH} --stop_threshold ${thres} --bound 2 20
```
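The inline comments list the values swept for each hyperparameter. The sketch below enumerates the full grid and builds the corresponding `main.py` command lines using only the flags shown above. Whether the full Cartesian product was actually run is an assumption, and `benchmark_command` is a hypothetical helper.

```python
import itertools
import shlex

LAYERS = [0, 1, 2, 3, 4]
WEIGHTS = [1, 3, 6, 12]
THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]


def benchmark_command(layer, weight, thres, llm="70b", ssm="7b",
                      data="alpaca_data/test2k.json"):
    """Build the argument list for one benchmarking run."""
    ckpt = f"exp-weight{weight}-layer{layer}"
    save_path = (f"test-results-main/weight{weight}-layer{layer}"
                 f"-thres{thres}-bound2_20/")
    return [
        "python3", "main.py",
        "--model_name", llm, "--assistant_name", ssm,
        "--num_assistant_tokens_schedule", "ada",
        "--data_path", data, "--assist_acc_head_dir", ckpt,
        "--do_sample", "--random_seed", "42",
        "--save_path", save_path,
        "--stop_threshold", str(thres),
        "--bound", "2", "20",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the grid can be reviewed.
    for layer, weight, thres in itertools.product(LAYERS, WEIGHTS, THRESHOLDS):
        print(shlex.join(benchmark_command(layer, weight, thres)))
```

Printing the commands rather than executing them keeps the sketch side-effect free; the output can be piped to a job scheduler or a shell once the checkpoints for each `layer` and `weight` pair exist.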
