## Quick Start


### 1. Environment Setup
Follow the standard Decision Transformer (DT) setup to install the dependencies and prepare the datasets.
Then make sure the dataset paths in `experiment.py` point to where the datasets are stored.
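Before launching a long run, it can help to confirm that the dataset files exist. The snippet below is a minimal sketch that assumes each dataset is stored as `<env>-<dataset>-v2.pkl` under a `DATA_DIR` directory (defaulting to `data`). Both the naming scheme and the hypothetical `DATA_DIR` variable and `check_dataset` helper are assumptions, so adjust them to match whatever `experiment.py` actually loads.

```bash
# Hypothetical check: assumes datasets are saved as "<env>-<dataset>-v2.pkl"
# under DATA_DIR. Adjust both to match the paths configured in experiment.py.
DATA_DIR=${DATA_DIR:-data}

# Print the expected file and return non-zero if it is missing.
check_dataset() {
  local env=$1 dataset=$2
  local file="$DATA_DIR/${env}-${dataset}-v2.pkl"
  if [ -f "$file" ]; then
    echo "found: $file"
  else
    echo "missing: $file" >&2
    return 1
  fi
}
```

For example, `check_dataset halfcheetah medium && check_dataset walker2d medium-expert` verifies both datasets used in the examples below.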

### 2. Running Experiments
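Once the environment is set up, run the training script `experiment.py`. The examples below train on `halfcheetah-medium` and `walker2d-medium-expert`; note that `--eta2` and `--grad_norm` differ between the two datasets: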
```bash
# SAVE_PATH and Pretrain_Q_Path must be set beforehand; Pretrain_Q_Path should
# point to a Q-function pretrained on the same environment and dataset.
ENV=halfcheetah
Dataset=medium
python experiment.py --seed 3 \
    --env "$ENV" --dataset "$Dataset" \
    --eta2 5 --grad_norm 15 \
    --exp_name qtc --save_path "$SAVE_PATH" \
    --max_iters 500 --num_steps_per_iter 1000 --lr_decay --K 20 --early_epoch 200 \
    --early_stop --use_discount --infer_no_q --alg alignment-sequence \
    --alignment_function relu --pretrain_q_path "$Pretrain_Q_Path" --target_rtg 10 --num_eval_episodes 50

ENV=walker2d
Dataset=medium-expert
python experiment.py --seed 3 \
    --env "$ENV" --dataset "$Dataset" \
    --eta2 0.3 --grad_norm 0.5 \
    --exp_name qtc --save_path "$SAVE_PATH" \
    --max_iters 500 --num_steps_per_iter 1000 --lr_decay --K 20 --early_epoch 200 \
    --early_stop --use_discount --infer_no_q --alg alignment-sequence \
    --alignment_function relu --pretrain_q_path "$Pretrain_Q_Path" --target_rtg 10 --num_eval_episodes 50
```
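To repeat the halfcheetah-medium configuration over several seeds, the command can be wrapped in a small loop. This is a sketch only: the seed list, the placeholder defaults for `SAVE_PATH` and `Pretrain_Q_Path`, and the `DRY_RUN` switch and `run_seed` helper are all assumptions for illustration. With the default `DRY_RUN=1` it only prints the commands; set `DRY_RUN=0` to actually launch them.

```bash
ENV=halfcheetah
Dataset=medium
SAVE_PATH=${SAVE_PATH:-./checkpoints}                 # placeholder output directory
Pretrain_Q_Path=${Pretrain_Q_Path:-/path/to/Q_bc.pt}  # placeholder Q checkpoint
DRY_RUN=${DRY_RUN:-1}                                 # 1 = print commands only

# Build the halfcheetah-medium command for one seed, then print or run it.
run_seed() {
  local seed=$1
  local cmd=(python experiment.py --seed "$seed"
    --env "$ENV" --dataset "$Dataset"
    --eta2 5 --grad_norm 15
    --exp_name qtc --save_path "$SAVE_PATH"
    --max_iters 500 --num_steps_per_iter 1000 --lr_decay --K 20 --early_epoch 200
    --early_stop --use_discount --infer_no_q --alg alignment-sequence
    --alignment_function relu --pretrain_q_path "$Pretrain_Q_Path"
    --target_rtg 10 --num_eval_episodes 50)
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Hypothetical seed list; change it to the seeds you want to report.
for seed in 0 1 2; do
  run_seed "$seed"
done
```

Building the command as a bash array keeps each argument intact even if a path contains spaces, and the dry-run mode makes it easy to inspect the exact flags before committing compute.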
- `--save_path`: directory where the trained models are saved.

- `--pretrain_q_path`: path to the pretrained Q-function.
  For Gym environments, we pretrain it with a simple TD-based Double Q setup, so the learned Q-function approximates the value of the dataset-average behavior.
  A pretrained checkpoint is provided (e.g., `pretrain_Q/saved_q_halfcheetah-medium-v2/Q_bc.pt`).
  To train a new Q-function from scratch, run:
```bash
python pretrain_q.py --env halfcheetah-medium-v2
```
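The provided checkpoint path suggests one Q-function per environment and dataset. The sketch below reuses a checkpoint when it exists and otherwise calls `pretrain_q.py`. It assumes, without confirmation from the code, that newly trained checkpoints are written to the same layout, `pretrain_Q/saved_q_<env>-<dataset>-v2/Q_bc.pt`; the `q_ckpt_path` and `ensure_q` helpers are hypothetical.

```bash
# Hypothetical helper: assumes checkpoints follow the layout of the provided
# example, pretrain_Q/saved_q_<env>-<dataset>-v2/Q_bc.pt.
q_ckpt_path() {
  local env=$1 dataset=$2
  echo "pretrain_Q/saved_q_${env}-${dataset}-v2/Q_bc.pt"
}

# Pretrain a Q-function only if no checkpoint exists at the expected path.
ensure_q() {
  local env=$1 dataset=$2
  local ckpt
  ckpt=$(q_ckpt_path "$env" "$dataset")
  if [ -f "$ckpt" ]; then
    echo "using existing $ckpt"
  else
    echo "pretraining ${env}-${dataset}-v2"
    python pretrain_q.py --env "${env}-${dataset}-v2"
  fi
}
```

A run can then set `Pretrain_Q_Path=$(q_ckpt_path walker2d medium-expert)` after `ensure_q walker2d medium-expert`, provided the assumed layout matches what `pretrain_q.py` actually writes.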

### Note on Time Cost

Most of the execution time is spent in the evaluation phase, which involves environment rollouts. 
To improve throughput, we use vectorized environments (`vec_env`) to enable parallel evaluation. 
Because each parallel environment is typically CPU-bound, evaluation speed depends heavily on the available CPU resources: throughput is usually best when the number of available CPU cores is comparable to the number of parallel evaluation environments.
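A quick pre-flight check can flag machines with too few cores for the planned number of parallel environments. This is a minimal sketch: `NUM_EVAL_ENVS` and the `cpu_cores` and `check_cores` helpers are hypothetical, and how many environments actually run at once depends on the `vec_env` configuration, so set the value to match your run rather than assuming it equals `--num_eval_episodes`.

```bash
# Count usable CPU cores: nproc (GNU coreutils) honours CPU affinity;
# getconf is a fallback on systems without nproc.
cpu_cores() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Warn (and return 1) when there are fewer cores than parallel eval envs.
check_cores() {
  local envs=$1 cores
  cores=$(cpu_cores)
  if [ "$cores" -lt "$envs" ]; then
    echo "warning: $cores CPU cores for $envs parallel eval envs; evaluation may be slow" >&2
    return 1
  fi
  echo "ok: $cores CPU cores for $envs parallel eval envs"
}

# NUM_EVAL_ENVS is an assumed setting; 50 is only a placeholder default.
if ! check_cores "${NUM_EVAL_ENVS:-50}"; then
  echo "consider fewer parallel environments or a machine with more cores" >&2
fi
```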



