# State Transformer

State Transformer (STR) is a general model for both motion prediction and motion planning  across multiple large-scale real-world datasets. We reformulate the motion prediction and motion planning problem by arranging all elements into a sequence modeling task. Our experiment results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency when trained using larger Transformer backbones. Qualitative analysis illustrates that LTMs are capable of generating plausible predictions in scenarios that diverge significantly from the training dataset’s distribution. LTMs can also learn to make complex reasonings for long-term planning, extending beyond the horizon of 8 seconds, without explicit loss designs or costly high-level annotations. Checkout our paper for more details.

# Abstract

Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. 
    The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. 
    Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task.
    With a simple model design, STR consistently outperforms baseline approaches in both problems.
    Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency.
    Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to make complex reasonings for long-term planning, without explicit loss designs or costly high-level annotations.


# Method

* Causal Transformer for sequence modeling instead of Encoder-Decoder models
* Easy modify the backbone to scale or replace with other LLMs
* Versatile for adding classification loss or rule-based post-processions

![STR Sequence Modeling](pipeline.gif)

# Usage

## Prepare Environment

Run `pip install -r requirements.txt` to install all dependencies. There are some additional dependencies for NuScenes and Waymo datasets. Please refer to the official websites for more details.
Also you need to install opencv for NuPlan raster encoder. Please refer to the official website for more details.

### Install Transformer4Planning

run `pip install -e .` from the root directory of Transformer4Planning.

### Install NuPlan-Devkit
(tested with v1.2)
run `pip install -e .` from the root directory of NuPlan-Devkit.
Then install these packages:

    pip install aioboto3
    pip install retry
    pip install aiofiles
    pip install bokeh==2.4.1

### Install WOMD-Devkit

run `pip install waymo-open-dataset-tf-2-11-0==1.6.0` to install WOMD-Devkit following [here](https://github.com/waymo-research/waymo-open-dataset).

## Dataset

We process the dataset into the Hugging Face Dataset format. Click [here](http://180.167.251.46:880/NuPlanSTR/nuplan-v1.1_STR.zip) to download the NuPlan dataset.
Unzip the file and pass the path to the '--saved_dataset_folder' argument to use it.
This dataset contains the training, validation, and the test set.

Waymo dataset will be updated soon on the server to download.

If you need to customize your dataset, you would need to process the official dataset by our scripts.

- For NuPlan dataset, please refer to the `generation.py` and read the following 'NuPlan Dataset Pipeline' section for more details. 
- For Waymo dataset, please refer to the `waymo_generation.py` and read the following 'Waymo Dataset Pipeline' section for more details. 

### NuPlan Dataset Pipeline

Usage:

1. process NuPlan .db files to .pkl files (to agent dictionaries)
2. generate filtered scenario index and cache in .arrow files
3. generate map dictionary to pickles

We recommend you to organize your downloaded dataset in the following way. If you need to process from a customized directory, check `generation.py` for more derivations.

```
    {DATASET_ROOT}
    ├── maps
    │   ├── us-ma-boston
    │   ├── us-pa-pittsburgh-hazelwood
    │   ├── us-nv-las-vegas-strip
    │   ├── sg-one-north
    │   ├── nuplan-maps-v1.0.json
    ├── nuplan-v1.1
    │   │── train_singapore
    │   │   ├── *.db
    │   │── train_boston
    │   │   ├── *.db
    │   │── train_pittsburgh
    │   │   ├── *.db
    │   │── train_las_vegas_1
    │   │   ├── *.db
    │   │── train_las_vegas_2
    │   │   ├── *.db
    │   │── train_las_vegas_3
    │   │   ├── *.db
    │   │── train_las_vegas_4
    │   │   ├── *.db
    │   │── train_las_vegas_5
    │   │   ├── *.db
    │   │── train_las_vegas_6
    │   │   ├── *.db
    │   │── test
    │   │   ├── *.db
    │   │── val
    │   │   ├── *.db
```

Step 1: Process .db to .pkl by running (data_path is the folder name like 'train_singleapore'):
```
    python generation.py  --num_proc 40 --sample_interval 100  
    --dataset_name boston_index_demo  --starting_file_num 0  
    --ending_file_num 10000  --cache_folder {PATH_TO_CACHE_FOLDER}
    --data_path {PATH_TO_DATASET_FOLDER}  --only_data_dic
```

Step 2: Generate scenarios to .arrow datasets
```
    python generation.py  --num_proc 40 --sample_interval 100  
    --dataset_name boston_index_interval100  --starting_file_num 0  
    --ending_file_num 10000  --cache_folder {PATH_TO_CACHE_FOLDER}  
    --data_path {PATH_TO_DATASET_FOLDER}  --only_index  
```

Step 3: Generate Map files to .pickle files

```
    python generation.py  --num_proc 40 --sample_interval 1 --dataset_name {DATASET_NAME}  
    --starting_file_num 0  --ending_file_num 10000  
    --cache_folder {PATH_TO_CACHE_FOLDER}  
    --data_path {PATH_TO_DATASET_FOLDER} --save_map
```

```
    python generation.py  --num_proc 40 --sample_interval 1  
    --dataset_name vegas2_datadic_float32 --starting_file_num 0  --ending_file_num 10000  
    --cache_folder {PATH_TO_CACHE_FOLDER} --data_path {PATH_TO_DATASET_FOLDER} --save_map
```

You only need to process Vegas's map once for all Vegas subsets.


Why process .db files to .pkl files? Lower disk usage (lower precision) and faster loading (without initiate NuPlan DataWrapper)


```
root-
 |--train
      |--us-ma-boston
         --*.pkl
      |--us-pa-pittsburgh-hazelwood
    ...
 |--test
     |--us-ma-pittsburgh
        --*.pkl
     ...
 |--map
    --us-ma-boston.pkl
    --us-pa-pittsburgh-hazelwood.pkl
    --us-nv-las-vegas-strip.pkl
    --sg-one-north
    ...
 |--index (can be organized dynamically)
    |--train
        |--train-index_boston
            --*.arrow
    |--test
        |--test-index_pittsburgh
            --*.arrow    
```


## To train and evaluate during training:

`--model_name` can choose from ["gpt-large","gpt-medium","gpt-small","gpt-mini"] and with prefix of `scratch-` or `pretrain-` to determine wether load pretrained weights from existed checkpoints, whose attributes is `--model_pretrain_name_or_path`

The common training settings are shown below.

```
python -m torch.distributed.run \
--nproc_per_node=8 runner.py \
--do_train --do_eval\
--model_name scratch-gpt-mini --model_pretrain_name_or_path None \
--saved_dataset_folder  {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 100 \
--per_device_train_batch_size 16 --warmup_steps 50 \
--weight_decay 0.01 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--save_total_limit 2 \
--dataloader_drop_last True \
--task nuplan \
--remove_unused_columns False \
--evaluation_strategy steps \
--eval_steps 1000 --loss_fn mse \
--per_device_eval_batch_size 8 \
--max_eval_samples 10000 --predict_yaw True \
--use_full_training_set False --past_sample_interval 2 \
--x_random_walk 0.1 --y_random_walk 0.1 --augment_current_pose_rate 0.1
```

To choose different encoder, please set the attribute `--encoder_type`. The choices are [`raster`, `vector`]. With different `--task` setting, the encoder can be initilized with different classes.

To choose different decoder, please set the attribute `--decoder_type`. The choices are [`mlp`, `diffusion`].

### Train only diffusion decoder, without backbone

Training the diffusion decoder together with the backbone is not recommended due to extremely slow convergence. We recommand you to train the backbone and ecoders with the MLP decoder, and train the diffusion decoder after convergence.

After training with an MLP decoder, you need to generate the dataset for training the diffusion decoder using `generate_diffusion_feature.py`: this is done using the same command for eval except that you need to set `--generate_diffusion_dataset_for_key_points_decoder` to True and `--diffusion_feature_save_dir` to the dir to save the pth files for training diffusion decoders.
An example:
```
python -m torch.distributed.run \
--nproc_per_node=8 generate_diffusion_feature.py \
--model_name pretrain-gpt-mini \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder  {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--dataloader_num_workers 10 \
--dataloader_drop_last True \
--dataset_scale 1 \
--task nuplan \
--future_sample_interval 2 \
--past_sample_interval 5 \
--evaluation_strategy steps \
--overwrite_output_dir --loss_fn mse --max_eval_samples 10000\
--next_token_scorer True \
--pred_key_points_only False \
--diffusion_feature_save_dir {PATH_TO_DIFFUSION_FEATURE_SAVE_DIR} \
```
After running these, you are supposed to get three folders under `diffusion_feature_save_dir` for `train`, `val` and `test` diffusion features respectively.

```
diffusion_feature_save_dir
 |--train
    --future_key_points_[0-9]*.pth
    --future_key_points_hidden_state_[0-9]*.pth
 |--val
    --future_key_points_[0-9]*.pth
    --future_key_points_hidden_state_[0-9]*.pth
 |--test
    --future_key_points_[0-9]*.pth
    --future_key_points_hidden_state_[0-9]*.pth
```

After saving the pth files, you need to run `convert_diffusion_dataset.py` to convert them into arrow dataset which is consistent with the training format we are using here.
```
python3 convert_diffusion_dataset.py \
    --save_dir {PATH_TO_SAVE_DIR} \
    --data_dir {PATH_TO_DIFFUSION_FEATURE_SAVE_DIR} \
    --num_proc 10 \
    --dataset_name train \
    --map_dir {PATH_TO_MAP_DIR} \
    --saved_datase_folder {PATH_TO_DATASET_FOLDER} \
    --use_centerline False \
    --split train \
```

After which you are expected to see such structures in save_dir:
```
save_dir
    |--generator
        |--*
    |--train
        *.arrow
        dataset_info.json
        state.json
    *.lock
```

Fianlly, please set `--task` to `train_diffusion_decoder`. In this case, the model is initilized without a transformer backbone(Reduce the infence time to build backbone feature). 
In the meanwhile, please change the `--saved_dataset_folder` to the folder which stores 'backbone features dataset', obtained previously by `convert_diffusion_dataset.py`. In this case it should be consistent with `save_dir` in the previous step.
```
python3 -m torch.distributed.run \
--nproc_per_node=8 runner.py \
--model_name pretrain-gpt-small \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 10 \
--weight_decay 0.00001 --learning_rate 0.0001 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--per_device_train_batch_size 2 \
--save_total_limit 2  --predict_yaw True \
--dataloader_drop_last True \
--task train_diffusion_decoder \
--remove_unused_columns False \
--do_train \
--loss_fn mse \
--per_device_eval_batch_size 2 \
--past_sample_interval 2 \
--trajectory_loss_rescale 0.00001 \
```

### Evaluate the performance of diffusion decoder

After training the diffusion keypoint decoder separately, you may use such command to eval its performance:
```
export CUDA_VISIBLE_DEVICES=0; \
nohup python3 -m torch.distributed.run \
--nproc_per_node=1 runner.py \
--model_name pretrain-gpt-small-gen1by1 \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder  {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 1 \
--weight_decay 0.00 --learning_rate 0.00 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--per_device_train_batch_size 2 \
--save_total_limit 2  --predict_yaw True \
--dataloader_drop_last True \
--task nuplan \
--remove_unused_columns False \
--do_eval \
--evaluation_strategy epoch \
--loss_fn mse \
--per_device_eval_batch_size 2 \
--past_sample_interval 2 \
--kp_decoder_type diffusion --key_points_diffusion_decoder_feat_dim 256 --diffusion_condition_sequence_lenth 1 \
--key_points_diffusion_decoder_load_from {KEY_POINT_DIFFUSION_CHECKPOINT_PATH}
```

## To predict:

```
export CUDA_VISIBLE_DEVICES=3; \
python -m torch.distributed.run \
--nproc_per_node=1 \
--master_port 12345 \
runner.py --model_name pretrain-gpt \
--model_pretrain_name_or_path data/example/training_results/checkpoint-xxxxx \
--saved_dataset_folder /localdata_ssd/nuplan/nsm_autoregressive_rapid \
--output_dir data/example/prediction_results/checkpoint-xxxxx \
--per_device_eval_batch_size 32 \
--dataloader_num_workers 16 \
--predict_trajectory True \
--do_predict \
--saved_valid_dataset_folder /localdata_ssd/nuplan/boston_test_byscenario/
--max_predict_samples 1000 \
--dataloader_drop_last True \
--with_traffic_light True \
--remove_unused_columns True \
--next_token_scorer True \
--trajectory_loss_rescale 1e-3 \
--pred_key_points_only False \
--specified_key_points True \
--forward_specified_key_points False \
```

## Waymo Dataset Pipeline

Generate WOMD training data dictionary and index:

`
python waymo_generation.py --mode train --save_dict --agent_type 1 2 3 
`

Here you got the original pickle data files. By default, we require the distribution of data and index files to be as follow.
```
WOMD_data_save_dir
 |--index
    --train
    --val
    --test
 |--origin
    --train
    --val
    --test
```
To accelerate the data loading part, we suggest to split the large pickle files into small ones per scenario by:

`
python split_scenario_waymo.py --delete_origin True
`


## To evaluate on NuBoard:

WARNING: This is still under development.

We use the NuPlan Garage environment to evaluate the NuPlan model on NuBoard. The NuPlan Garage environment is a wrapper around the NuPlan-Devkit environment. It is used to run the NuPlan model in the NuBoard environment. The NuPlan Garage environment is located in the `nuplan_garage` folder.

### Install Transformer4Planning

run `pip install -e .` from the root directory of Transformer4Planning.

### Install NuPlan-Devkit
(tested with v1.2)
run `pip install -e .` from the root directory of NuPlan-Devkit.
Then install these packages:

    pip install aioboto3
    pip install retry
    pip install aiofiles
    pip install bokeh==2.4.1


### Register the planner
Create a new yaml file for Hydra at: `script/config/simulation/planner/control_tf_planner.yaml` with:


    control_tf_planner:
        _target_: transformer4planning.submission.planner.ControlTFPlanner
        horizon_seconds: 10.0
        sampling_time: 0.1
        acceleration: [5.0, 5.0]  # x (longitudinal), y (lateral)
        thread_safe: true

Create a new yaml file for Hydra at: `script/config/simulation/planner/rule_based_planner.yaml` with:
    rule_based_planner:
        _target_: transformer4planning.submission.rule_based_planner.RuleBasedPlanner
        horizon_seconds: 10.0
        sampling_time: 0.1
        acceleration: [5.0, 5.0]  # x (longitudinal), y (lateral)
        thread_safe: true

### Run simulation without yaml changing


1. Install Transformer4Planning and NuPlan-Devkit
2. (Optional) Copy the script folder from NuPlan's Official Repo to update
3. Modify dataset path in the `run_simulation.py` and run it to evaluate the model with the tran-xl planner


    python script/run_simulation.py 'planner=control_tf_planner' 
    'scenario_filter.limit_total_scenarios=2' 'scenario_filter.num_scenarios_per_type=1' 
    'job_name=open_loop_boxes' 'scenario_builder=nuplan' 
    'ego_controller=perfect_tracking_controller' 'observation=box_observation'
    '+model_pretrain_name_or_path=/public/MARS/datasets/nuPlanCache/checkpoint/nonauto-regressive/xl-silu-fde1.1' 


### Or Modify yaml files and py scripts 
#### Modify the following variables in yaml files
nuplan/planning/script/config/common/default_experiment.yaml

`job_name: open_loop_boxes` or
`job_name: closed_loop_nonreactive_agents` or 
`job_name: closed_loop_reactive_agents`

nuplan/planning/script/config/common/default_common.yaml

`scenario_builder: nuplan` correspond to the yaml filename in scripts/config/common/scenario_builder

`scenario_filter: all_scenarios` correspond to the yaml filenmae in scripts/config/common/scnario_filter 

nuplan/planning/script/config/common/worker/ray_distributed.yaml

`threads_per_node: 6`

nuplan/planning/script/config/common/worker/single_machine_thread_pool.yaml

`max_workers: 6`

nuplan/planning/script/config/simulation/default_simulation.yaml

    observation: box_observation
    ego_controller: perfect_tracking_controller
    planner: control_tf_planner

nuplan/planning/script/config/common/default_common.yaml

debug mode `- worker: sequential`
    
multi-threading `- worker: ray_distributed`

nuplan/planning/script/config/common/scenario_filter/all_scenarios.yaml

`num_scenarios_per_type: 1`

`limit_total_scenarios: 5`
   
`log_names: []` to filter the db files to run simulation

#### Add the following at the beginning of run_xxx.py

    os.environ['USE_PYGEOS'] = '0'
    import geopandas
    os.environ['HYDRA_FULL_ERROR'] = '1'
    os.environ['NUPLAN_DATA_ROOT'] = ''
    os.environ['NUPLAN_MAPS_ROOT'] = ''
    os.environ['NUPLAN_DB_FILES'] = ''
    os.environ['NUPLAN_MAP_VERSION'] = ''

#### Filter log files or maps
set `scenario_builder` `scenario_filter` to correspond scenario_builder yaml file and scenario_filter yaml file in `script/config/common/default_common.yaml`. scenario_builder path is `script/config/common/scenario_builder` and scenario_filter path is `script/config/common/scenario_filter`. For example, choose builder yaml as nuplan, and filter yaml as all_scenarios, if you want to filter specify logs or maps, please add `[log_name1, ..., log_namen]` to log_names.

#### Run the following command from NuPlan-Devkit

`python nuplan/planning/script/run_simulation.py`

to set configs: 
planner choice: `planner=control_tf_planner` Optional `[control_tf_planner, rule_based_planner]`
chanllenge choice: `+simulation=closed_loop_reactive_agents` Optional `[closed_loop_reactive_agents, open_loop_boxes, closed_loop_nonreactive_agents]`

### Launch nuboard for visualization

``python script/run_nuboard.py simulation_path='[{SIMULATION_RESULTS_PATH}]' 'scenario_builder=nuplan' 'port_number=5005'``

or

`python nuplan/planning/script/run_nuboard.py simulation_path='[{SIMULATION_RESULTS_PATH}]'`

### Statics Simulation Scores
`python script/run_metric_scores.py --file_path ... --save_dir ... --exp_name ... --simulation_type ...`

