# SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

Annotating 3D LiDAR point clouds for perception tasks such as 3D object detection and LiDAR semantic segmentation is notoriously time- and labor-intensive. To reduce the labeling burden, a promising approach is to perform large-scale pre-training and then fine-tune the pre-trained backbone on different downstream datasets and tasks. In this paper, we propose SPOT, namely Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations, and demonstrate its effectiveness on various public datasets with different downstream tasks under the label-efficiency setting. Our contributions are threefold: (1) Occupancy prediction is shown to be promising for learning general representations, as demonstrated by extensive experiments on numerous datasets and tasks. (2) SPOT uses a beam re-sampling technique for point cloud augmentation and applies class-balancing strategies to overcome the domain gap caused by different LiDAR sensors and annotation strategies across datasets. (3) Scalable pre-training is observed: downstream performance improves across all experiments as more pre-training data is used. We believe that our findings can facilitate the understanding of LiDAR point clouds and pave the way for future exploration in LiDAR pre-training. Code and models will be released.


* Our pre-training code is based on [3DTrans](https://github.com/PJLab-ADG/3DTrans).


* Our 3D point cloud pre-training task is based on the [Waymo Open Dataset](https://waymo.com/open/download/) and the [Occ3D occupancy dataset](https://github.com/Tsinghua-MARS-Lab/Occ3D).


## Installation

### Requirements
All the code is tested in the following environment:
* Linux (tested on Ubuntu 16.04)
* Python 3.6+
* PyTorch 1.7.0, PyTorch 1.8.0, PyTorch 1.8.1
* CUDA 11.1
* [`spconv v2.x`](https://github.com/traveller59/spconv)
* gcc version >= 5.4.0


### Install 3DTrans
NOTE: Please re-install `3DTrans` by running `python setup.py develop` even if you have already installed a previous version.

a. Enter this repository.
```shell
cd SPOT_code
```

b. Install the dependencies as follows:

* Install the Python dependencies.
  ```shell
    pip install -r requirements.txt 
  ```

* Install gcc; we use gcc 5.4.

* Install the SparseConv library; we use the [`spconv`](https://github.com/traveller59/spconv) implementation.
    * We recommend installing the latest `spconv v2.x` with pip; see the official [spconv](https://github.com/traveller59/spconv) documentation.
    * Make sure to choose **the right version of spconv** for **your CUDA version**. For example, for CUDA 11.1, run `pip install spconv-cu111`.
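
The spconv v2.x pip package name encodes the CUDA version with the dot removed (CUDA 11.1 → `spconv-cu111`). The helper below is a minimal sketch of that mapping; the function name `spconv_pkg_for_cuda` is hypothetical, it expects a `major.minor` version string, and it does not check whether spconv actually publishes a wheel for that CUDA version, so confirm against the spconv README before installing.

```shell
#!/usr/bin/env sh
# Hypothetical helper: map a CUDA version string such as "11.1" to the
# spconv v2.x pip package name ("spconv-cu111") by concatenating the
# major and minor version numbers. Expects "major.minor" input.
# It does NOT verify that a wheel exists for that CUDA version.
spconv_pkg_for_cuda() {
  cuda_version="$1"
  major=$(echo "$cuda_version" | cut -d. -f1)
  minor=$(echo "$cuda_version" | cut -d. -f2)
  echo "spconv-cu${major}${minor}"
}

# Example: pip install "$(spconv_pkg_for_cuda 11.1)"
```

Usage: read your CUDA version (for example from `nvcc --version`) and pass it as `major.minor`; the result for CUDA 11.1 is the same `spconv-cu111` package named above.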
  
c. Install the `pcdet` library and its dependencies by running the following command:
```shell
python setup.py develop
```
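
After step c, a quick way to sanity-check the environment is to try importing the key packages. This is a sketch assuming a POSIX shell: `check_module` is a hypothetical helper, the interpreter defaults to `python3` (override it with `$PYTHON` if you use `python`), and the module names `torch`, `spconv.pytorch` (the spconv v2.x PyTorch entry point) and `pcdet` are assumptions based on the requirements listed above.

```shell
#!/usr/bin/env sh
# Hypothetical helper: return 0 if the given Python module can be imported,
# 1 otherwise, printing a short status line either way.
# The interpreter defaults to "python3" and can be overridden via $PYTHON.
check_module() {
  if "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1; then
    echo "OK: $1"
    return 0
  else
    echo "MISSING: $1"
    return 1
  fi
}

# Example: verify the main dependencies after installation.
# check_module torch && check_module spconv.pytorch && check_module pcdet
```

A `MISSING: pcdet` line usually means step c has not been run in the active environment.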


## Getting Started for SPOT

Please download and preprocess the point cloud datasets according to the [dataset guidance](GETTING_STARTED.md).

### 3D Pre-training using SPOT
* We pre-train the 3D and 2D backbones on the Waymo pre-training dataset; we suggest setting `${NUM_GPUs}` to 8 with NVIDIA A100 80GB GPUs.
```shell script
cd tools
sh scripts/dist_train_occ.sh ${NUM_GPUs} \
--cfg_file ./cfgs/pretrain/occ_pretrain.yaml
```
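
To keep `${NUM_GPUs}` consistent with the devices you expose, one option is to derive it from `CUDA_VISIBLE_DEVICES`. This is a minimal sketch: `count_visible_gpus` is a hypothetical helper, and it assumes `dist_train_occ.sh` starts one process per GPU through PyTorch's distributed launcher, which picks devices from `CUDA_VISIBLE_DEVICES`.

```shell
#!/usr/bin/env sh
# Hypothetical helper: count the devices listed in a CUDA_VISIBLE_DEVICES-style
# string, e.g. "0,1,2,3" -> 4. An empty string counts as 0.
count_visible_gpus() {
  if [ -z "$1" ]; then
    echo 0
  else
    echo "$1" | awk -F',' '{ print NF }'
  fi
}

# Example launch (assumes the launcher honors CUDA_VISIBLE_DEVICES):
# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# NUM_GPUs=$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")
# cd tools && sh scripts/dist_train_occ.sh ${NUM_GPUs} --cfg_file ./cfgs/pretrain/occ_pretrain.yaml
```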

| Method | Pre-training data | Training time | Checkpoint |
|---|---|---|---|
| [SPOT](./tools/cfgs/pretrain/occ_pretrain.yaml) | 5% (sequence-level) | ~15 hours (8 NVIDIA A100 80GB) | [Download](https://drive.google.com/drive/folders/1tq5bBBR0AEQLbGYBiVNYng1vHoJ4cCcd?usp=drive_link) |
| [SPOT](./tools/cfgs/pretrain/occ_pretrain.yaml) | 100% (sequence-level) | ~160 hours (16 NVIDIA A100 80GB) | [Download](https://drive.google.com/file/d/1dC6FTVZ3erv779MaaAWun06oszNtbSW3/view?usp=drive_link) |
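
For budgeting, the wall-clock times above can be converted to approximate GPU-hours (number of GPUs × training time). These are rough figures derived only from the approximate values in the table:

```math
\begin{aligned}
\text{5\% pre-training:}\quad   & 8 \times 15\,\text{h} \approx 120\ \text{GPU-hours},\\
\text{100\% pre-training:}\quad & 16 \times 160\,\text{h} \approx 2560\ \text{GPU-hours}.
\end{aligned}
```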


## Fine-tuning with pre-training checkpoints
* You can fine-tune on a specific 3D dataset using the provided SPOT pre-training checkpoints.
> Note: set `--pretrained_model ${PRETRAINED_MODEL}` to the checkpoint obtained in the pre-training phase.
```shell script
cd tools
sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/kitti_models/pv_rcnn_20_ft.yaml \
--pretrained_model ${PRETRAINED_MODEL} 
```
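
Because a wrong or empty `${PRETRAINED_MODEL}` path may only surface after the distributed job has started, it can help to check the checkpoint first. This is a minimal sketch: `require_checkpoint` is a hypothetical helper and the checkpoint path in the example is a placeholder.

```shell
#!/usr/bin/env sh
# Hypothetical helper: fail early with a clear message if the checkpoint
# meant for --pretrained_model is unset, missing, or empty.
require_checkpoint() {
  if [ -z "$1" ]; then
    echo "PRETRAINED_MODEL is not set" >&2
    return 1
  fi
  if [ ! -s "$1" ]; then
    echo "Checkpoint not found or empty: $1" >&2
    return 1
  fi
  return 0
}

# Example (the checkpoint path is a placeholder):
# PRETRAINED_MODEL=/path/to/spot_checkpoint.pth
# require_checkpoint "$PRETRAINED_MODEL" && cd tools && \
#   sh scripts/dist_train.sh ${NUM_GPUs} \
#     --cfg_file ./cfgs/kitti_models/pv_rcnn_20_ft.yaml \
#     --pretrained_model ${PRETRAINED_MODEL}
```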
