# A General Protocol to Probe Large Vision Models for 3D Physical Understanding

This repository contains the code and data for the paper "A General Protocol to Probe Large Vision Models for 3D Physical Understanding".



## Installation (Python 3.8.8 + Numpy 1.20.1 + PyTorch 1.13.1)

```
pip install pycocotools
pip install Pillow
pip install scipy
pip install -U scikit-learn
pip install ipdb
pip install scikit-image
```
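To confirm that the environment is complete before running any script, a small check like the one below may help. It is a minimal sketch and not part of this repository: the `REQUIRED` mapping and the `find_missing` helper are illustrative names, and the mapping only pairs each pip package listed above (plus NumPy and PyTorch from the heading) with its usual import name.

```python
import importlib.util

# pip package name -> import name (illustrative mapping, not shipped with the repo)
REQUIRED = {
    "numpy": "numpy",
    "torch": "torch",
    "pycocotools": "pycocotools",
    "Pillow": "PIL",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "ipdb": "ipdb",
    "scikit-image": "skimage",
}


def find_missing(packages):
    """Return the pip names whose import name cannot be found."""
    missing = []
    for pip_name, import_name in packages.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

The check only tests that each module can be located; it does not verify the versions named in the heading.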


## Extract Stable Diffusion Feature
Clone the DIFT repository (https://github.com/Tsingularity/dift/tree/main) and put its files under `dift/` in this repository. Replace the cloned `src/models/dift_sd.py` with the `dift/dift_sd.py` provided here. Then fill in the paths in the extraction script and run:

```
python dift/extract_dift_depth.py
```


## Extract Features from Other Models

Clone the following repositories and download the corresponding checkpoints as specified in the paper:

OpenCLIP: https://github.com/mlfoundations/open_clip

DINOv1: https://github.com/facebookresearch/dino

DINOv2: https://github.com/facebookresearch/dinov2

VQGAN: https://github.com/CompVis/taming-transformers

Take the outputs of the ViT/Transformer layers as the features from different layers.
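One common way to collect per-layer outputs in PyTorch is to register forward hooks on the layers of interest. The sketch below is an assumption about how this could be done, not the code used in the paper: `collect_layer_outputs` is a hypothetical helper, and the toy stack of `nn.TransformerEncoderLayer` modules stands in for a real OpenCLIP, DINO, or VQGAN backbone, whose layer attributes differ between repositories.

```python
import torch
import torch.nn as nn


def collect_layer_outputs(model, layers, x):
    """Run `model(x)` once and return {name: output} for each hooked layer.

    `layers` maps a name of your choice to the module to hook (hypothetical helper).
    """
    feats = {}
    handles = []
    for name, layer in layers.items():
        # Bind `name` as a default argument so each hook stores under its own key
        def hook(module, inputs, output, name=name):
            feats[name] = output.detach()

        handles.append(layer.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(x)
    finally:
        # Always remove hooks so later forward passes are unaffected
        for handle in handles:
            handle.remove()
    return feats


# Toy stand-in for a ViT: three Transformer encoder layers on 32-dim tokens
torch.manual_seed(0)
toy = nn.Sequential(
    *[
        nn.TransformerEncoderLayer(
            d_model=32, nhead=4, dim_feedforward=64, batch_first=True
        )
        for _ in range(3)
    ]
)
toy.eval()

tokens = torch.randn(2, 10, 32)  # (batch, tokens, channels)
layers = {f"block_{i}": block for i, block in enumerate(toy)}
features = collect_layer_outputs(toy, layers, tokens)

for name, feat in features.items():
    print(name, tuple(feat.shape))
```

With a real backbone, the `layers` dictionary would be built from that model's own list of Transformer blocks; the attribute holding them depends on the implementation.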


## Download Original Datasets

For Same Plane and Perpendicular Plane: https://github.com/NVlabs/planercnn

For Material: https://github.com/apple/ml-dms-dataset

For Shadow: https://github.com/stevewongv/InstanceShadowDetection

For Occlusion: https://github.com/Championchess/A-Tri-Layer-Plugin-to-Improve-Occluded-Detection/tree/master and https://cocodataset.org/#home

For Support Relation and Depth: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html


## Our Datasets

Here we take the depth task as an example.
The folder `depth_img_name_list` contains the train/val/test image name lists in the respective .json files.
The folder `depth_region_pair` contains the regions and pairs for train/val/test in the respective .json files.
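To inspect these files before training, a short loader such as the following can be used. It is a sketch under assumptions: the helper names `load_json_folder` and `summarize_entries` are illustrative, and nothing is assumed about the internal schema of the files beyond each being valid JSON.

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Load every .json file in `folder`, keyed by file name without extension."""
    contents = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r") as f:
            contents[path.stem] = json.load(f)
    return contents


def summarize_entries(contents):
    """Return the number of top-level entries in each loaded file."""
    summary = {}
    for name, data in contents.items():
        # Lists and dicts have a length; any other JSON value counts as one entry
        summary[name] = len(data) if isinstance(data, (list, dict)) else 1
    return summary
```

For example, `summarize_entries(load_json_folder("depth_img_name_list"))` reports how many top-level entries each split file holds, which is a quick way to check the split sizes.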


## Train and Test Linear SVM

Taking the depth task as an example:
```
python SVM/depth_train_test_svm.py
```
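For readers unfamiliar with linear probing, the sketch below shows the general train/validate/test pattern with scikit-learn's `LinearSVC`. It is not the repository's `SVM/depth_train_test_svm.py`: the features are synthetic, each sample is assumed to be a fixed-length vector with a binary label, and selecting `C` on the validation split is an assumption made here for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)


def make_split(n, dim=16):
    """Synthetic stand-in for extracted features: two shifted Gaussian classes."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim)) + 2.0 * (2 * y[:, None] - 1)
    return X, y


def train_and_select(train, val, c_grid=(0.01, 0.1, 1.0, 10.0)):
    """Fit one linear SVM per C and keep the one with the best validation accuracy."""
    best = None
    for c in c_grid:
        clf = LinearSVC(C=c, max_iter=10000).fit(train[0], train[1])
        acc = accuracy_score(val[1], clf.predict(val[0]))
        if best is None or acc > best[0]:
            best = (acc, c, clf)
    return best


train, val, test = make_split(400), make_split(100), make_split(100)
val_acc, best_c, clf = train_and_select(train, val)
test_acc = accuracy_score(test[1], clf.predict(test[0]))
print(f"best C={best_c}, val acc={val_acc:.3f}, test acc={test_acc:.3f}")
```

The test split is touched only once, after the hyperparameter has been chosen, so the reported test accuracy is not biased by the selection step.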

## License

CC-BY 4.0 for all the assets used and our new assets