# OpenDAS: Multi-modal Prompt Learning for Open-Vocabulary Segmentation

Our multi-modal prompt learning method can be found in [OpenDAS](configs/trainers/OpenDAS).

The repository also supports the
[CoOp](configs/trainers/CoOp),
[CoCoOp](configs/trainers/CoCoOp),
[VPT](configs/trainers/VPT/vit_b16_c2_ep5_batch4_4.yaml),
[MaPLe](configs/trainers/MaPLe), and
[RPO](configs/trainers/RPO)
architectures.
<hr />

## Model Architecture

![main figure](docs/main_figure.png)

> **<p align="justify"> Method:** *We adopt a simpler architecture and train prompts for different modalities separately. Additionally, we incorporate triplet loss during text prompt training for better generalization. For negative samples, we use GPT-4 to generate similar classes offline, and during training, we choose the hardest negative sample on the fly.* </p>

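The hardest-negative triplet loss mentioned in the method description can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the function names, the cosine distance, the margin value, and the choice of the visual embedding as the anchor are all assumptions made for illustration. The negatives stand in for the text embeddings of the similar classes generated offline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hardest_negative_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Triplet loss using only the hardest negative (hypothetical helper).

    anchor:    embedding of the sample (assumed here to be the visual feature)
    positive:  embedding of the ground-truth class prompt
    negatives: embeddings of similar-but-wrong class prompts
    Returns (loss, index of the hardest negative).
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_negs = [1.0 - cosine(anchor, n) for n in negatives]
    # The hardest negative is the one closest to the anchor.
    hardest = min(range(len(negatives)), key=d_negs.__getitem__)
    loss = max(0.0, d_pos - d_negs[hardest] + margin)
    return loss, hardest


loss, idx = hardest_negative_triplet_loss(
    anchor=[1.0, 0.0],
    positive=[1.0, 0.0],
    negatives=[[0.0, 1.0], [1.0, 1.0]],
    margin=0.5,
)
print(idx, round(loss, 4))
```

Selecting the negative per sample at training time, rather than fixing it in advance, means the loss always focuses on whichever similar class is currently most confusable with the anchor.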
<hr />

## :ballot_box_with_check: Supported Methods

[comment]: <> (| Language Prompting            | MaPLe |  [link]&#40;configs/trainers/IVLP/vit_b16_c2_ep5_batch4_4ctx_language_only.yaml&#41;      |      |)

| Method                    | Paper                                         |                             Configs                             |          Training Scripts          |
|---------------------------|:----------------------------------------------|:---------------------------------------------------------------:|:----------------------------------:|
| CoOp                      | [IJCV 2022](https://arxiv.org/abs/2109.01134) |                  [link](configs/trainers/CoOp)                  |        [link](scripts/coop)        |
| CoCoOp                    | [CVPR 2022](https://arxiv.org/abs/2203.05557) |                 [link](configs/trainers/CoCoOp)                 |       [link](scripts/cocoop)       |
| VPT                       | [ECCV 2022](https://arxiv.org/abs/2203.12119) |    [link](configs/trainers/VPT/vit_b16_c2_ep5_batch4_4.yaml)    |        [link](scripts/vpt)         |
| RPO                       | [ICCV 2023](https://arxiv.org/abs/2308.14960) |    [link](configs/trainers/RPO)                                 |        [link](scripts/rpo)         |
| MaPLe                     | [CVPR 2023](https://arxiv.org/abs/2210.03117) | [link](configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx.yaml)  |       [link](scripts/maple)        |
| OpenDAS                   |  - | [link](configs/trainers/OpenDAS/vit_l14_c2_ep10_batch16_8+4ctx_d24_use_both_losses.yaml)  |       [link](scripts/opendas)        |


<hr />

## Results
### OpenDAS in comparison with existing methods on ScanNet++ Offices

The table below reports weighted F1 scores on ScanNet++ Offices for all classes (W-F1), base classes, and novel classes.

| Name                                                      | \# Params | W-F1        | Base F1  |  Novel F1 |
|-----------------------------------------------------------|:---------:|:----------:|:---------:|:---------:|
| [CLIP](https://arxiv.org/abs/2103.00020)                  |   0       |   11.2    |   11.0    |   12.0    |
| [CoCoOp](https://arxiv.org/abs/2203.05557)                |   ~77K    |   25.7    |   34.2    |   12.7    |
| [VPT](https://arxiv.org/abs/2203.12119)                   |  ~786K    | 33.8 | 37.6 | 12.8 |
| [RPO](https://arxiv.org/abs/2308.14960)                   | ~43K | 30.6 | 40.9 | 14.9 |
| [MaPLe](https://arxiv.org/abs/2210.03117)                 | ~18935K | 36.3 | 48.1 | 18.4 |
| [OpenDAS (ours)](https://www.overleaf.com/read/tgjrmhnshhtd#b1ff60) |  ~233K | **40.2** | **51.5**  | **23.0** |

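For readers unfamiliar with the metric, the sketch below shows the standard definition of weighted F1: the per-class F1 scores averaged with weights proportional to each class's number of ground-truth samples. It is an illustration under that assumption, not the evaluation code used for the table, and the function name and toy labels are hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of ground-truth samples of class c
        score += support / total * f1
    return score


# Toy labels, purely illustrative.
y_true = ["chair", "chair", "desk", "desk", "desk", "lamp"]
y_pred = ["chair", "desk", "desk", "desk", "lamp", "lamp"]
print(weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, frequent classes dominate the score, which is why separate base and novel columns are informative alongside the overall value.
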
## Installation 
For installation and other package requirements, please follow the instructions detailed in [INSTALL.md](docs/INSTALL.md). 

## Data preparation
Please follow the instructions at [DATASETS.md](docs/DATASETS.md) to prepare all datasets.

## Training and Evaluation
Please refer to [RUN.md](docs/RUN.md) for detailed instructions on training, evaluating, and reproducing the results using our pre-trained models.

<hr />

## Acknowledgements

This folder is based on the [CoCoOp and CoOp](https://github.com/KaiyangZhou/CoOp), [MaPLe](https://muzairkhattak.github.io/multimodal-prompt-learning/), and [RPO](https://github.com/mlvlab/RPO/tree/main) repositories. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.

