# AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

This repository provides the PyTorch implementation of "AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation".



## 1. AVSBench dataset

Please refer to [AVSBench](https://github.com/OpenNLPLab/AVSBench) to download the datasets. Remember to update the dataset paths in the config files.

------



## 2. Pretrained backbones

The pretrained backbones can be downloaded from [here](https://drive.google.com/drive/folders/1386rcFHJ1QEQQMF6bV1rXJTzy8v26RTV?usp=sharing) and should be placed in the `pretrained_backbones` directory.

**Note:** please update the data and pretrained backbone paths in `./configs` accordingly.
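
As a minimal sketch of this step, the commands below create the backbone directory and list the config lines that mention it, so the paths can be checked by hand. This assumes the commands are run from the repository root and that the configs refer to the directory by the name `pretrained_backbones`; neither is confirmed here, and the actual config keys may differ.

```sh
# Run from the repository root.
mkdir -p pretrained_backbones

# List config lines that mention the backbone directory.
# Assumption: the configs refer to it by this exact name.
if [ -d ./configs ]; then
  grep -rn "pretrained_backbones" ./configs || echo "no matches in ./configs"
fi
```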

------



## 3. Environments

Please install the dependencies before running anything else.

```sh
pip install -r requirment.txt
# build MSDeformAttention
cd ./model/ops
sh make.sh
```
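
Before building the ops, it can help to confirm that the required tools are available. The check below is only a sketch: `check_tools` is a hypothetical helper, not part of this repository, and the assumption that `python`, `pip`, and `accelerate` are the relevant executables is inferred from the commands in this README. The final step prints the PyTorch version and CUDA availability only if PyTorch is already importable.

```sh
# Hypothetical helper: report whether each tool used in this README is on PATH.
check_tools() {
  for tool in python pip accelerate; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check_tools

# If PyTorch is importable, print its version and whether CUDA is visible.
if python -c "import torch" >/dev/null 2>&1; then
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
fi
```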


---



## 4. Train & test

- Train AVESFormer Model
```sh
accelerate launch train_s4.py ./configs/AVESFormer_s4.py # or train_ms3, train_avss
```

- Test AVESFormer Model
```sh
accelerate launch test_s4.py ./configs/AVESFormer_s4.py # or test_ms3, test_avss
```
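
The commands above only spell out the S4 case. The sketch below prints the corresponding train and test commands for all three subsets as a dry run. `avs_cmd` is a hypothetical helper, and the config names `AVESFormer_ms3.py` and `AVESFormer_avss.py` are assumed by analogy with `AVESFormer_s4.py`; check `./configs` for the actual file names before running anything.

```sh
# Hypothetical helper: print the launch command for a mode (train|test)
# and a subset (s4|ms3|avss).
# Assumption: configs are named ./configs/AVESFormer_<subset>.py, which this
# README only confirms for s4.
avs_cmd() {
  mode=$1
  task=$2
  echo "accelerate launch ${mode}_${task}.py ./configs/AVESFormer_${task}.py"
}

# Dry run: show the train and test commands for every subset.
for task in s4 ms3 avss; do
  avs_cmd train "$task"
  avs_cmd test "$task"
done

# To actually run one of them, e.g. training on MS3:
# eval "$(avs_cmd train ms3)"
```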
---



## 5. License

This project is released under the Apache 2.0 license as found in the [LICENSE](./LICENSE) file.
