# Weakly Supervised Virus Capsid Detection with Image-Level Annotations in Electron Microscopy Images

## Introduction
This repository contains the code for the paper "Weakly Supervised Virus Capsid Detection with Image-Level Annotations in Electron Microscopy Images".

## Abstract
Current state-of-the-art methods for object detection rely on annotated bounding boxes of large data sets for training. However, obtaining such annotations is expensive and can require up to hundreds of hours of manual labor. This poses a challenge, especially since such annotations can only be provided by experts, as they require knowledge about the scientific domain. To tackle this challenge, we propose a domain-specific weakly supervised object detection algorithm that only relies on image-level annotations, which are significantly easier to acquire. Our method distills the knowledge of a pre-trained model, on the task of predicting the presence or absence of a virus in an image, to obtain a set of pseudo-labels that can be used to later train a state-of-the-art object detection model. To do so, we use an optimization approach with a shrinking receptive field to extract virus particles directly without specific network architectures. Through a set of extensive studies, we show that the proposed pseudo-labels are easier to obtain, and, more importantly, are able to outperform other existing weak labeling methods, and even ground truth labels, in cases where the time to obtain the annotation is limited.

![Overview of the WSCD pipeline](assets/overview.png "Overview of the WSCD pipeline")

## Installation
To get started, install the dependencies with either pip or conda. We recommend installing them in a virtual environment.
### Pip
```
pip install -r requirements.txt
```

### Conda
```
conda env create -f environment.yml
```

## Data
The data used in this project is available on [Google Drive](https://drive.google.com/drive/folders/1NBxFoarSX58ahZ1bXS8CMOr5hzOF5OgS?usp=share_link). Download it and place it in the `Data` folder inside this repository. The `Data` folder should have the following structure:
```
Data
├── Herpes
│   ├── Test
│   ├── Train
│   └── Val
└── LargeScaleTEM
    ├── Adenovirus
    ├── Papilloma
    ├── Norovirus
    └── Rotavirus
        ├── test
        ├── train
        ├── validation
        └── annotation_time.txt
```

Alternatively, you can modify the paths in `Variables.py` to point to your data folder.
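Before launching any experiment, it can help to confirm that the download landed in the expected layout. The following is a minimal sketch, not part of the repository: `find_missing_paths` is a hypothetical helper, and because the tree above only expands the `Rotavirus` folder, it assumes (unverified) that every `LargeScaleTEM` virus folder has the same `test`/`train`/`validation` subfolders and `annotation_time.txt` file.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
# Assumption: every LargeScaleTEM virus folder mirrors the Rotavirus layout.
HERPES_SPLITS = ("Test", "Train", "Val")
TEM_VIRUSES = ("Adenovirus", "Papilloma", "Norovirus", "Rotavirus")
TEM_SPLITS = ("test", "train", "validation")


def find_missing_paths(data_root):
    """Return the expected folders/files that do not exist under data_root."""
    root = Path(data_root)
    expected = [root / "Herpes" / split for split in HERPES_SPLITS]
    for virus in TEM_VIRUSES:
        virus_dir = root / "LargeScaleTEM" / virus
        expected += [virus_dir / split for split in TEM_SPLITS]
        expected.append(virus_dir / "annotation_time.txt")
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    # Run from the repository root, where the Data folder is expected.
    missing = find_missing_paths("Data")
    if missing:
        print("Missing entries:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Data folder layout looks complete.")
```

If you point `Variables.py` at a different location, pass that folder to `find_missing_paths` instead of `"Data"`.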

## Deterministic Mode

Our code runs in deterministic mode. Please set the following environment variable before running it:
```
export CUBLAS_WORKSPACE_CONFIG=:16:8
```
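On CUDA 10.2 and later, PyTorch's deterministic mode requires `CUBLAS_WORKSPACE_CONFIG` to be set before the first cuBLAS call. The repository's own seeding code is not reproduced here; the following is only a minimal sketch of what deterministic mode typically involves in a PyTorch project. `seed_everything` is a hypothetical helper name, and NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random

# Must be in place before the first cuBLAS call. ":16:8" matches the value
# above; PyTorch's reproducibility notes also accept ":4096:8".
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch (when installed) and request deterministic kernels."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the RNG on all devices, including CUDA
    torch.backends.cudnn.benchmark = False  # avoid non-deterministic autotuning
    # Any op without a deterministic implementation now raises a RuntimeError.
    torch.use_deterministic_algorithms(True)
```

Exporting the variable in the shell, as shown above, remains the safer option, since it is guaranteed to be set before any library initializes CUDA.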

## Docker
If you plan to run the experiments in a Docker container, we recommend the [`pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime`](https://hub.docker.com/layers/pytorch/pytorch/1.12.0-cuda11.3-cudnn8-runtime/images/sha256-1ef1f61b13738de8086ae7e1ce57c89f154e075dae0b165f7590b9405efeb6fe?context=explore) base image from Docker Hub.


## WandB Logging
This codebase uses [Weights & Biases](https://wandb.ai/) for logging, so you need to log in to your WandB account before running the code. To do so, either run `wandb login` or set your API key as an environment variable:
```
export WANDB_API_KEY=<YOUR_API_KEY>
```
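In non-interactive setups such as a Docker container, where `wandb login` is usually not run, a quick pre-flight check can catch a missing key before a long job starts. This is a hypothetical helper, not part of the repository, and it only inspects the `WANDB_API_KEY` environment variable: credentials saved by `wandb login` are not detected.

```python
import os
import sys


def wandb_key_is_set(environ=os.environ):
    """Return True if a non-empty WANDB_API_KEY is present in the given environment."""
    return bool(environ.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if not wandb_key_is_set():
        sys.exit("WANDB_API_KEY is not set; export it before launching the experiments.")
    print("WANDB_API_KEY found.")
```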


## Reproducing Results For Herpes Dataset
To reproduce our results on the Herpes dataset using 100% of the binary annotation time, run the following commands in sequence:
```
# Bounding Box
python Main_BoundingBox.py --project WSCD --seeds 42 123 7353 --annotation_time 38027 

# Location 
python Main_Location.py --project WSCD --seeds 42 123 7353 --annotation_time 38027 

# Ours Opt and OD
python Main_Binary.py --project WSCD_ours --seeds 42 123 7353 --annotation_time -1 --data_split train test
```
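If you prefer a single entry point, the three commands above can be chained from Python. This is a sketch, not a script shipped with the repository: `build_commands` and `run_all` are hypothetical names, the arguments are copied verbatim from the commands above, and `sys.executable` is used so the scripts run with the same interpreter as the driver.

```python
import subprocess
import sys

SEEDS = ["42", "123", "7353"]


def build_commands(python=sys.executable):
    """Return the three Herpes commands from this README as argv lists."""
    seeds = ["--seeds", *SEEDS]
    return [
        # Bounding Box
        [python, "Main_BoundingBox.py", "--project", "WSCD", *seeds,
         "--annotation_time", "38027"],
        # Location
        [python, "Main_Location.py", "--project", "WSCD", *seeds,
         "--annotation_time", "38027"],
        # Ours Opt and OD
        [python, "Main_Binary.py", "--project", "WSCD_ours", *seeds,
         "--annotation_time", "-1", "--data_split", "train", "test"],
    ]


def run_all():
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_all()
```

Using `check=True` mirrors running the commands one after another by hand: a failure in an earlier step stops the sequence instead of silently continuing.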

The command-line arguments are:
- `--project`: The name of the project in WandB.
- `--seeds`: One or more random seeds to use for the experiments.
- `--annotation_time`: The annotation time budget for the experiments. If set to `-1`, all labels are used.
- `--data_split`: One or more data splits to use for the experiments, chosen from `train`, `test`, and `val`.
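To make the expected argument shapes concrete (space-separated lists for `--seeds` and `--data_split`, an integer budget for `--annotation_time`), here is a hypothetical `argparse` parser that mirrors the documented options. It is not the repository's parser: the types, the `required` flags and the `None` default for `--data_split` are assumptions inferred from the commands above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments (not the repo's own)."""
    parser = argparse.ArgumentParser(description="WSCD experiment arguments (sketch)")
    parser.add_argument("--project", type=str, required=True,
                        help="name of the project in WandB")
    parser.add_argument("--seeds", type=int, nargs="+", required=True,
                        help="one or more random seeds")
    parser.add_argument("--annotation_time", type=int, required=True,
                        help="annotation time budget; -1 uses all labels")
    # Assumed default: the baseline commands above omit --data_split entirely.
    parser.add_argument("--data_split", nargs="+", choices=["train", "test", "val"],
                        default=None, help="one or more data splits")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

`argparse` accepts `--annotation_time -1` as written, because none of the parser's option strings look like negative numbers.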

Upon acceptance, we will also release model weights for the main experiments to improve reproducibility.


## Citation
If you use this codebase in your research, please cite the following paper:
```
PLACEHOLDER
```
