# 🧬 OmegAMP: Conditional AMP discovery through biologically informed representations

## 🔎 Overview 

This project provides three components: a generative model for the conditional generation of novel antimicrobial peptides (AMPs), a classifier that distinguishes AMPs from non-AMPs, and a series of fine-grained classifiers that predict whether a peptide is active or inactive against specific species or strains. Together, these components form a framework for discovering new AMPs.

![teaser](./assets/Hero-figure-AMP.png)

## 🤗 Hugging Face

Our project is available on Hugging Face.

## 📚 Run Locally
- [🔧 Installation](#installation)
- [🏋️ Training](#training)
  - [Classifier Training](#classifier-training)
  - [Generative Model Training](#generative-model-training)
- [🔮 Inference](#inference)
  - [Sequence Generation](#sequence-generation)
  - [Sequence Prediction](#sequence-prediction)
  - [Sequence Filtering](#sequence-filtering)
  - [Framework Pipeline](#framework-pipeline)
- [🖥️ Web Interface](#running-the-streamlit-app)

### 🔧 Installation

Creating a conda environment is optional but recommended:

```sh
conda create -n amp-modelling python=3.11
```

Activate the conda environment:

```sh
conda activate amp-modelling
```

Install the necessary Python packages:

```sh
pip install -e .
```

Download the data from Figshare: https://figshare.com/s/d1e330bc8b52263269f4

Download the models from Figshare: https://figshare.com/s/bf623f79783155f9f227


### 🏋️ Training

#### Classifier Training
Train individual classifiers or all classifiers at once:

```sh
# Train a single classifier
./project/scripts/training/train_classifier.py --classifier broad-classifier [--output_model_dir PATH]

# Train all classifiers
./project/scripts/training/train_classifier.py --classifier all [--output_model_dir PATH]
```

#### Generative Model Training
To train the generative model, run `train.py` (all necessary hyperparameter configs are in **config/**). Note: you need a Weights & Biases (W&B) account.

```sh
./train.py
```

The trained model is saved in the **wandb/latest-run/** folder. To use it for sampling, replace **models/generative-model.ckpt** with your newly trained model.
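Swapping in a newly trained checkpoint can be scripted. The following is a minimal sketch, not part of the repository: it assumes the checkpoint is a `.ckpt` file somewhere under **wandb/latest-run/** (the exact subfolder is not stated here, so the helper picks the most recently modified match). The function name `promote_checkpoint` and the `.bak` backup are illustrative choices.

```sh
# promote_checkpoint RUN_DIR TARGET
# Hypothetical helper: copies the newest *.ckpt found under RUN_DIR to TARGET,
# keeping a backup of any existing TARGET as TARGET.bak.
promote_checkpoint() {
    run_dir=$1
    target=$2
    # The trailing slash makes find descend into RUN_DIR even if it is a symlink.
    # ls -t sorts by modification time, newest first.
    newest=$(find "$run_dir/" -type f -name '*.ckpt' -exec ls -t {} + 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no .ckpt file found under $run_dir" >&2
        return 1
    fi
    # Keep the previous model so the swap can be undone.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    mkdir -p "$(dirname "$target")" && cp "$newest" "$target"
}

# Example call (run from the repository root):
#   promote_checkpoint wandb/latest-run models/generative-model.ckpt
```

If the run produced several checkpoints, check which one was selected before sampling; "newest" is only a heuristic.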

### 🔮 Inference

#### Sequence Generation
Generate new sequences using different conditioning strategies:

```sh
# Unconditional generation
./project/scripts/inference/generate_samples.py Unconditional [--num_samples N] [--batch_size N]

# Partial conditional generation (with property constraints)
./project/scripts/inference/generate_samples.py PartialConditional --length=1-100 --charge=1 [--hydrophobicity VALUE]

# Subset conditional generation (based on existing sequences)
./project/scripts/inference/generate_samples.py SubsetConditional --subset_sequences=path/to/fasta
```
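The `SubsetConditional` mode reads its reference sequences from a FASTA file. If your peptides are stored as plain text with one sequence per line, a conversion along these lines may help. This is a sketch under assumptions: the helper name `seqs_to_fasta` and the `>seqN` header scheme are made up for illustration, and it assumes the script accepts standard FASTA with arbitrary headers.

```sh
# seqs_to_fasta FILE
# Hypothetical helper: turns a text file with one peptide sequence per line
# into FASTA on stdout. Blank lines are skipped; records are numbered seq1, seq2, ...
seqs_to_fasta() {
    awk 'NF { printf(">seq%d\n%s\n", ++n, $1) }' "$1"
}

# Example call:
#   seqs_to_fasta my_peptides.txt > my_peptides.fasta
#   ./project/scripts/inference/generate_samples.py SubsetConditional --subset_sequences=my_peptides.fasta
```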

#### Sequence Prediction
Evaluate sequences using trained classifiers:
```sh
# Predict using a single classifier
./project/scripts/inference/predict_sequences.py path/to/sequences.fasta --classifier broad-classifier [--predict_proba]

# Predict using all classifiers
./project/scripts/inference/predict_sequences.py path/to/sequences.fasta --classifier all [--predict_proba]
```
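Before running prediction, it can be worth confirming that the input file actually contains FASTA records. A minimal sketch, assuming standard FASTA where every record starts with a `>` header line; `count_fasta_records` is a hypothetical helper, not part of the repository.

```sh
# count_fasta_records FILE
# Hypothetical helper: prints the number of FASTA records (header lines) in FILE.
count_fasta_records() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so '|| true' keeps the helper's exit status at success in that case.
    grep -c '^>' "$1" || true
}

# Example call:
#   n=$(count_fasta_records path/to/sequences.fasta)
#   [ "$n" -gt 0 ] && ./project/scripts/inference/predict_sequences.py path/to/sequences.fasta --classifier all --predict_proba
```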

#### Sequence Filtering
Filter and rank generated sequences:
```sh
./project/scripts/inference/filter_generated_sequences.py [--path_to_fasta PATH] [--min_length N] [--max_length N] [--strain_species STRAIN]
```

#### Framework Pipeline
Run the complete generation and filtering pipeline:
```sh
# Unconditional generation
./project/scripts/inference/run_framework.py Unconditional [--num_samples N] [--batch_size N]

# Strain/species specific generation
./project/scripts/inference/run_framework.py Conditional --strain_species species-escherichiacoli [--num_samples N]
```
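To run the conditional pipeline for several targets in a row, the command above can be wrapped in a loop. The sketch below uses only the `Conditional`, `--strain_species`, and `--num_samples` arguments shown above; the function name `run_for_targets` and the `DRY_RUN` variable are illustrative additions, and the target identifiers you pass must be ones your classifiers support.

```sh
FRAMEWORK=./project/scripts/inference/run_framework.py

# run_for_targets NUM_SAMPLES TARGET...
# Hypothetical helper: runs the conditional pipeline once per TARGET.
# With DRY_RUN set to a non-empty value, it prints the commands instead of running them.
run_for_targets() {
    num_samples=$1
    shift
    for target in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$FRAMEWORK Conditional --strain_species $target --num_samples $num_samples"
        else
            # Stop at the first failing run so errors are not buried in later output.
            "$FRAMEWORK" Conditional --strain_species "$target" --num_samples "$num_samples" || return 1
        fi
    done
}

# Example call (add further targets as extra arguments):
#   run_for_targets 100 species-escherichiacoli
```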

### 🖥️ Running the Streamlit App

To run the Streamlit app locally:

```sh
streamlit run app.py
```

## 📑 Citation

If you find our work useful, please cite the following paper:

```bibtex
```

## ©️ License

This project is licensed under the [MIT License](./LICENSE). Redistribution and use must follow this license.