

# InfoBridge: Mutual Information estimation via Bridge Matching 

This repository contains the implementation for the InfoBridge paper.

## Data 

**Downloading protein embeddings from UniProt.** Download the mean-pooled (per-protein) ProtT5 (PT5) embeddings for [A. thaliana](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/UP000006548_3702/per-protein.h5), [H. sapiens](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/UP000005640_9606/per-protein.h5) and [E. coli](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/UP000000625_83333/per-protein.h5) using the UniProt links. Save them as `data/ProtT5_embeddings/athaliana_embeddings.h5`, `data/ProtT5_embeddings/hsapiens_embeddings.h5` and `data/ProtT5_embeddings/ecoli_embeddings.h5`, respectively.
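
The download step above can also be scripted. The following is a minimal sketch, not part of the repository: it assumes only the Python standard library, and the names `download_plan` and `download_all` are hypothetical helpers introduced here for illustration. The URLs and target paths are the ones listed above.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = (
    "https://ftp.uniprot.org/pub/databases/uniprot/current_release/"
    "knowledgebase/embeddings"
)

# Local file prefix -> UniProt proteome directory (taken from the links above).
PROTEOMES = {
    "athaliana": "UP000006548_3702",
    "hsapiens": "UP000005640_9606",
    "ecoli": "UP000000625_83333",
}


def download_plan(out_dir="data/ProtT5_embeddings"):
    """Return (url, destination path) pairs, one per organism."""
    out_dir = Path(out_dir)
    return [
        (f"{BASE_URL}/{proteome}/per-protein.h5", out_dir / f"{name}_embeddings.h5")
        for name, proteome in PROTEOMES.items()
    ]


def download_all(out_dir="data/ProtT5_embeddings"):
    """Download every embedding file that is not already present."""
    for url, dest in download_plan(out_dir):
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists():
            continue  # skip files downloaded earlier
        print(f"Downloading {url} -> {dest}")
        urlretrieve(url, dest)
```

Calling `download_all()` from the repository root fetches the three files (network access required); `download_plan()` only lists what would be fetched.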

## Preprocessing

To preprocess protein embeddings data, run the `PT5_preprocessing.ipynb` notebook.

## Installation

This repo was developed and tested under Python `3.9.12`. 


To install the dependencies:

```bash
$ pip install -r requirements.txt
```

## Running experiments

To run a particular experiment, execute the notebook with `papermill`. The `shuffle_coef` parameter takes a value between 0 and 1; the `0.5` below is only a placeholder:

```bash
$ papermill InfoBridge_ProtTrans5.ipynb \
    -p eps 1 -p shuffle_coef 0.5 \
    -p batch_size 64 -p n_epochs 100 -p seed 42 \
    -p wd_reg 0.001 -p dropout 0.2 -p lr 3e-4 \
    -p predict_type 'vector_field' -p n_filters 256 \
    mi_iamge_bench_bridge_log.ipynb
```

See an example at `mutinfo/source/examples/run_info_bridge.sh`.
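
To repeat the run for several `shuffle_coef` values, the `papermill` command can be assembled programmatically. This is a hedged sketch rather than the repository's own tooling: `build_command` and `run_sweep` are hypothetical helpers, the output notebook naming scheme is an assumption, and the sweep values are illustrative. The fixed parameters are copied from the command above, and the endpoints 0 and 1 are assumed to be valid.

```python
import shlex
import subprocess

# Fixed parameters copied from the papermill command above.
BASE_PARAMS = {
    "eps": 1,
    "batch_size": 64,
    "n_epochs": 100,
    "seed": 42,
    "wd_reg": 0.001,
    "dropout": 0.2,
    "lr": "3e-4",
    "predict_type": "vector_field",
    "n_filters": 256,
}


def build_command(shuffle_coef, notebook="InfoBridge_ProtTrans5.ipynb"):
    """Build the papermill argument list for one shuffle_coef value."""
    if not 0.0 <= shuffle_coef <= 1.0:
        raise ValueError("shuffle_coef must lie between 0 and 1")
    # Hypothetical naming scheme for the executed (output) notebook.
    output = f"log_shuffle_{shuffle_coef}.ipynb"
    cmd = ["papermill", notebook, output, "-p", "shuffle_coef", str(shuffle_coef)]
    for name, value in BASE_PARAMS.items():
        cmd += ["-p", name, str(value)]
    return cmd


def run_sweep(coefs=(0.25, 0.5, 0.75)):
    """Run the notebook once per coefficient; the values are illustrative."""
    for coef in coefs:
        cmd = build_command(coef)
        print(shlex.join(cmd))  # show the exact command being executed
        subprocess.run(cmd, check=True)  # requires papermill on PATH
```

`run_sweep()` executes the notebooks sequentially and stops at the first failing run because of `check=True`.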