# Exploiting Topology of Attention Maps for Protein Per-Residue Classification

This code is supplementary material for our paper "Exploiting Topology of Attention Maps for Protein Per-Residue Classification", submitted to the NeurIPS conference.

## Installation

To create the conda environment:

```bash
conda env create --file=res_tda.yaml
```

To build the `ph_simple` library, run:
```bash
cd ph_simple
./build.sh
```
or
```bash
cd ph_simple
python setup.py build --build-lib=./lib
```


## Generating attention maps

### Binding prediction task

For the train set of any of the 10 binding prediction tasks, set `testset_name` (DNA, RNA, ZN, etc.) in the `config_token.yaml` file:

```yaml
task: calculate_sparse_attns
testset_name: DNA
subset: train
model_name: facebook/esm2_t33_650M_UR50D
```
For the test set of the binding prediction task Dataset2, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: Dataset2
subset: test
model_name: facebook/esm2_t33_650M_UR50D
```


For the train set of the binding prediction task Dataset1/Dataset2, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: Dataset1
subset: train
model_name: facebook/esm2_t33_650M_UR50D
```
For the test set of the binding prediction task Dataset1, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: Dataset1
subset: test
model_name: facebook/esm2_t33_650M_UR50D
```


### Conservation prediction task

For the train set of the conservation prediction task, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: ConSuf10k
subset: train
model_name: facebook/esm2_t33_650M_UR50D
```
For the validation set of the conservation prediction task, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: ConSuf10k
subset: val
model_name: facebook/esm2_t33_650M_UR50D
```
For the test set of the conservation prediction task, use the following `config_token.yaml`:

```yaml
task: calculate_sparse_attns
testset_name: ConSuf10k
subset: test
model_name: facebook/esm2_t33_650M_UR50D
```
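Every attention-map run in this section uses the same `task` and `model_name` and differs only in `testset_name` and `subset`, so the configs can be generated instead of edited by hand. The following is a minimal sketch and not part of the repository: `RUNS`, `render_attention_config`, and `write_attention_config` are hypothetical names, the allowed subsets are inferred from the examples in this README, and the script assumes `config_token.yaml` is read from the current working directory.

```python
from pathlib import Path

MODEL_NAME = "facebook/esm2_t33_650M_UR50D"

# (testset_name, subset) pairs that appear in this README.
# Other binding tasks (RNA, ZN, ...) would follow the same pattern as DNA.
RUNS = [
    ("DNA", "train"),
    ("Dataset1", "train"),
    ("Dataset1", "test"),
    ("Dataset2", "test"),
    ("ConSuf10k", "train"),
    ("ConSuf10k", "val"),
    ("ConSuf10k", "test"),
]

# Assumption: these are the only subset values, as they are the only ones shown here.
ALLOWED_SUBSETS = {"train", "val", "test"}


def render_attention_config(testset_name, subset, model_name=MODEL_NAME):
    """Return the YAML text for one attention-map generation run."""
    if subset not in ALLOWED_SUBSETS:
        raise ValueError(f"unexpected subset: {subset!r}")
    lines = [
        "task: calculate_sparse_attns",
        f"testset_name: {testset_name}",
        f"subset: {subset}",
        f"model_name: {model_name}",
    ]
    return "\n".join(lines) + "\n"


def write_attention_config(testset_name, subset, path="config_token.yaml"):
    """Write the config for one run to `path` and return the written text."""
    text = render_attention_config(testset_name, subset)
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # Print every config from this section; write one to disk before each run.
    for testset_name, subset in RUNS:
        print(render_attention_config(testset_name, subset))
```

Calling `write_attention_config("ConSuf10k", "val")` before launching the pipeline would reproduce the validation config shown above.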

## Generating features

For the RES_MST method with all heads and all layers on the test set `$testset_name$`, use the following `config_token.yaml` (the `num_layers: 33` and `num_heads: 20` values correspond to the facebook/esm2_t33_650M_UR50D model):

```yaml
task: calculate_topo_features_from_sparce_matrix
testset_name: $testset_name$
subset: test
model_name: facebook/esm2_t33_650M_UR50D
attn: attns
sum: False
num_layers: 33
num_heads: 20
method: 3
graph_laplacian: False
with_vert: true
```

For the RES_MST method with averaged heads on the test set `$testset_name$`, use the following `config_token.yaml` (the `num_layers: 33` and `num_heads: 20` values correspond to the facebook/esm2_t33_650M_UR50D model):

```yaml
task: calculate_topo_features_from_sparce_matrix
testset_name: $testset_name$
subset: test
model_name: facebook/esm2_t33_650M_UR50D
attn: attns
sum: True
num_layers: 33
num_heads: 20
method: 3
graph_laplacian: False
with_vert: true
```


For the RES_LT method on the test set `$testset_name$`, use the following `config_token.yaml` (the `num_layers: 33` and `num_heads: 20` values correspond to the facebook/esm2_t33_650M_UR50D model):

```yaml
task: calculate_topo_features_from_sparce_matrix
testset_name: $testset_name$
subset: test
model_name: facebook/esm2_t33_650M_UR50D
attn: attns  # alternative: attns_with_cls
threshold: 0.9  # alternatives: 0.99, 0.999
num_layers: 33
num_heads: 20
method: 1
graph_laplacian: False
with_vert: true  # or false
```
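The three feature configs above share most keys and differ only in `sum`, `method`, and `threshold`. The sketch below makes that difference explicit by building each config from a common base. It is an illustration under assumptions, not code from this repository: the preset labels (`RES_MST_all_heads`, `RES_MST_avg_heads`, `RES_LT`), `build_feature_config`, and `to_yaml` are hypothetical names, and the emitter writes booleans in lowercase on the assumption that the config is parsed by a standard YAML loader.

```python
# Keys shared by all three feature configs shown in this README.
COMMON = {
    "task": "calculate_topo_features_from_sparce_matrix",
    "model_name": "facebook/esm2_t33_650M_UR50D",
    "attn": "attns",
    "num_layers": 33,  # value for facebook/esm2_t33_650M_UR50D
    "num_heads": 20,   # value for facebook/esm2_t33_650M_UR50D
    "graph_laplacian": False,
    "with_vert": True,
}

# Hypothetical preset labels; the values are copied from the configs above.
PRESETS = {
    "RES_MST_all_heads": {"sum": False, "method": 3},
    "RES_MST_avg_heads": {"sum": True, "method": 3},
    "RES_LT": {"threshold": 0.9, "method": 1},
}


def build_feature_config(preset, testset_name, subset="test"):
    """Merge the shared keys with the keys specific to one preset."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    return {
        **COMMON,
        "testset_name": testset_name,
        "subset": subset,
        **PRESETS[preset],
    }


def to_yaml(config):
    """Emit a flat mapping of str/int/float/bool values as YAML text."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for preset in PRESETS:
        print(f"# {preset}")
        print(to_yaml(build_feature_config(preset, "DNA")))
```

Keeping the shared keys in one place means that switching to a different model only requires changing `model_name`, `num_layers`, and `num_heads` once.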


## Classifiers

Py-Boost scripts can be found in the `pyboost_scripts` folder.

## Biological interpretation

For the biological interpretation (Section 3.2 of the paper), please see biological_interpretaion/README.md.