# Identifying single molecule force spectroscopy data using deep learning with physics augmentation
In this work, we develop a classification strategy to detect measurements arising from single molecules by augmenting deep learning models with the physics of protein unfolding. We develop a novel physics-based Monte Carlo engine to generate a simulated dataset comprising force curves that originate from a single molecule, multiple molecules, or failed experiments. Additionally, we validate model performance on three new single-molecule force spectroscopy (SMFS) experimental datasets, obtained from non-specific pulling of multi-domain molecules: titin (Titin I27O), utrophin (UtrN-R3), and dystrophin (DysN-R3). This database of force curves and the associated methods are available here. 
 
## Requirements
All required Python packages are listed in [pip-requirements.txt](pip-requirements.txt). 
- [tensorflow-metal](https://developer.apple.com/metal/tensorflow-plugin/) enables TensorFlow GPU acceleration on Macs. 
- [tslearn](https://github.com/tslearn-team/tslearn/) is a Python package for the analysis of time series. 
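
As a quick sanity check after installing the requirements, the following minimal sketch reports which packages cannot be found in the current environment. The list of import names is an assumption based on the packages named above; the authoritative list is pip-requirements.txt.

```python
from importlib.util import find_spec

# Top-level import names assumed from the packages mentioned in this README;
# extend the list to match pip-requirements.txt.
REQUIRED_MODULES = ["tensorflow", "tslearn"]


def missing_modules(modules):
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are available.")
```

Note that tensorflow-metal is a plugin for TensorFlow and is only relevant on Macs, so it is not part of this check.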

## Data
Please check the [anonymous repository](https://anonymous.4open.science/r/AFM_ML-2B8C/README.md) for the data. 

For our three molecules (Titin I27O, UtrN-R3, and DysN-R3):
  - The simulation data were generated with [MCSim.py](CallScripts/MCSim.py) and are summarized as .csv files in [ML_Dataset](Data/ML_Dataset). 
  - The experimental data ([Titin I27O Data](Data/Titin_data/Exp_ibw_data), [UtrN-R3 Data](Data/UTRNR3_data/Exp_ibw_data), [DysN-R3 Data](Data/DysNR3_Bact_data/Exp_ibw_data)) are also summarized as .csv files in [ML_Dataset](Data/ML_Dataset).
    
For DDRs, please find more details in the [paper](https://www.cell.com/patterns/pdf/S2666-3899(22)00319-1.pdf). 

## Code
- [MonteCarloSMFS.py](APIs/MonteCarloSMFS.py) contains the necessary functions to conduct Monte Carlo simulations.
- [MCSim.py](CallScripts/MCSim.py) runs Monte Carlo simulations of protein unfolding.
- [SMNN.py](APIs/SMNN.py) contains preprocessing and deep learning models.
- [AFM_ML.py](CallScripts/AFM_ML.py) trains and evaluates deep learning models.
- [utils.py](APIs/utils.py) contains some utility functions. 
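
To illustrate the kind of simulation these scripts perform, here is a minimal, self-contained sketch of a Monte Carlo unfolding simulation for a single multi-domain molecule pulled at constant speed. It is not the engine in MonteCarloSMFS.py: it assumes a worm-like chain (WLC) force-extension relation and Bell-type force-dependent unfolding rates, ignores cantilever compliance, and uses illustrative parameter values. The function names and defaults are hypothetical.

```python
import math
import random

KBT = 4.11  # thermal energy near room temperature, in pN*nm


def wlc_force(x, contour_length, persistence_length=0.4):
    """Marko-Siggia WLC interpolation: force (pN) at extension x (nm)."""
    r = x / contour_length
    return (KBT / persistence_length) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


def simulate_curve(n_domains=8, l0=30.0, delta_lc=28.0, k0=3.3e-4, dx=0.25,
                   speed=1000.0, dt=1e-5, detach_force=400.0, seed=None):
    """Simulate one force-extension curve (illustrative parameters only).

    n_domains: folded domains at the start; l0: initial contour length (nm);
    delta_lc: contour length gained per unfolding event (nm);
    k0: zero-force unfolding rate (1/s); dx: distance to transition state (nm);
    speed: pulling speed (nm/s); dt: time step (s);
    detach_force: force (pN) at which the fully unfolded chain detaches.
    """
    rng = random.Random(seed)
    contour, folded = l0, n_domains
    t = 0.0
    extension, force = [], []
    while True:
        t += dt
        x = speed * t
        if x >= 0.99 * contour:  # stay away from the WLC singularity at x = L
            break
        f = wlc_force(x, contour)
        extension.append(x)
        force.append(f)
        if folded == 0:
            if f >= detach_force:  # molecule detaches from the tip
                break
            continue
        # Bell model: the rate grows exponentially with force. Any of the
        # `folded` domains may unfold during this time step.
        rate = folded * k0 * math.exp(min(f * dx / KBT, 700.0))
        if rng.random() < min(1.0, rate * dt):
            contour += delta_lc  # unfolded domain lengthens the chain
            folded -= 1
    return extension, force, n_domains - folded


extension, force, n_events = simulate_curve(seed=0)
print(f"{n_events} unfolding events, {len(force)} samples")
```

Each unfolding event increases the contour length, which produces the characteristic sawtooth pattern in the force curve. Curves from multiple molecules or failed experiments would require additional modeling that this sketch does not attempt.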


## Training and Evaluation
Training and evaluation of the deep learning models are implemented in [AFM_ML.py](CallScripts/AFM_ML.py). At the start of the script, users can select the parameters to use, such as the dataset and the model.
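
For readers who want to score predictions outside of AFM_ML.py, the sketch below computes standard binary classification metrics for single-molecule detection. It is an illustration, not the evaluation code used in this repository; the function name and the label encoding (1 for a single-molecule curve, 0 for everything else) are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = len(pairs)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = single molecule, 0 = multiple molecules or failed.
labels = [1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0]
print(binary_metrics(labels, predictions))
```

Reporting precision and recall alongside accuracy is useful when single-molecule curves are a minority of the recorded measurements, since accuracy alone can hide a poor detection rate.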


## Pre-trained Models
Please check the [anonymous repository](https://anonymous.4open.science/r/AFM_ML-2B8C/README.md) for the pre-trained models. 

The pre-trained models were trained on simulation data (except the DDRs model), augmented with $M=1$ simulated reference curve. Please see the paper for more information about reference curves. 
- [Titin I27O](CallScripts/ML_models/Titin/saved_model)
- [UtrNR3](CallScripts/ML_models/UtrNR3/saved_model)
- [DysNR3](CallScripts/ML_models/DysNR3_bact/saved_model)
- [DDRs](CallScripts/ML_models/fewshot/saved_model)


## Results
Our physics augmentation strategy not only lessens the need for expensive expert annotations but also outperforms models trained directly on limited SMFS experimental datasets. Additionally, the accuracy can be further improved by incorporating a small subset of experimental data ($\sim 100$ examples) via transfer learning.

![ResNet transfer learning result](images/ResNet_transfer_results.png)

## Reference

No new data were generated for this study. The data are taken from: 
```
@misc{hua_two_2024,
	title = {Two operational modes of atomic force microscopy reveal similar mechanical properties for homologous regions of dystrophin and utrophin},
	copyright = {© 2024, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at http://creativecommons.org/licenses/by-nc-nd/4.0/},
	url = {https://www.biorxiv.org/content/10.1101/2024.05.18.593686v1},
	doi = {10.1101/2024.05.18.593686},
	publisher = {bioRxiv},
	author = {Hua, Cailong and Slick, Rebecca A. and Vavra, Joseph and Muretta, Joseph M. and Ervasti, James M. and Salapaka, Murti V.},
	month = may,
	year = {2024},
}

```
