Keywords: Single molecule force spectroscopy, protein unfolding, application in single molecule identification, physics augmentation, physics-based Monte Carlo simulation
Abstract: Deciphering the pathways of protein folding and unfolding under tension is essential for deepening our understanding of fundamental biological mechanisms. Such insights could help develop treatments for a range of incurable, debilitating, and fatal conditions, including muscular disorders such as Duchenne Muscular Dystrophy and neurodegenerative diseases such as Parkinson’s disease. Single molecule force spectroscopy (SMFS) is a powerful technique for measuring the forces at which protein domains fold and unfold. Currently, manual visual inspection remains the primary method for identifying force curves that result from single proteins, a time-consuming task that demands significant expertise. In this work, we develop a classification strategy that detects measurements arising from single molecules by augmenting deep learning models with the physics of the protein being investigated. We develop a novel physics-based Monte Carlo engine to generate simulated datasets comprising force curves that originate from a single molecule, multiple molecules, or failed experiments. We show that pre-training deep learning models on the simulated dataset enables high-throughput classification of SMFS experimental data with an average accuracy of $75.3 \pm 5.3$\% and a ROC-AUC of $0.87 \pm 0.05$. Our physics augmentation strategy does not require expensive expert adjudication of the experimental data, and models trained with it achieve up to 25.9\% higher ROC-AUC than models trained solely on the limited SMFS experimental data. Furthermore, we show that incorporating a small subset of experimental data ($\sim 100$ examples) through transfer learning improves accuracy by 6.8\% and ROC-AUC by 0.06. We have validated our results on three new SMFS experimental datasets. To facilitate further research in this area, we make our datasets available and provide a Python-based toolbox (\url{https://anonymous.4open.science/r/AFM_ML-2B8C}).
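To make the idea of a physics-based Monte Carlo force-curve generator concrete, the following is a minimal sketch of one common textbook approach, not the engine described in the abstract. It assumes a worm-like chain (Marko-Siggia interpolation) for polymer elasticity and a Bell-type force-dependent unfolding rate, ignores cantilever compliance and noise, and covers only the single-molecule case. The function names and every parameter value (persistence length, contour length increment, rate constants, detachment force) are illustrative assumptions.

```python
import numpy as np

KT = 4.11  # thermal energy in pN*nm near room temperature


def wlc_force(x, lc, p=0.4):
    """Marko-Siggia worm-like chain force (pN) at extension x (nm).

    lc is the contour length (nm); p is an assumed persistence length (nm).
    """
    r = x / lc
    return (KT / p) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


def simulate_curve(n_domains=8, v=1000.0, dt=1e-4, k0=1e-3, dx=0.25,
                   lc0=50.0, dlc=28.0, f_detach=400.0, seed=0,
                   max_steps=100_000):
    """Simulate one idealized single-molecule pulling curve.

    v: pulling speed (nm/s), dt: time step (s), k0: zero-force unfolding
    rate (1/s), dx: distance to the transition state (nm), dlc: contour
    length gained per unfolded domain (nm). All values are assumptions.
    Returns (extension, force, number_of_unfolding_events).
    """
    rng = np.random.default_rng(seed)
    lc, folded, x, n_events = lc0, n_domains, 0.0, 0
    ext, force = [], []
    for _ in range(max_steps):
        x += v * dt
        f = wlc_force(x, lc)
        if f > f_detach:  # treat this as the molecule detaching from the tip
            break
        ext.append(x)
        force.append(f)
        if folded > 0:
            # Bell model: the unfolding rate grows exponentially with force.
            k = k0 * np.exp(f * dx / KT)
            # Probability that at least one folded domain unfolds in dt.
            if rng.random() < 1.0 - np.exp(-folded * k * dt):
                folded -= 1
                lc += dlc  # the unfolded domain lengthens the chain
                n_events += 1
    return np.array(ext), np.array(force), n_events


ext, force, n_events = simulate_curve()
```

Each unfolding event appears as a sudden force drop, which gives the sawtooth pattern typical of single-molecule curves. Under the same assumptions, multi-molecule or failed-experiment curves could be approximated by summing several such chains or by drawing detachment before any unfolding, but those variants are not shown here.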
Supplementary Material:  zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 10608