SuperFoldAE: Enhancing Protein Fold Classification with Autoencoders

Shraddha Patre, Riya Kanani, Fardina Fathmiul Alam

Published: 01 Jan 2025, Last Modified: 28 Jan 2026. License: CC BY-SA 4.0

Abstract: Protein fold classification is essential for understanding protein function and its role in biological processes. Proteins are crucial for cellular structure, organization, and function, and their tertiary structure directly influences the roles they play within cells. This has driven interest in computational approaches, particularly machine learning. While numerous machine learning methods classify protein folds from sequence data, few leverage tertiary structure data with deep learning techniques. In this paper, we introduce SuperFoldAE, a 2D-convolutional autoencoder neural network designed specifically for protein fold classification. Our model combines convolutional techniques with a supervised loss on the autoencoder's latent layer to learn patterns in protein structures. Autoencoders compress high-dimensional data into a lower-dimensional latent space that captures intricate patterns and critical features of protein structures. We investigate various configurations within a supervised framework and improve generalization by using the reconstruction loss as an unsupervised regularizer. Our approach aims to improve classification accuracy by learning representations that highlight subtle structural variations often overlooked when using sequence data alone. Using the Structural Classification of Proteins (SCOP) 1.75 dataset, we seek to improve the reliability of protein fold classification. Our findings offer insight into the functional implications of protein structures and motivate further research into representation-based deep learning for protein classification.
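The combination of a supervised loss on the latent layer with a reconstruction regularizer can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `recon_weight` parameter, the use of mean squared error and softmax cross-entropy, and the assumption that fold logits come from a classifier head on the latent code are all hypothetical choices made for illustration. The abstract does not specify the input representation, so the sketch treats the input as a generic 2D map.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between two equally sized 2D maps."""
    flat_x = [v for row in x for v in row]
    flat_r = [v for row in x_hat for v in row]
    return sum((a - b) ** 2 for a, b in zip(flat_x, flat_r)) / len(flat_x)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def joint_loss(x, x_hat, fold_logits, fold_label, recon_weight=0.5):
    """Supervised loss on the latent prediction plus a weighted reconstruction term.

    x           : input 2D map (hypothetical structural representation)
    x_hat       : decoder output with the same shape as x
    fold_logits : class scores assumed to come from a head on the latent code
    fold_label  : index of the true fold class
    recon_weight: assumed trade-off weight for the unsupervised regularizer
    """
    return cross_entropy(fold_logits, fold_label) + recon_weight * mse(x, x_hat)

# Example: a perfect reconstruction leaves only the classification term.
x = [[1.0, 0.0], [0.0, 1.0]]
loss = joint_loss(x, x, fold_logits=[0.0, 0.0, 0.0, 0.0], fold_label=2)
```

Setting `recon_weight` to zero reduces the objective to a purely supervised classifier, which is one way to compare configurations with and without the reconstruction regularizer.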

External IDs: doi:10.1007/978-3-031-85435-4_1