Enhanced RNA Sequence Representation through Sequence Masking and Subsequence Consistency Optimization

Published: 2023, Last Modified: 15 Nov 2024, BIBM 2023, License: CC BY-SA 4.0
Abstract: Accurate and efficient representation of RNA sequences remains a central challenge in RNA research, made harder by the complexity and diversity of the sequences themselves. To better account for sequence context and structural alignment, this study proposes a representation approach that combines sequence masking with subsequence consistency optimization. The model is evaluated on a filtered version of the RNAStralign dataset containing 20,923 sequences: a Support Vector Machine (SVM) is trained on the learned representations for downstream RNA family classification. Although sequences are unevenly distributed across RNA families, the model achieves high classification accuracy and high area under the precision-recall curve (AUPRC) across diverse RNA sequence groups. AUPRC is reported because, unlike accuracy, it is not inflated by correctly rejecting the many sequences of the majority classes, so it gives a fairer picture under this imbalance. These results indicate that the representations are practically useful for RNA sequence analysis and classification. Overall, this work presents a method for enhanced RNA sequence representation and lays a foundation for further advances in RNA sequence analysis.
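The abstract names its components but not their details. As a rough, non-authoritative illustration, the stdlib-only Python sketch below shows three pieces: random nucleotide masking, a subsequence-consistency loss between two windows of the same RNA, and average precision, the usual step-wise estimate of AUPRC. Everything specific here is an assumption and not the authors' method: the mask symbol `N`, the 15% masking ratio, the window-based views (`subsequence_views`), and the toy 3-mer frequency encoder (`kmer_profile`) standing in for a learned model are all hypothetical.

```python
import math
import random
from collections import Counter

MASK = "N"  # hypothetical mask symbol; the paper's actual mask token is not specified


def mask_sequence(seq, ratio=0.15, rng=None):
    """Replace a random fraction of positions with MASK (ratio is an assumed value).

    Returns the masked sequence and the sorted masked positions, which a
    model would be trained to reconstruct.
    """
    rng = rng or random.Random(0)
    n_mask = max(1, int(round(ratio * len(seq))))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    for i in positions:
        chars[i] = MASK
    return "".join(chars), positions


def subsequence_views(seq, length, rng):
    """Draw two random windows of the same sequence (hypothetical view construction)."""
    def window():
        start = rng.randrange(len(seq) - length + 1)
        return seq[start:start + length]
    return window(), window()


def kmer_profile(seq, k=3):
    """Toy stand-in for a learned encoder: k-mer counts, skipping k-mers containing MASK."""
    return Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if MASK not in seq[i:i + k]
    )


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def consistency_loss(view_a, view_b):
    """1 - cosine similarity: 0 when two views of one RNA map to the same direction."""
    return 1.0 - cosine(kmer_profile(view_a), kmer_profile(view_b))


def average_precision(labels, scores):
    """Step-wise AUPRC: mean of precision@k over the ranks k of the positives.

    Ties in scores are broken by input order (no threshold grouping).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / n_pos


if __name__ == "__main__":
    rng = random.Random(0)
    rna = "GGGAAACUUCGGUUUCCC"
    masked, pos = mask_sequence(rna, ratio=0.15, rng=rng)
    print(masked, pos)
    a, b = subsequence_views(rna, length=12, rng=rng)
    print(a, b, round(consistency_loss(a, b), 3))
    # Imbalanced toy case: 1 positive among 10. Always predicting "negative"
    # scores 90% accuracy, but AP is 1/3 because the positive is ranked third.
    labels = [1] + [0] * 9
    scores = [0.7, 0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35]
    print(average_precision(labels, scores))
```

For multi-family classification, AP would typically be computed one-vs-rest, treating each RNA family in turn as the positive class and ranking by a per-class score such as an SVM's decision-function value; the toy imbalanced example shows why this metric can expose weak minority-class performance that accuracy hides.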