G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data

Published: 03 Oct 2024, Last Modified: 21 May 2025 | BioRxiv | Everyone | CC BY-NC-ND 4.0
Abstract: RNA G-quadruplexes (rG4s) are key regulatory elements in gene expression, yet the effects of genetic variants on rG4 formation remain underexplored. Here, we introduce G4mer, an RNA language model that predicts rG4 formation, classifies rG4 subtypes, and evaluates the effects of genetic variants across the transcriptome. G4mer significantly improves accuracy over existing methods and uncovers subtype-specific differences in mutational sensitivity and evolutionary constraint, highlighting sequence length and flanking motifs as important rG4 features. Applying G4mer to 5' untranslated region (UTR) variations, we identify variants in breast cancer-associated genes that alter rG4 formation and validate their impact on structure and gene expression. These results demonstrate the potential of integrating computational models with experimental approaches to study rG4 function, especially in diseases where non-coding variants are often overlooked. To support broader applications, G4mer is available as both a web tool and a downloadable model.