Keywords: Regulatory motif analysis, MicroRNA activity inference, Sequence-specific motif probabilities, Markov background models, Binomial approximation, Computational biology
TL;DR: A binomial approximation of exact Markov background models gives scalable sequence-specific motif probability correction for miRNA activity inference, preserving regulatory signals while running at least 9-fold faster in k-mer screens.
Abstract: Regulatory microRNA activity can be inferred from mRNA data by evaluating the clustering of its target motif across expression-ranked mRNA sequences.
Sequence-specific motif probabilities (SSPs) are required to distinguish functional motif enrichment from random motif occurrence driven by sequence length and nucleotide composition; however, current exact background Markov models scale poorly with k-mer motif size.
Here, we introduce a binomial approximation parameterized by a single-site motif probability and sequence length for scalable SSP computation, which is at least 9-fold faster for k-mer screens while generating closely matching SSP values used for downstream Bayesian activity inference.
The approximation is further extended to a 1st-order di-nucleotide background model, which improves correction for compositionally biased sequences and better separates AT-rich from GC-rich motif expectations for longer k-mers in AT-biased mRNA sequences.
Applied to liver cancer samples, the binomial background approximation preserves downstream microRNA activity estimates, making it better suited to large-scale applications such as single-cell regulatory profiling.
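As a rough illustration of the idea in the abstract, the following minimal sketch computes a single-site motif probability under a mononucleotide or a 1st-order di-nucleotide background and plugs it into a binomial tail. It is not the authors' implementation. The function names, the estimation of base frequencies from the sequence itself, and the reading of an SSP as the probability of at least one motif occurrence in the sequence are all assumptions made for the example. Treating the L - k + 1 start positions as independent trials is what makes it an approximation.

```python
from collections import Counter
from math import comb

BASES = "ACGT"


def mono_site_prob(motif, seq):
    """Single-site motif probability under a 0th-order (mononucleotide)
    background estimated from seq (hypothetical helper)."""
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    p = 1.0
    for base in motif:
        p *= counts[base] / total
    return p


def dinuc_site_prob(motif, seq):
    """Single-site motif probability under a 1st-order background:
    P(m1) * prod_i P(m_{i+1} | m_i), estimated from seq (hypothetical helper)."""
    mono = Counter(seq)
    total = sum(mono[b] for b in BASES)
    pairs = Counter(zip(seq, seq[1:]))  # adjacent-base (di-nucleotide) counts
    p = mono[motif[0]] / total
    for a, b in zip(motif, motif[1:]):
        from_a = sum(pairs[(a, c)] for c in BASES)
        p *= pairs[(a, b)] / from_a if from_a else 0.0
    return p


def binomial_ssp(p_site, seq_len, k, min_hits=1):
    """P(X >= min_hits) for X ~ Binomial(n, p_site), where n = seq_len - k + 1
    is the number of possible motif start positions. Assumes independent sites."""
    n = max(seq_len - k + 1, 0)
    below = sum(
        comb(n, j) * p_site**j * (1.0 - p_site) ** (n - j)
        for j in range(min_hits)
    )
    return 1.0 - below


if __name__ == "__main__":
    seq = "ACGT" * 25          # toy sequence, not real mRNA data
    motif = "ACG"
    for name, fn in (("mono", mono_site_prob), ("dinuc", dinuc_site_prob)):
        p = fn(motif, seq)
        print(name, p, binomial_ssp(p, len(seq), len(motif)))
```

On the toy repeat sequence the two backgrounds disagree sharply, because the di-nucleotide model sees that A is always followed by C. This is the kind of compositional structure that a 1st-order background can capture and a mononucleotide one cannot.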
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 239