CLADES: Contrastive Learning Augmented DifferEntial Splicing with Orthologous Positive Pairs
Abstract: Alternative splicing (AS) reshapes transcript and protein repertoires across biological contexts, e.g. cell types and tissues. However, learning context-specific sequence-to-splicing mappings is challenging because labels are limited across tissues and cell types and because experimental protocols introduce variability.
We propose a contrastive representation learning pre-training approach grounded in evolutionary conservation. Orthologous exon–intron junction sequences are treated as semantically consistent views of the same regulatory program: evolutionary orthologs form positive pairs, and non-homologous junctions serve as negatives. This discriminative objective aligns the embeddings of regulatory equivalents while separating functionally unrelated sequences, inducing invariance to unconstrained sequence variation and emphasizing conserved RBP-binding motifs and positional signals. We show that this pre-training strategy yields representations that help predict ∆ψ, the change in exon inclusion between conditions, which encodes both
direction and magnitude of splicing shifts. Specifically, we fine-tune a lightweight supervised head on the available labels to predict ∆ψ. To make these predictions biologically meaningful, we further introduce an interpretable, splice-motif-aware classification framework grounded in known regulatory signals. On
benchmarks spanning tissue- and cell-type differential splicing, the learned representations yield strong ∆ψ classification performance (AUPRC/AUROC for increased/decreased inclusion) and competitive regression results (RMSE, Spearman correlation). These findings indicate that evolution-as-augmentation, instantiated via contrastive learning, is an effective and biologically principled route to context-resolved splicing prediction.
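The pre-training signal described above (orthologs as positives, non-homologous junctions as negatives) does not name a specific loss. A common way to realize such an objective is an InfoNCE-style loss, so the following is a minimal sketch under that assumption: it treats the other orthologs in the same batch as the negatives and uses a hypothetical temperature of 0.1. Embeddings are plain lists of floats standing in for encoder outputs, and `info_nce_loss` and `l2_normalize` are illustrative names, not the CLADES implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Mean InfoNCE loss; positives[i] is assumed to be the ortholog of anchors[i].

    Assumption: every other row positives[j] (j != i) acts as a negative,
    standing in for non-homologous junctions sampled into the same batch.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equal, non-empty batches of anchors and positives")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the true ortholog (index i) as the target.
        total += log_denom - logits[i]
    return total / len(a)


# Toy example: three "human" junction embeddings and their "mouse" orthologs.
human = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.2, 1.0]]
mouse = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.0], [0.2, 0.3, 0.9]]
mouse_mismatched = [mouse[1], mouse[2], mouse[0]]  # orthology broken on purpose

print(info_nce_loss(human, mouse) < info_nce_loss(human, mouse_mismatched))  # True
```

Minimizing this loss pulls each junction toward its ortholog and away from the rest of the batch, which is one concrete reading of "aligns embeddings of regulatory equivalents while separating functionally unrelated sequences". In-batch negatives are a convenience assumption; explicitly sampled non-homologous junctions would slot into the same formula as extra candidates.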
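The target ∆ψ and the evaluation are described only verbally above. A minimal sketch, assuming the common read-count definition ψ = inclusion / (inclusion + exclusion) without junction-length normalization, ∆ψ = ψ_B minus ψ_A, and a hypothetical symmetric threshold of 0.1 for the increased/decreased classes; RMSE and Spearman correlation are the regression metrics named above. All function names and the threshold are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Naive percent spliced in: inclusion / (inclusion + exclusion) reads.

    Assumption: no normalization for the number or length of junctions.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads for this exon")
    return inclusion_reads / total


def delta_psi(psi_a: float, psi_b: float) -> float:
    """Signed change from condition A to B: sign = direction, |value| = magnitude."""
    return psi_b - psi_a


def label_delta_psi(dpsi: float, threshold: float = 0.1) -> str:
    """Hypothetical three-way label; the 0.1 cut-off is an illustrative assumption."""
    if dpsi >= threshold:
        return "increased"
    if dpsi <= -threshold:
        return "decreased"
    return "unchanged"


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error between predicted and measured delta-psi."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rp, rt = average_ranks(pred), average_ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    if var_p == 0 or var_t == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


# Toy exon: 30/100 inclusion reads in tissue A, 80/100 in tissue B.
dpsi = delta_psi(psi(30, 70), psi(80, 20))
print(label_delta_psi(dpsi))  # increased
```

A symmetric threshold keeps the sign of ∆ψ as the class label, matching the point that ∆ψ encodes both direction and magnitude; per-class AUPRC/AUROC could then be computed one-vs-rest over these labels.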