CLADES: Contrastive Learning Augmented DifferEntial Splicing with Orthologous Positive Pairs

Published: 26 Feb 2026, Last Modified: 07 Mar 2026 | OpenReview Archive Direct Upload | Everyone | arXiv.org perpetual, non-exclusive license
Abstract: Alternative splicing (AS) reshapes transcript and protein repertoires across biological (e.g., cellular) contexts. However, learning mappings from sequence to context-specific splicing is challenging because labels are limited across tissues and cell types and because experimental protocols introduce variability. We propose a contrastive representation learning pre-training approach grounded in evolutionary conservation. Orthologous exon–intron junction sequences are treated as semantically consistent views of the same regulatory program: orthologs form positive pairs, and non-homologous junctions serve as negatives. This discriminative objective aligns embeddings of regulatory equivalents while separating functionally unrelated sequences, inducing invariance to unconstrained sequence and emphasizing conserved motif/RBP and positional signals. We show that this pre-training strategy provides representations that help predict ∆ψ, the change in exon inclusion between conditions, which encodes both the direction and the magnitude of splicing shifts. Specifically, we fine-tune a lightweight supervised head on available labels to predict ∆ψ. To make these predictions biologically meaningful, we further introduce an interpretable, splice-motif–aware classification framework grounded in known regulatory signals. On benchmarks spanning tissue- and cell-type differential splicing, the learned representations yield strong ∆ψ classification performance (AUPRC/AUROC for increased/decreased inclusion) and competitive results for regression (RMSE, Spearman). These findings indicate that evolution-as-augmentation, instantiated via contrastive learning, is an effective and biologically principled route to context-resolved splicing prediction.
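The abstract does not state the exact contrastive loss, so the following is only a minimal sketch, assuming an InfoNCE-style objective, which is a common choice for this kind of setup. Row i of `anchors` and row i of `positives` stand for embeddings of orthologous junctions from two species; every other row in the batch acts as a non-homologous negative. The function name `info_nce_loss`, the `temperature` value, and the random toy embeddings are all hypothetical and not taken from the paper.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not confirmed by the source).

    anchors[i] and positives[i] are embeddings of an orthologous junction pair;
    positives[j] for j != i are treated as in-batch negatives for anchors[i].
    """
    # L2-normalise so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (batch, batch), scaled by temperature.
    logits = (a @ p.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for row i is column i (its ortholog).
    return float(-np.mean(np.diag(log_probs)))


# Toy data: random vectors standing in for junction embeddings (hypothetical).
rng = np.random.default_rng(0)
species_a = rng.normal(size=(8, 16))

# Orthologs modelled as slightly perturbed copies of the anchor embeddings.
species_b = species_a + 0.01 * rng.normal(size=(8, 16))

# Mispaired batch: each anchor is matched with a different junction.
mispaired = np.roll(species_b, shift=1, axis=0)

loss_aligned = info_nce_loss(species_a, species_b)
loss_mispaired = info_nce_loss(species_a, mispaired)
print(loss_aligned < loss_mispaired)
```

In this toy setting the loss is lower when orthologous pairs sit on the diagonal than when the pairing is shuffled, which is the behaviour the pre-training objective is meant to reward. A real pipeline would replace the random vectors with encoder outputs and backpropagate through the loss in an autodiff framework.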