Efficient Sparse Flow Decomposition Methods for RNA Multi-Assembly

Published: 01 Jan 2025, Last Modified: 12 May 2025CoRR 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Decomposing a flow on a Directed Acyclic Graph (DAG) into a weighted sum of a small number of paths is an essential task in operations research and bioinformatics. This problem, referred to as Sparse Flow Decomposition (SFD), has gained significant interest, in particular for its application in RNA transcript multi-assembly, the identification of the multiple transcripts corresponding to a given gene and their relative abundance. Several recent approaches cast SFD variants as integer optimization problems, motivated by the NP-hardness of the formulations they consider. We propose an alternative formulation of SFD as a fitting problem on the conic hull of the flow polytope. By reformulating the problem on the flow polytope for compactness and solving it using specific variants of the Frank-Wolfe algorithm, we obtain a method converging rapidly to the minimizer of the chosen loss function while producing a parsimonious decomposition. Our approach subsumes previous formulations of SFD with exact and inexact flows and can model different priors on the error distributions. Computational experiments show that our method outperforms recent integer optimization approaches in runtime, but is also highly competitive in terms of reconstruction of the underlying transcripts, despite not explicitly minimizing the solution cardinality.
Loading