Abstract: The Branching Gaussian Process (BGP) model is a modification of the Overlapping Mixture of Gaussian Processes (OMGP) where latent functions branch in time. The BGP model was introduced as a method to model bifurcations in single-cell gene expression data and order genes by inferring their branching time parameter. A limitation of the current BGP model is that the assignment of observations to latent functions is inferred independently for each output dimension (gene). This leads to inconsistent assignments across outputs and reduces the accuracy of branching time inference. Here, we propose a multivariate branching Gaussian process (MBGP) model to perform joint branch assignment inference across multiple output dimensions. This ensures that branch assignments are consistent and leverages more data for branching time inference. Model inference is more challenging than for the original BGP or OMGP models because assignment labels can switch from trunk to branch lineages as branching times change during inference. To scale up inference to large datasets, we use sparse variational Bayesian inference. We examine the effectiveness of our approach on synthetic data and a single-cell RNA-Seq dataset from mouse haematopoietic stem cells (HSCs). Our approach ensures assignment consistency by design and improves the accuracy of both branching time inference and branch assignment.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: As suggested by the Action Editor, we have made the following changes:
* We improved the problem definition section. A simplified diagram now walks the reader through the problem, which is first defined without reference to its biological motivation so that the treatment is self-contained. The biological motivation follows.
* Following on from the above, we made the presentation better suited to an ML audience by removing most references to the motivating biological problem. We now refer back to the problem definition section instead, which gives a full problem statement without biological concepts.
* We improved the figures to make them fully self-contained. Interpretation still takes place in the text, but the captions and labels are substantially improved.
* We addressed all other editorial comments, including equation layout, table styling, typos and abbreviations.
* We added a more up-to-date literature review, in particular mentioning the scFates package.
Assigned Action Editor: ~Patrick_Flaherty1
Submission Number: 589