Abstract: The Branching Gaussian Process (BGP) model is a modification of the Overlapping Mixture
of Gaussian Processes (OMGP) where latent functions branch in time. The BGP model
was introduced as a method to model bifurcations in single-cell gene expression data and
order genes by inferring their branching time parameter. A limitation of the current BGP
model is that the assignment of observations to latent functions is inferred independently
for each output dimension (gene). This leads to inconsistent assignments across outputs
and reduces the accuracy of branching time inference. Here, we propose a multivariate
branching Gaussian process (MBGP) model to perform joint branch assignment inference
across multiple output dimensions. This ensures that branch assignments are consistent and
leverages more data for branching time inference. Model inference is more challenging than
for the original BGP or OMGP models because assignment labels can switch from trunk to
branch lineages as branching times change during inference. To scale up inference to large
datasets, we use sparse variational Bayesian inference. We examine the effectiveness of our
approach on synthetic data and a single-cell RNA-Seq dataset from mouse haematopoietic
stem cells (HSCs). Our approach ensures assignment consistency by design and improves the
accuracy of both branching time inference and branch assignment.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: As suggested by the Action Editor, we have:
* Improved the problem definition section. There is now a simplified diagram that walks the reader through the problem. The problem is first defined without reference to its biological motivation, so that the treatment is self-contained; the biological motivation is then provided.
* As a follow-on to the above, we have made the presentation better suited to an ML audience by removing most references to the motivational biological problem. Instead we now refer back to the problem definition section, which provides a full problem statement without biological concepts.
* We have improved the figures to make them completely self-contained. Interpretation is still done in the text, but the captions and labels are substantially improved.
* We have addressed all other editorial comments including equation layout, table styling, typos and abbreviations.
* We have also added a more up-to-date literature review that, in particular, mentions the scFates package.
Code: https://github.com/ManchesterBioinference/BranchedGP
Assigned Action Editor: ~Patrick_Flaherty1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 589