Abstract: Homologous chromosome pairs have similar DNA sequences and encode the same type of information but are not identical. The most common form of variation between them is the single nucleotide polymorphism (SNP), where the two chromosomes in a homologous pair differ at an individual corresponding position. The problem of haplotype assembly is concerned with finding the ordered sequence of SNPs associated with each of the chromosomes in a pair. The information about genetic variations (i.e., SNPs) is inferred from short reads provided by high-throughput DNA sequencing systems. The reads are potentially erroneous, and haplotype assembly can thus be interpreted as the process of separating fragments into two potentially inconsistent (due to noise) classes, each corresponding to one of the haplotypes in a pair. Solving this problem is known to be NP-hard, and hence practical haplotype assembly typically involves heuristics and low-complexity approximations. In our work, we propose novel graphical models of the haplotype assembly problem and design efficient message passing algorithms for solving it. The developed algorithms perform comparably to existing state-of-the-art techniques while being more computationally efficient.
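
To make the "two classes of fragments" view concrete, the following is a minimal sketch of the underlying partitioning problem on a toy instance. It is not the message passing algorithm proposed in this work. It assumes biallelic SNPs coded as `0`/`1`, reads given as strings with `-` marking SNP positions a read does not cover, a diploid genome whose two haplotypes are complementary at every SNP, and a mismatch-count objective. The function names `mismatches` and `assemble_brute_force` are hypothetical. The exhaustive search is exponential in the number of SNPs, which illustrates why practical methods rely on heuristics.

```python
from itertools import product


def mismatches(read, hap):
    """Count covered positions where a read disagrees with a haplotype.

    '-' in the read means the SNP position is not covered and is skipped.
    """
    return sum(1 for r, h in zip(read, hap) if r != '-' and r != h)


def assemble_brute_force(reads):
    """Try every candidate haplotype over n SNPs (feasible only for tiny n).

    Assumption: the second haplotype is the bitwise complement of the first.
    Each read is assigned to whichever haplotype it disagrees with least;
    the cost is the total number of remaining disagreements.
    Returns (cost, haplotype_1, haplotype_2) for the first minimum found.
    """
    n = len(reads[0])
    best = None
    for bits in product('01', repeat=n):
        h1 = ''.join(bits)
        h2 = ''.join('1' if b == '0' else '0' for b in bits)
        cost = sum(min(mismatches(r, h1), mismatches(r, h2)) for r in reads)
        if best is None or cost < best[0]:
            best = (cost, h1, h2)
    return best


if __name__ == "__main__":
    # Toy data (made up for illustration): six error-free reads over four SNPs.
    clean_reads = ["01--", "-10-", "--01", "10--", "-01-", "--10"]
    print(assemble_brute_force(clean_reads))
    # Adding one read that conflicts with the others forces a nonzero cost.
    print(assemble_brute_force(clean_reads + ["011-"]))
```

With error-free reads the two classes are consistent and the cost is zero; a single conflicting read makes the classes inconsistent, which is the effect of noise described above.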