Abstract: The Positional Burrows-Wheeler Transform (PBWT) is a data structure that supports efficient algorithms for finding matching segments in a panel of haplotypes. Composite patterns, in which multiple matching segments or blocks are arranged contiguously along the same haplotype, are of interest because they can indicate recombination crossover events, gene-conversion tracts, or sometimes errors made by phasing algorithms. However, current PBWT algorithms do not support efficient search for such composite patterns. Here we present mcPBWT (multi-column PBWT), an algorithm that uses multiple synchronized runs of PBWT at different variant sites to provide “look-ahead” information about matches at those sites. This look-ahead information allows us to analyze multiple contiguous matching pairs in a single pass. We present two specific cases of mcPBWT, double-PBWT and triple-PBWT, which use two and three columns of PBWT, respectively. Of these, double-PBWT finds combinations of two matching pairs representative of a crossover event or a phasing error, while triple-PBWT finds combinations of three matching pairs representative of a gene-conversion tract.
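To make the idea of "synchronized runs of PBWT at different variant sites" more concrete, the sketch below implements the standard PBWT update of the positional prefix array `a` and divergence array `d` (as introduced by Durbin) and then runs two copies of it in lockstep, one lagging the other by `w` sites. At each step the caller sees the PBWT state at site `k` together with the state at site `k + w`, which is one plausible reading of "look-ahead". This is a minimal illustrative sketch, not the authors' mcPBWT or double-PBWT implementation: the function names `pbwt_step`, `pbwt_states`, and `synchronized_states`, the lag parameter `w`, and the 0/1 list-of-lists haplotype encoding are all hypothetical choices made here.

```python
import itertools


def pbwt_step(X, k, a, d):
    """Advance PBWT from site k to site k+1 (standard prefix/divergence update).

    X: haplotypes as lists of 0/1 alleles (hypothetical encoding).
    a: haplotype indices sorted by reversed prefixes X[i][0:k].
    d: d[i] is the start site of the match between a[i] and a[i-1].
    """
    u, v, du, dv = [], [], [], []
    p = q = k + 1  # running match starts for the 0-group and the 1-group
    for idx, start in zip(a, d):
        p = max(p, start)
        q = max(q, start)
        if X[idx][k] == 0:
            u.append(idx)
            du.append(p)
            p = 0
        else:
            v.append(idx)
            dv.append(q)
            q = 0
    return u + v, du + dv


def pbwt_states(X):
    """Yield (k, a_k, d_k) for every site boundary k = 0..N in one pass."""
    a, d = list(range(len(X))), [0] * len(X)
    yield 0, a, d
    for k in range(len(X[0])):
        a, d = pbwt_step(X, k, a, d)
        yield k + 1, a, d


def synchronized_states(X, w):
    """Hypothetical helper: pair the PBWT state at site k with the one at k + w.

    Two independent PBWT runs advance together; the second starts w sites ahead.
    """
    return zip(pbwt_states(X), itertools.islice(pbwt_states(X), w, None))


X = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
for (k, a_k, d_k), (k_ahead, a_ahead, d_ahead) in synchronized_states(X, 2):
    print(k, a_k, "| look-ahead", k_ahead, a_ahead, d_ahead)
```

Because each step only needs the previous column's arrays, both runs stay linear in the number of sites and can be consumed in a single left-to-right scan. How mcPBWT actually combines the matches reported by its columns is described in the paper and is not reproduced here.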