mcPBWT: Space-Efficient Multi-column PBWT Scanning Algorithm for Composite Haplotype Matching

Published: 01 Jan 2021, Last Modified: 12 Nov 2024, Venue: ICCABS 2021, License: CC BY-SA 4.0
Abstract: The Positional Burrows-Wheeler Transform (PBWT) is a data structure that supports efficient algorithms for finding matching segments in a panel of haplotypes. It is of interest to study composite patterns of multiple matching segments, or blocks, arranged contiguously along the same haplotype, as they can indicate recombination crossover events, gene-conversion tracts, or, sometimes, errors of phasing algorithms. However, current PBWT algorithms do not support efficient search for such composite patterns. Here, we present our algorithm, mcPBWT (multi-column PBWT), which uses multiple synchronized runs of PBWT at different variant sites, providing “look-ahead” information about matches at those variant sites. Such “look-ahead” information allows us to analyze multiple contiguous matching pairs in a single pass. We present two specific cases of mcPBWT, namely double-PBWT and triple-PBWT, which utilize two and three columns of PBWT, respectively. double-PBWT finds combinations of two matching pairs representative of a crossover event or phasing error, while triple-PBWT finds combinations of three matching pairs representative of a gene-conversion tract.
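As a rough illustration of what "multiple synchronized runs of PBWT at different variant sites" could look like, the sketch below advances two copies of the standard PBWT update (the prefix and divergence arrays of Durbin's original PBWT) in lockstep, one a fixed number of sites ahead of the other. It is not the authors' mcPBWT implementation: the function names `pbwt_step` and `synchronized_columns`, the fixed `gap` parameter, and the toy panel are all assumptions made for illustration, and the pattern-detection logic of double-PBWT and triple-PBWT is not reproduced here.

```python
def pbwt_step(column, k, ppa, div):
    """One standard PBWT update at site k (hypothetical helper name).

    column[i] is the 0/1 allele of haplotype i at site k.
    ppa lists haplotype indices sorted by their reversed prefix over sites [0, k).
    div[j] is the first site of the longest match between ppa[j] and ppa[j-1]
    that ends just before site k.
    Returns the (ppa, div) pair for the prefix over sites [0, k+1).
    """
    zeros, ones, div_zeros, div_ones = [], [], [], []
    p = q = k + 1  # k + 1 means "no match ending at site k yet"
    for idx, start in zip(ppa, div):
        p = max(p, start)
        q = max(q, start)
        if column[idx] == 0:
            zeros.append(idx)
            div_zeros.append(p)
            p = 0
        else:
            ones.append(idx)
            div_ones.append(q)
            q = 0
    return zeros + ones, div_zeros + div_ones


def synchronized_columns(haps, gap):
    """Advance two PBWT columns in lockstep, `gap` sites apart (an assumption).

    Returns a list of (lag_pos, lag_state, lead_pos, lead_state) tuples, where
    each state is a (ppa, div) pair covering sites [0, pos). The leading column
    is what would supply "look-ahead" information to the lagging one.
    """
    n_haps, n_sites = len(haps), len(haps[0])
    lead = (list(range(n_haps)), [0] * n_haps)
    lag = (list(range(n_haps)), [0] * n_haps)
    out = []
    for k in range(n_sites):
        lead = pbwt_step([h[k] for h in haps], k, *lead)
        if k >= gap:
            j = k - gap
            lag = pbwt_step([h[j] for h in haps], j, *lag)
            out.append((j + 1, lag, k + 1, lead))
    return out


if __name__ == "__main__":
    # Toy panel: 4 haplotypes, 5 biallelic sites (made-up data).
    panel = [
        [0, 1, 0, 1, 1],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 1],
    ]
    for lag_pos, lag_state, lead_pos, lead_state in synchronized_columns(panel, 2):
        print(lag_pos, lag_state[0], lead_pos, lead_state[0])
```

Keeping both columns inside one loop means the panel is traversed once, which mirrors the single-pass property the abstract describes; how the two columns are actually compared to detect composite patterns is left to the paper.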