Abstract: Reconstructing haplotypes through sequencing of a mixture of similar sequences is a fundamental problem. Long-read sequencing technologies can connect distant alleles to disentangle similar haplotypes, but handling elevated sequencing error rates requires specialized techniques. We present devider, an algorithm for haplotyping small sequences, such as viruses or genes, from long-read sequencing. devider uses a positional de Bruijn graph with sequence-to-graph alignment on an alphabet of informative alleles to provide a fast assembly-inspired approach compatible with various long-read sequencing technologies. Benchmarking on synthetic mixtures of antimicrobial resistance (AMR) genes showed that devider recovered 83% of haplotypes, 23 percentage points higher than the next best method. On real PacBio and Nanopore datasets, devider recapitulates previously known results in seconds, disentangling a bacterial community with \(> 10\) strains and an HIV-1 co-infection dataset. We used devider to investigate the within-host diversity of a long-read bovine gut metagenome enriched for AMR genes, discovering a history of recombination for diverse AMR gene haplotypes and showcasing devider's ability to unveil ecological signals for heterogeneous mixtures.
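To make the idea of a positional de Bruijn graph over an alphabet of informative alleles more concrete, the following is a minimal illustrative sketch and not devider's actual implementation. It assumes each read has already been reduced to its alleles at a contiguous run of informative (e.g. SNP) positions; the function names `positional_kmers` and `build_positional_dbg`, the read encoding, and the choice of `k` are all hypothetical. Error handling, sequence-to-graph alignment, and haplotype extraction are omitted.

```python
from collections import Counter

def positional_kmers(read, k):
    """Return positional k-mers for one read.

    `read` is a list of (site_index, allele) pairs, sorted by site_index
    and assumed contiguous (hypothetical encoding for this sketch).
    Each node is (start_site, alleles), so identical allele strings at
    different positions remain distinct nodes.
    """
    nodes = []
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        start_site = window[0][0]
        alleles = tuple(allele for _, allele in window)
        nodes.append((start_site, alleles))
    return nodes

def build_positional_dbg(reads, k):
    """Count nodes and edges between consecutive positional k-mers."""
    node_counts = Counter()
    edge_counts = Counter()
    for read in reads:
        nodes = positional_kmers(read, k)
        node_counts.update(nodes)
        edge_counts.update(zip(nodes, nodes[1:]))
    return node_counts, edge_counts

# Toy input: two reads from one haplotype, one read from another.
reads = [
    [(0, "A"), (1, "C"), (2, "G"), (3, "T")],
    [(0, "A"), (1, "C"), (2, "G"), (3, "T")],
    [(0, "G"), (1, "C"), (2, "A"), (3, "T")],
]
node_counts, edge_counts = build_positional_dbg(reads, k=2)
```

In this toy example the two haplotypes trace separate paths through the graph, and the node and edge counts reflect how many reads support each path. Restricting the alphabet to informative sites keeps the graph small, which is consistent with the speed emphasized above, though the real method's handling of sequencing errors is not modeled here.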