Utilizing parental data in trio binning diploid de novo genome assembly

Published: 26 Mar 2023, Last Modified: 14 Feb 2025 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0
Abstract: Haplotype-resolved human genome assembly is important for a complete understanding of the genome and its diverse genetic variation. We present a new approach to haplotype-resolved human genome assembly that uses trio binning with additional augmentation techniques based on parental data. Just as the child carries genetic information from both parents, each parent carries genetic information from the child's grandparents. Some of the grandparental genetic information is passed on to the child, and some is not. We hypothesize that using only the parental data that contain genetic information passed on to the child could aid the trio binning process and yield a higher-quality diploid genome for the child. The idea is to augment the child's data with parental data that carry the same genetic information as the child. We map parental and child reads to the T2T-CHM13 v2.0 reference and detect differences from the reference separately for each person. By extracting only those parental reads that are consistent with the child's genetic information and combining them with the child's data, our method aims to improve the quality and completeness of the genome assembly. For genome assembly, we use hifiasm v0.18.5 with child PacBio HiFi data and integrated ultra-long ONT reads. For trio binning, we use parental PacBio HiFi data. For evaluation, we compare our assembly to the T2T-CHM13 v2.0 reference. Initial results show that adding parental reads to the child's initial data set for the assembly, and performing trio binning with all parental data, does not improve the quality of the assembly; however, further research is required.
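The read-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact consistency criterion, so everything below is an assumption made for illustration: the names `ParentalRead` and `select_consistent_reads`, the representation of a variant as a (chromosome, position, reference allele, alternate allele) tuple, and the rule that a parental read is kept only if it shares at least `min_shared` variants with the child and carries no variant absent from the child. The data are toy values, not results from the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


@dataclass(frozen=True)
class ParentalRead:
    """A parental read plus the differences from the reference seen on it."""
    name: str
    variants: FrozenSet[Variant]


def select_consistent_reads(
    reads: Iterable[ParentalRead],
    child_variants: Set[Variant],
    min_shared: int = 1,
) -> List[ParentalRead]:
    """Keep parental reads whose variants all occur in the child.

    Assumed rule (not taken from the abstract): a read is consistent when it
    shares at least `min_shared` variants with the child and has no variant
    that the child lacks.
    """
    kept = []
    for read in reads:
        shared = read.variants & child_variants
        conflicting = read.variants - child_variants
        if len(shared) >= min_shared and not conflicting:
            kept.append(read)
    return kept


# Toy example with made-up coordinates.
child = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")}
reads = [
    ParentalRead("mat_read_1", frozenset({("chr1", 1000, "A", "G")})),
    ParentalRead("mat_read_2", frozenset({("chr1", 1500, "G", "C")})),
    ParentalRead(
        "pat_read_1",
        frozenset({("chr1", 2000, "C", "T"), ("chr1", 2500, "T", "A")}),
    ),
    ParentalRead("pat_read_2", frozenset()),  # matches the reference everywhere
]

selected = select_consistent_reads(reads, child)
print([r.name for r in selected])  # ['mat_read_1']
```

Under this assumed rule, reads that match the reference everywhere are dropped by default because they carry no evidence of transmission; setting `min_shared=0` keeps them. In practice the per-read variants would come from the alignments to T2T-CHM13 v2.0, and the selected reads would be added to the child's read set before assembly.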