Abstract: Genomic sequence data obtained through high-throughput sequencing (HTS) technologies are commonly stored either as raw sequencing reads in FASTQ format or as reads mapped to a reference genome in SAM format. Both of these formats have large memory footprints. Worldwide increase of HTS data has prompted the development of specialized compression methods that aim to significantly reduce HTS data size. Below is a comparative overview of available lossless genomic data compression approaches, including their advantages and pitfalls.
Loading