Abstract: The exponential reduction in cost of genome sequencing has resulted in a rapid growth of genomic data. Most of the entropy of short read data lies not in the sequence of read bases themselves but in their Quality Scores—the confidence measurement that each base has been sequenced correctly. Lossless compression methods are now close to their theoretical limits and hence there is a need for lossy methods that further reduce the complexity of these data without impacting downstream analyses.
0 Replies
Loading