DNA Merge-Sort: A Family of Nested Varshamov-Tenengolts Reassembly Codes for Out-of-Order Media

Published: 01 Jan 2024, Last Modified: 15 May 2025IEEE Trans. Commun. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Motivated by the DNA storage paradigm, we consider the torn-paper channel (TPC), which models data storage in long DNA molecules and breaks the input sequence into a random number of out-of-order variable-length non-overlapped fragments. We propose a computationally-efficient code construction for this model. More specifically, we introduce a family of nested Varshamov-Tenengolts (VT) codes to merge and sort the fragments in order to recover the stored data. We numerically show that our scheme (i) obtains rates that are higher than in prior results, (ii) has a decoding complexity that is cubic in the number of codeword fragments, which is significantly lower than the complexity of the brute-force approach, and (iii) offers decreasing and negligible error rates as the codeword length increases. We also propose a new construction for VT codes, quantify the number of required parity bits, and show that our approach requires fewer parity bits compared to known results.
Loading