Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors

Published: 01 Jan 2023, Last Modified: 14 May 2023, IEEE Trans. Inf. Theory 2023, Readers: Everyone
Abstract: Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising technology for satisfying future storage needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most $p$ edits, where a short duplication generates a copy of a substring with length $\leq 3$ and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to $p$ edits (in addition to duplications) at the additional cost of roughly $8p(\log_{q} n)(1+o(1))$ symbols of redundancy, thus achieving the same asymptotic rate, where $q \ge 4$ is the alphabet size and $p$ is a constant.
Furthermore, the time complexities of both the encoding and decoding processes are polynomial when $p$ is a constant with respect to the code length.
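
To make the error model in the abstract concrete, the following is a minimal Python sketch of the channel operations it describes: a short tandem duplication (copy a substring of length at most 3 and insert the copy right after the original) and the three edit types. It illustrates only the error model, not the paper's code construction. All function names are hypothetical, and the redundancy helper evaluates only the leading term $8p\log_q n$, dropping the $(1+o(1))$ factor as a stated simplification.

```python
import math


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Copy s[start:start+length] and insert the copy right after the original."""
    if not 1 <= length <= 3:
        raise ValueError("a short duplication copies a substring of length 1 to 3")
    if start < 0 or start + length > len(s):
        raise ValueError("substring out of range")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def substitute(s: str, pos: int, symbol: str) -> str:
    """Edit: replace the symbol at index pos (assumes 0 <= pos < len(s))."""
    return s[:pos] + symbol + s[pos + 1:]


def delete(s: str, pos: int) -> str:
    """Edit: remove the symbol at index pos (assumes 0 <= pos < len(s))."""
    return s[:pos] + s[pos + 1:]


def insert(s: str, pos: int, symbol: str) -> str:
    """Edit: insert a symbol before index pos (assumes 0 <= pos <= len(s))."""
    return s[:pos] + symbol + s[pos:]


def extra_redundancy_leading_term(p: int, n: int, q: int = 4) -> float:
    """Leading term 8 * p * log_q(n) of the additional redundancy, in symbols.

    Assumption: the (1 + o(1)) factor from the abstract is ignored here,
    so this is only a rough estimate for large n.
    """
    return 8 * p * math.log(n, q)
```

For example, `tandem_duplicate("ACGT", 1, 2)` duplicates the substring `CG` and returns `"ACGCGT"`, while `delete("ACGT", 1)` returns `"AGT"`.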