Range Shortest Unique Substring Queries

Paniz Abedin; Arnab Ganguly; Solon P. Pissis; Sharma V. Thankachan

Published: 01 Jan 2019, Last Modified: 10 Feb 2025, SPIRE 2019, CC BY-SA 4.0

Abstract: Let \(\mathsf {T}[1,n]\) be a string of length n and \(\mathsf {T}[i,j]\) be the substring of \(\mathsf {T}\) starting at position i and ending at position j. A substring \(\mathsf {T}[i,j]\) of \(\mathsf {T}\) is a repeat if it occurs more than once in \(\mathsf {T}\); otherwise, it is a unique substring of \(\mathsf {T}\). Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given string \(\mathsf {T}\) as input, the Shortest Unique Substring problem is to find a shortest substring of \(\mathsf {T}\) that does not occur elsewhere in \(\mathsf {T}\). In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over \(\mathsf {T}\) answering the following type of online queries efficiently. Given a range \([\alpha , \beta ]\), return a shortest substring \(\mathsf {T}[i,j]\) of \(\mathsf {T}\) with exactly one occurrence in \([\alpha , \beta ]\). We present an \(\mathcal {O}(n\log n)\)-word data structure with \(\mathcal {O}(\log _w n)\) query time, where \(w=\varOmega (\log n)\) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al. ICALP 2018].
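To make the query concrete, the following is a minimal brute-force sketch of a range query. It is not the data structure from the paper and does not approach its \(\mathcal {O}(\log _w n)\) query time; it only illustrates what a correct answer looks like. It assumes one particular reading of "exactly one occurrence in \([\alpha , \beta ]\)": an occurrence is counted by its starting position, so the returned substring must start inside the range but may end beyond \(\beta\). The function name `range_sus_bruteforce` and the leftmost-first tie-breaking rule are hypothetical choices made for this illustration.

```python
def range_sus_bruteforce(text, alpha, beta):
    """Brute-force range shortest unique substring query (illustrative only).

    Positions are 1-indexed and inclusive, matching the T[i, j] notation.
    ASSUMPTION: an occurrence is "in [alpha, beta]" when its starting
    position lies in that range; the substring may extend past beta.
    Returns (i, j) such that text[i..j] is a shortest substring with
    exactly one such occurrence; ties go to the leftmost start.
    """
    n = len(text)
    if not (1 <= alpha <= beta <= n):
        raise ValueError("require 1 <= alpha <= beta <= len(text)")

    # Try lengths in increasing order; the first length that yields a
    # substring with a single starting position in the range is optimal.
    for length in range(1, n - alpha + 2):
        starts_by_substring = {}
        for k in range(alpha, beta + 1):
            if k + length - 1 > n:
                break  # substring would run past the end of the text
            sub = text[k - 1:k - 1 + length]
            starts_by_substring.setdefault(sub, []).append(k)

        # Dicts preserve insertion order, so the first unique substring
        # found here is the one with the leftmost starting position.
        for starts in starts_by_substring.values():
            if len(starts) == 1:
                return starts[0], starts[0] + length - 1

    # Unreachable for valid input: at length n - alpha + 1 only the
    # substring starting at alpha fits, so it is trivially unique.
    return None


if __name__ == "__main__":
    print(range_sus_bruteforce("abab", 1, 4))
    print(range_sus_bruteforce("abab", 1, 2))
    print(range_sus_bruteforce("aaaa", 2, 3))
```

Each query in this sketch rescans the range for every candidate length, so its cost grows roughly cubically with the text length in the worst case. That cost is the motivation for preprocessing \(\mathsf {T}\) into a dedicated structure, as the abstract describes.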
