Efficient Data Structures for Range Shortest Unique Substring Queries

Paniz Abedin; Arnab Ganguly; Solon P. Pissis; Sharma V. Thankachan

Efficient Data Structures for Range Shortest Unique Substring Queries

Paniz Abedin, Arnab Ganguly, Solon P. Pissis, Sharma V. Thankachan

Published: 01 Jan 2020, Last Modified: 10 Feb 2025Algorithms 2020EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Let T [ 1 , n ] be a string of length n and T [ i , j ] be the substring of T starting at position i and ending at position j. A substring T [ i , j ] of T is a repeat if it occurs more than once in T ; otherwise, it is a unique substring of T . Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T . In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [ α , β ] , return a shortest substring T [ i , j ] of T with exactly one occurrence in [ α , β ] . We present an O ( n log n ) -word data structure with O ( log w n ) query time, where w = Ω ( log n ) is the word size. Our construction is based on a non-trivial reduction allowing for us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O ( n ) -word data structure with O ( n log ϵ n ) query time, where ϵ > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].

Loading