Non-overlapping Indexing in BWT-Runs Bounded Space

Published: 01 Jan 2023, Last Modified: 05 Feb 2025SPIRE 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We revisit the non-overlapping indexing problem for an efficient repetition-aware solution. The problem is to index a text T[1..n], such that whenever a pattern P[1..p] comes as a query, we can report the largest set of non-overlapping occurrences of P in T. A previous index by Cohen and Porat [ISAAC 2009] takes linear space and optimal \(O(p+\mathsf {occ_{no}})\) query time, where \(\mathsf {occ_{no}}\) denotes the output size. We present an index of size O(r), where r denotes the number of runs in the Burrows Wheeler Transform (BWT) of T. The parameter r is significantly smaller than n for highly repetitive texts. The query time of our index is \(O(p\log \log _w \sigma +\textsf{sort}(\mathsf {occ_{no}}))\), where \(\sigma \) denotes the alphabet size, w denotes the machine word size in bits and \(\textsf{sort}(x)\) denotes the time for sorting x integers within the range [1, n].
Loading