Abstract: In this paper, we design succinct index structures for a text string T of n binary symbols to support efficient searching for a pattern P of length m. Motivated by the fact that the standard representation of suffix arrays uses n lg n bits, which is more than the theoretical minimum, we present a theorem that characterizes a permutation as the suffix array of a binary string. Based on the theorem, we design a succinct representation of suffix arrays of binary strings that uses n + o(n) bits, which is the theoretical minimum plus a lower-order term, and answers existential and cardinality queries in O(m) time without storing the raw text. With 2n + o(n) bits, we can list pattern occurrences in O(m + occ lg n) time in the general case, and for long patterns, when m = Ω(lg^{1+ε} n), we answer such listing queries in O(m + occ) time. We also present another implementation that uses O(n) bits and supports pattern searching in O(m + occ lg^λ n) time for any fixed λ such that 0 < λ < 1. More results and trade-offs are reported in the paper.
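For orientation, the three query types named above (existential, cardinality, and listing) can be illustrated on a plain, non-succinct suffix array. The following is a minimal sketch only: it is the textbook baseline that stores n integers and keeps the raw text, searching in O(m lg n) time, and it is not the succinct structure of this paper. All function names are hypothetical and chosen for illustration.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions lexicographically.
    # Fine for small examples; not an efficient construction algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _boundary(text, sa, pattern, past_equal):
    # Binary search over the suffix array, comparing the length-m prefix
    # of each suffix with the pattern. With past_equal=False this returns
    # the first rank whose prefix is >= pattern; with past_equal=True,
    # the first rank whose prefix is > pattern.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (past_equal and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def occurrence_range(text, sa, pattern):
    # Half-open range [lo, hi) of suffix-array ranks prefixed by pattern.
    return (_boundary(text, sa, pattern, False),
            _boundary(text, sa, pattern, True))


def exists(text, sa, pattern):
    # Existential query: does the pattern occur at all?
    lo, hi = occurrence_range(text, sa, pattern)
    return hi > lo


def count(text, sa, pattern):
    # Cardinality query: the number of occurrences (occ).
    lo, hi = occurrence_range(text, sa, pattern)
    return hi - lo


def locate(text, sa, pattern):
    # Listing query: starting positions of all occurrences.
    lo, hi = occurrence_range(text, sa, pattern)
    return sa[lo:hi]


text = "0110101"                        # binary text T, n = 7
sa = build_suffix_array(text)
print(exists(text, sa, "01"))           # True
print(count(text, sa, "01"))            # 3
print(sorted(locate(text, sa, "01")))   # [0, 3, 5]
```

The sketch relies on the fact that all suffixes sharing a given prefix occupy a contiguous range of the suffix array, so existential and cardinality queries need only the two range boundaries, while listing must additionally read one array entry per occurrence.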