Abstract: In the gapped string indexing problem, a text \(T[1\mathinner {.\,.}n]\) is given for preprocessing. A query consists of a gapped pattern \(P = P_1[\alpha \mathinner {.\,.}\beta ] P_2\), where \(P_1\) and \(P_2\) are strings of total length m and \([\alpha \mathinner {.\,.}\beta ]\) is an integer range. The goal is to report all pairs of occurrences of \(P_1\) and \(P_2\) whose gap falls within \([\alpha \mathinner {.\,.}\beta ]\). An existing conditional lower bound shows that any index with query time \(\widetilde{\mathcal {O}}(m + occ)\), where occ is the output size, must occupy almost quadratic space. However, there are interesting special cases that admit more efficient solutions. For example, queries with a bounded gap, i.e., \(\beta \le G\) for a bound G fixed at construction, can be answered optimally using an \(\widetilde{\mathcal {O}}(nG)\) space structure. In this paper, we introduce a variant of the problem in which, rather than fixing an upper bound on \(\beta \), we fix a parameter \(\gamma \) and allow any \(\beta \le \gamma \cdot m\); that is, longer patterns may have longer gaps, as long as the gap-to-pattern ratio is bounded. We show that such queries can be answered optimally using an \(\widetilde{\mathcal {O}}(n\gamma )\) space structure.
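To make the query semantics concrete, the following is a minimal brute-force sketch in Python. It is not the index proposed in the paper: it scans the text at query time and uses no preprocessing. It rests on two assumptions that the abstract does not fix: positions are 0-indexed, and the gap is the number of text characters strictly between the end of the \(P_1\) occurrence and the start of the \(P_2\) occurrence. The function names `find_all`, `gapped_occurrences`, and `is_ratio_bounded` are hypothetical and chosen for illustration only.

```python
def find_all(text, pat):
    """Return all 0-indexed start positions of pat in text (overlaps allowed)."""
    positions = []
    start = text.find(pat)
    while start != -1:
        positions.append(start)
        start = text.find(pat, start + 1)
    return positions


def gapped_occurrences(text, p1, p2, alpha, beta):
    """Report pairs (i, j): p1 occurs at i, p2 occurs at j, gap in [alpha, beta].

    Assumption: gap = j - (i + len(p1)), i.e. the number of characters
    between the end of the p1 occurrence and the start of the p2 occurrence.
    """
    occ1 = find_all(text, p1)
    occ2 = find_all(text, p2)
    pairs = []
    for i in occ1:
        end1 = i + len(p1)  # first position after the p1 occurrence
        for j in occ2:
            if alpha <= j - end1 <= beta:
                pairs.append((i, j))
    return pairs


def is_ratio_bounded(p1, p2, beta, gamma):
    """Check the bounded-ratio condition beta <= gamma * m, with m = |p1| + |p2|."""
    m = len(p1) + len(p2)
    return beta <= gamma * m


if __name__ == "__main__":
    text = "abcabcab"
    print(gapped_occurrences(text, "ab", "c", 0, 3))
    print(is_ratio_bounded("ab", "c", beta=3, gamma=1))
```

This baseline takes time proportional to the product of the two occurrence counts per query, which is the cost an index for this problem aims to avoid. The `is_ratio_bounded` helper only expresses the condition \(\beta \le \gamma \cdot m\) under which the bounded-ratio variant applies.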