On the Computation of Longest Previous Non-overlapping Factors

Enno Ohlebusch, Pascal Weber

Published: 01 Jan 2019, Last Modified: 03 Aug 2024. Venue: SPIRE 2019. License: CC BY-SA 4.0

Abstract: The f-factorization of a string is similar to the well-known Lempel-Ziv (LZ) factorization, but differs from it in that the factors must be non-overlapping. There are two linear-time algorithms that compute the f-factorization. Both of them compute the array of longest previous non-overlapping factors (\(\mathsf{LPnF}\)-array), from which the f-factorization can easily be derived. In this paper, we present a simple algorithm that computes the \(\mathsf{LPnF}\)-array from the \(\mathsf{LPF}\)-array and an array \(\mathsf{prevOcc}\) that stores positions of previous occurrences of LZ-factors. The algorithm has linear worst-case time complexity if \(\mathsf{prevOcc}\) contains leftmost positions. Moreover, we provide an algorithm that computes the f-factorization directly. Experiments show that our first method (combined with efficient \(\mathsf{LPF}\)-algorithms) is the fastest way and our second method is the most space-efficient way to compute the f-factorization.
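
To make the definitions concrete, the following is a minimal brute-force sketch, not the linear-time algorithm of the paper. It assumes 0-based positions and the usual definition that LPnF[i] is the length of the longest prefix of S[i..] that also occurs starting at some j < i and ends before position i. The function names `lpnf_naive` and `f_factorization` are hypothetical and chosen only for illustration.

```python
def lpnf_naive(s):
    """Brute-force LPnF array (illustrative only, worst-case cubic time).

    lpnf[i] is the largest l such that s[i:i+l] == s[j:j+l] for some j < i
    with j + l <= i, i.e. the earlier occurrence does not overlap position i.
    """
    n = len(s)
    lpnf = [0] * n
    for i in range(n):
        best = 0
        for j in range(i):
            l = 0
            # Extend the match while staying inside the string (i + l < n)
            # and keeping the earlier occurrence strictly before i (j + l < i).
            while i + l < n and j + l < i and s[j + l] == s[i + l]:
                l += 1
            best = max(best, l)
        lpnf[i] = best
    return lpnf


def f_factorization(s):
    """Derive the f-factorization greedily from the LPnF array.

    Each factor is either a single fresh character (when LPnF is 0)
    or the longest previous non-overlapping factor at that position.
    """
    lpnf = lpnf_naive(s)
    factors = []
    i = 0
    while i < len(s):
        length = max(1, lpnf[i])
        factors.append(s[i:i + length])
        i += length
    return factors


print(lpnf_naive("aaaa"))        # [0, 1, 2, 1]
print(f_factorization("aaaa"))   # ['a', 'a', 'aa']
```

On the string "aaaa" the LZ factorization may use the self-overlapping factor "aaa", whereas the non-overlap constraint forces the f-factorization into "a", "a", "aa".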