Abstract: Although a number of normalized edit distances presented so far may
offer good performance in some applications, none of them can be regarded as a
genuine metric between strings because they do not satisfy the triangle inequality.
Given two strings X and Y over a finite alphabet, this paper defines a new
normalized edit distance between X and Y as a simple function of their lengths (|X|
and |Y|) and the Generalized Levenshtein Distance (GLD) between them. The new
distance can be easily computed through GLD with a complexity of O(|X||Y|) and
it is a metric valued in [0, 1] under the condition that the weight function is a metric
over the set of elementary edit operations with all costs of insertions/deletions
having the same weight. Experiments using the AESA algorithm in handwritten digit
recognition show that the new distance can generally provide similar results to
some other normalized edit distances and may perform slightly better if the triangle
inequality is violated in a particular data set.
Loading