Keywords: OCR evaluation, edit distance, low-resource scripts, visual similarity, segmentation errors
Abstract: Character Error Rate (CER) and Word Error Rate (WER) are
the standard metrics for evaluating OCR, but their binary
substitution cost ignores visual similarity between characters and
over-penalizes segmentation errors.
We introduce the Optical Character Error Rate (OCER),
which weights substitutions by visual similarity, and the
Optical Character Word Error Rate (OCWER), which
extends this principle to the word level and adds explicit
split/union operations. These metrics provide evaluations that better
reflect human perception and common OCR-specific errors.
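To illustrate the idea of a similarity-weighted substitution cost, here is a minimal sketch of a character-level edit distance in which substituting one character for another costs 1 minus their visual similarity. It is not the paper's definition of OCER. The similarity table, the function names (`toy_similarity`, `weighted_cer`), the unit insertion/deletion costs, and the normalization by reference length are all assumptions made for this example, and the word-level split/union operations of OCWER are not covered.

```python
from typing import Callable

# Hypothetical visual-similarity scores in [0, 1]; the values are
# illustrative only and do not come from the paper.
TOY_SIMILARITY = {
    frozenset("O0"): 0.9,
    frozenset("l1"): 0.8,
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical characters, a table value for known
    look-alike pairs, and 0.0 for everything else (assumed behaviour)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def weighted_cer(
    reference: str,
    hypothesis: str,
    similarity: Callable[[str, str], float] = toy_similarity,
) -> float:
    """Edit distance with substitution cost 1 - similarity(r, h),
    unit insertion/deletion cost, divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")

    # prev[j] is the cost of aligning the reference prefix processed so far
    # with the first j hypothesis characters.
    prev = [float(j) for j in range(len(hypothesis) + 1)]
    for i, r in enumerate(reference, start=1):
        curr = [float(i)]
        for j, h in enumerate(hypothesis, start=1):
            substitute = prev[j - 1] + (1.0 - similarity(r, h))
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # A look-alike confusion (O vs 0) is penalized less than an unrelated one.
    print(weighted_cer("HELLO", "HELL0"))
    print(weighted_cer("HELLO", "HELLX"))
```

With a similarity function that returns 0 for every pair of distinct characters, this sketch reduces to the standard CER with binary substitution cost.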
Submission Number: 40