OCER and OCWER: Integrating Visual Similarity and Segmentation in OCR Evaluation

Published: 14 Dec 2025, Last Modified: 11 Jan 2026
Venue: LM4UC@AAAI2026
Readers: Everyone
License: CC BY 4.0
Keywords: OCR evaluation, edit distance, low-resource scripts, visual similarity, segmentation errors
Abstract: Character Error Rate (CER) and Word Error Rate (WER) are the standard metrics for evaluating OCR, but their binary substitution cost ignores visual similarity between characters and over-penalizes segmentation errors. We introduce the Optical Character Error Rate (OCER), which weights substitutions by visual similarity, and the Optical Character Word Error Rate (OCWER), which extends this principle to the word level and adds explicit split/union operations. These metrics provide evaluations that better reflect human perception and common OCR-specific errors.
Submission Number: 40
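
Illustrative sketch (not from the paper): the abstract describes OCER as an edit-distance metric whose substitution cost depends on visual similarity rather than being a fixed 1. A minimal way to picture this is a standard Levenshtein recurrence with a pluggable substitution cost. Everything specific below is an assumption made for illustration: the similarity table `ASSUMED_SIMILARITY` and its values, the cost rule `1 - similarity`, unit insertion/deletion costs, and normalization by reference length. The abstract does not state how the authors compute similarity or normalize the score, and the word-level split/union operations of OCWER are not covered here.

```python
from typing import Callable

# Hypothetical similarity scores in [0, 1] (1.0 = visually identical).
# These values are invented for illustration, not taken from the paper.
ASSUMED_SIMILARITY = {
    frozenset("O0"): 0.9,
    frozenset("l1"): 0.9,
}


def binary_sub_cost(a: str, b: str) -> float:
    """Substitution cost used by plain CER: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def visual_sub_cost(a: str, b: str) -> float:
    """Assumed rule: cost = 1 - similarity; unlisted pairs cost the full 1.0."""
    if a == b:
        return 0.0
    return 1.0 - ASSUMED_SIMILARITY.get(frozenset((a, b)), 0.0)


def weighted_edit_distance(
    ref: str, hyp: str, sub_cost: Callable[[str, str], float]
) -> float:
    """Levenshtein distance with unit insert/delete and a custom substitution cost."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,                # delete r from the reference
                    curr[j - 1] + 1.0,            # insert h from the hypothesis
                    prev[j - 1] + sub_cost(r, h),  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Plain character error rate, normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return weighted_edit_distance(ref, hyp, binary_sub_cost) / len(ref)


def ocer_sketch(ref: str, hyp: str) -> float:
    """Similarity-weighted variant under the assumptions stated above."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return weighted_edit_distance(ref, hyp, visual_sub_cost) / len(ref)


if __name__ == "__main__":
    print(cer("HELLO", "HELL0"), ocer_sketch("HELLO", "HELL0"))
```

With these assumed values, reading "HELLO" as "HELL0" costs one full substitution under plain CER (0.2 after normalization) but only about 0.02 under the weighted sketch, while a confusion with an unlisted character such as "X" is charged the same by both.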