Abstract: Multimodal information fusion is gaining traction
in Chinese Natural Language Processing (CNLP), particularly
for phono-semantic compound comprehension and character
identification. Existing research often overlooks how pixel
resolution, scale, and stroke count affect character image
processing, which can introduce noise. This paper addresses this
gap by analyzing a purpose-built dataset of Chinese characters
covering stroke counts from 1 to 64, rendered at six pixel
resolutions (12, 16, 24, 35, 60, and 96), with up to 100 characters
per stroke count. We identify a processing threshold for character
images as a function of stroke count and resolution, to our
knowledge a first in the field. Using Euclidean near-graphic
similarity and ResNet50
image embedding similarity analyses, we establish thresholds
such as 12 strokes for 16-pixel images and 26 strokes for 24-
pixel images. These findings offer valuable insights for enhancing
the robustness of multimodal information fusion for Chinese
character recognition in NLP.
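
To make the idea of a resolution-dependent threshold concrete, the following is a minimal sketch of a pixel-wise Euclidean similarity between two character images, together with a block-average downsampling step. It is an illustration under stated assumptions, not the paper's implementation: the function names `euclidean_similarity` and `downsample`, the normalization to the range [0, 1], and the toy 4x4 glyphs are all hypothetical. The ResNet50 embedding comparison is not shown.

```python
import math


def euclidean_similarity(a, b):
    """Similarity in [0, 1] between two equal-sized grayscale images.

    Images are lists of rows with pixel values in [0, 1].
    Assumption: similarity = 1 - d / d_max, where d is the Euclidean
    distance and d_max = sqrt(number of pixels). This normalization is
    illustrative and not taken from the paper.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    squared = sum(
        (pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)
    )
    n_pixels = sum(len(row) for row in a)
    return 1.0 - math.sqrt(squared) / math.sqrt(n_pixels)


def downsample(img, factor):
    """Reduce a square image by averaging non-overlapping factor x factor blocks."""
    size = len(img) // factor
    return [
        [
            sum(
                img[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / factor ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


# Two toy 4x4 "glyphs" that differ by a single stroke pixel (hypothetical data).
glyph_a = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
glyph_b = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

sim_full = euclidean_similarity(glyph_a, glyph_b)
sim_low = euclidean_similarity(downsample(glyph_a, 2), downsample(glyph_b, 2))
print(sim_full, sim_low)
```

In this toy case the two glyphs look more alike after downsampling than at full size, which is the kind of effect a stroke-count threshold per resolution is meant to capture: below some resolution, distinct characters with many strokes become hard to tell apart.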