Threshold Determination for Chinese Character Image Processing in Multimodal Information Fusion

Li Weigang, Rafael Marconi Ramos, Pedro Carvalho Brom

Published: 01 Aug 2024, Last Modified: 30 Sept 2024 | The 28th International Conference on Asian Language Processing, 4-6 August 2024, Hohhot, China | Readers: Everyone | License: CC BY 4.0

Abstract: Multimodal information fusion is gaining traction in Chinese Natural Language Processing (CNLP), particularly for phono-semantic compound comprehension and character identification. Existing research often overlooks how pixel size, scale, and stroke count affect character image processing, which can introduce noise. This paper addresses that gap by analyzing a dataset we prepared of Chinese characters with stroke counts from 1 to 64 at six pixel resolutions (12, 16, 24, 35, 60, and 96), with up to 100 characters per stroke count. We identify a processing threshold for character images based on stroke count and resolution, a first in the field. Using Euclidean near-graphic similarity and ResNet50 image embedding similarity analyses, we establish thresholds such as 12 strokes for 16-pixel images and 26 strokes for 24-pixel images. These findings offer insights for improving the robustness of multimodal information fusion for Chinese character recognition in NLP.
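The two similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so everything here is an assumption for illustration: the function names are hypothetical, the Euclidean similarity is taken to be one minus the normalized pixel-wise distance between two equal-sized bitmaps, cosine similarity is assumed as the way two embedding vectors are compared, and the tiny 4x4 bitmaps and 2-dimensional vectors are toy stand-ins for real character images and ResNet50 embeddings.

```python
import math


def euclidean_similarity(a, b):
    """Similarity in [0, 1] from pixel-wise Euclidean distance.

    Assumption: both bitmaps have the same shape and pixel values in [0, 1].
    """
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("bitmaps must have the same shape")
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)))
    max_dist = math.sqrt(len(flat_a))  # distance between all-0 and all-1 images
    return 1.0 - dist / max_dist


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Toy 4x4 binary "glyphs" (hypothetical, not taken from the paper's dataset).
glyph_a = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Same bar with one extra pixel, imitating one added stroke fragment.
glyph_b = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# Pixel-wise complement of glyph_a: the most dissimilar bitmap possible.
glyph_c = [[1 - p for p in row] for row in glyph_a]

sim_same = euclidean_similarity(glyph_a, glyph_a)
sim_near = euclidean_similarity(glyph_a, glyph_b)
sim_far = euclidean_similarity(glyph_a, glyph_c)

# Stand-in vectors; real ResNet50 embeddings would have many more dimensions.
emb_parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
emb_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

At low resolutions, characters with many strokes collapse onto a small number of pixels, so one would expect such similarity scores between distinct characters to rise toward 1. A stroke-count threshold per resolution, as reported in the abstract, marks the point beyond which that loss of distinguishability becomes a concern.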