Recognition and Link Prediction of Onomatopoeia Texts with Arbitrary Shapes

Published: 01 Jan 2024, Last Modified: 17 Apr 2025
Venue: ICDAR (3) 2024
License: CC BY-SA 4.0
Abstract: The onomatopoeia texts in Japanese comics, with their arbitrary shapes, diverse backgrounds, and complex layouts, are a challenging and worthwhile subject of study. On the one hand, existing mainstream text recognition methods may fail to achieve the expected results on onomatopoeia text images, possibly because they do not take the unique characteristics of onomatopoeia words into account. On the other hand, a truncated text, which is one part of a complete onomatopoeia word but is not adjacent to the other parts on the comic page, has no meaning on its own. Only when the truncated texts of a complete onomatopoeia word are linked together can the original meaning be understood. A method named M4C-COO was previously proposed to predict these links, but it ignored the class imbalance between truncated and non-truncated texts. To address these problems, this paper devises a new recognition method that exploits the characteristics of onomatopoeia texts, introduces focal loss (FL) for link prediction, and further proposes a novel loss function based on the focal loss (FB). Experiments demonstrate the effectiveness of these contributions, which achieve state-of-the-art performance.
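As a rough illustration of why focal loss helps with class imbalance, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. It is not the paper's FB loss, whose form the abstract does not give, and it is not the authors' implementation. The function names, the default values gamma=2.0 and alpha=0.25, and the convention that label 1 marks the positive class are assumptions made for this example only.

```python
import math


def binary_cross_entropy(p, y):
    """Plain cross-entropy for one prediction (p = predicted P(y == 1))."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one prediction.

    gamma and alpha are illustrative defaults, not values from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so training focuses on the hard ones.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # per-class weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.9) versus a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With gamma set to 0 and alpha set to 0.5, the focal loss reduces to half the cross-entropy. Larger gamma values suppress easy examples more strongly relative to hard ones.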