Optimum Deep Learning Method for Document Layout Analysis in Low Resource Languages

Md. Mutasim Billah Abu Noman Akanda, Maruf Ahmed, AKM Shahariar Azad Rabby, Fuad Rahman

Published: 2024, Last Modified: 06 Jan 2025, ACM Southeast Regional Conference 2024, CC BY-SA 4.0

Abstract: Document Layout Analysis (DLA) has become a crucial step in digitizing documents. Today, it has become increasingly important to properly understand a digital document in order to gain insight into its structure and contents. DLA combines techniques from image processing, computer vision, and natural language processing to support tasks such as character recognition, document classification, information retrieval, content summarization, and document restructuring. Gaining proper insight into the layout of a document is important for identifying each element and its relationships to other elements. Many deep learning-based DLA algorithms have recently been developed, and they have obtained impressive results on publicly available datasets in high-resource languages such as English. However, there is a significant shortage of information on the effectiveness of deep learning-based DLA approaches for low-resource languages. This paper investigates three state-of-the-art deep learning-based DLA approaches, DiT, LayoutLMv3, and YOLOv8 [9], to find the optimal approach for low-resource, grapheme-based languages such as Bengali. We found that YOLOv8 [9] performs best, achieving an IoU score 8.95% better than that of DiT and 38.48% better than that of LayoutLMv3 on the DLA task for a low-resource, grapheme-based language.
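The comparison above is reported as an IoU (Intersection over Union) score, which measures how well a predicted layout region overlaps its ground-truth region. The abstract does not state how predictions are matched to ground truth or how per-region scores are aggregated, so the following is only a minimal Python sketch. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format and an already-matched list of prediction/ground-truth pairs. `box_iou` and `mean_iou` are hypothetical helper names, not the authors' evaluation code.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # The overlap rectangle is bounded by the larger mins and the smaller maxes.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    # Clamp widths/heights to zero so disjoint boxes have zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against two degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: List[Tuple[Box, Box]]) -> float:
    """Average IoU over (predicted, ground_truth) pairs that are already matched.

    Assumption: matching is done elsewhere; unmatched regions are not counted here.
    """
    if not pairs:
        return 0.0
    return sum(box_iou(pred, gt) for pred, gt in pairs) / len(pairs)


if __name__ == "__main__":
    # Two 10x10 boxes offset by 5 in each direction overlap in a 5x5 square:
    # IoU = 25 / (100 + 100 - 25) = 1/7.
    print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Averaging over matched pairs is only one possible aggregation; detection benchmarks often instead report mAP computed at one or more IoU thresholds, so the numbers in the abstract should not be assumed to come from this exact procedure.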